* R4 problem started with 2.6.39 and still there with 3.6.6
From: Dušan Čolić @ 2012-12-07 17:56 UTC
  To: reiserfs-devel

Hello

I'm using KVM for Windows emulation, running it from a ~3GB image file.
Lately I've started having problems with it on both regular and ccreg40
partitions (I tried the same file on both) using 3.6.6.
The log gets spammed with lots of these:

Dec  7 03:30:02 krshina3 kernel: [15135.133047] reiser4[find(5806)]: parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
Dec  7 03:30:02 krshina3 kernel: [15135.133047] WARNING: Wrong level found in node: 2 != 1
Dec  7 03:30:02 krshina3 kernel: [15135.133048] reiser4[find(5806)]: key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
Dec  7 03:30:02 krshina3 kernel: [15135.133048] WARNING: Error for inode 17802378 (-5)
Dec  7 03:30:02 krshina3 kernel: [15135.133056] reiser4[find(5806)]: parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
Dec  7 03:30:02 krshina3 kernel: [15135.133056] WARNING: Wrong level found in node: 2 != 1

Dec  7 17:26:23 krshina3 kernel: [38539.089191] reiser4[gdm(2676)]: cbk_level_lookup (fs/reiser4/search.c:963)[vs-3533]:
Dec  7 17:26:23 krshina3 kernel: [38539.089194] reiser4[gdm(2676)]: key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
Dec  7 17:26:23 krshina3 kernel: [38539.090837] reiser4[gnome-screensav(3503)]: cbk_level_lookup (fs/reiser4/search.c:963)[vs-3533]:
Dec  7 17:26:23 krshina3 kernel: [38539.090840] reiser4[gnome-screensav(3503)]: key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:


I fscked the filesystems; some errors were found and corrected.

Now I've started getting these, and I can't kill the offending process:

Dec  7 18:43:29 krshina3 kernel: [  720.274343] INFO: task qemu-system-x86:4156 blocked for more than 120 seconds.
Dec  7 18:43:29 krshina3 kernel: [  720.274347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  7 18:43:29 krshina3 kernel: [  720.274349] qemu-system-x86 D 0000000000000001     0  4156   3654 0x00000000
Dec  7 18:43:29 krshina3 kernel: [  720.274354]  ffff880206dd7990 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
Dec  7 18:43:29 krshina3 kernel: [  720.274357]  0000000000011240 ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
Dec  7 18:43:29 krshina3 kernel: [  720.274361]  0000000000011240 ffff880206dd7990 0000000000011240 ffff8801def2e000
Dec  7 18:43:29 krshina3 kernel: [  720.274364] Call Trace:
Dec  7 18:43:29 krshina3 kernel: [  720.274372]  [<ffffffff810aec97>] ? pagevec_lookup_tag+0x18/0x21
Dec  7 18:43:29 krshina3 kernel: [  720.274375]  [<ffffffff810a528a>] ? filemap_fdatawait_range+0xff/0x144
Dec  7 18:43:29 krshina3 kernel: [  720.274380]  [<ffffffff81146b09>] ? writepages_unix_file+0x36e/0x3ce
Dec  7 18:43:29 krshina3 kernel: [  720.274384]  [<ffffffff810ab9f9>] ? global_dirtyable_memory+0xd/0x2c
Dec  7 18:43:29 krshina3 kernel: [  720.274389]  [<ffffffff81489dac>] ? __mutex_lock_slowpath+0xd0/0x116
Dec  7 18:43:29 krshina3 kernel: [  720.274392]  [<ffffffff8148a096>] ? mutex_lock+0x1a/0x2d
Dec  7 18:43:29 krshina3 kernel: [  720.274396]  [<ffffffff81143b73>] ? reiser4_sync_file_common+0x58/0xcd
Dec  7 18:43:29 krshina3 kernel: [  720.274400]  [<ffffffff81147a81>] ? write_unix_file+0x442/0x4b7
Dec  7 18:43:29 krshina3 kernel: [  720.274404]  [<ffffffff811498b9>] ? reiser4_write_careful+0xb8/0x450
Dec  7 18:43:29 krshina3 kernel: [  720.274409]  [<ffffffff810da90f>] ? vfs_write+0xaf/0x149
Dec  7 18:43:29 krshina3 kernel: [  720.274412]  [<ffffffff810dab49>] ? sys_pwrite64+0x53/0x71
Dec  7 18:43:29 krshina3 kernel: [  720.274415]  [<ffffffff8148c3e2>] ? system_call_fastpath+0x16/0x1b
Dec  7 18:43:29 krshina3 kernel: [  720.274418] INFO: task qemu-system-x86:4162 blocked for more than 120 seconds.
Dec  7 18:43:29 krshina3 kernel: [  720.274419] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  7 18:43:29 krshina3 kernel: [  720.274420] qemu-system-x86 D 0000000000000000     0  4162   3654 0x00000000
Dec  7 18:43:29 krshina3 kernel: [  720.274423]  ffff88020f7cacf0 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
Dec  7 18:43:29 krshina3 kernel: [  720.274426]  0000000000011240 ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
Dec  7 18:43:29 krshina3 kernel: [  720.274429]  0000000000011240 ffff88020f7cacf0 0000000000011240 ffff8801e007e000
Dec  7 18:43:29 krshina3 kernel: [  720.274432] Call Trace:
Dec  7 18:43:29 krshina3 kernel: [  720.274437]  [<ffffffff810aec97>] ? pagevec_lookup_tag+0x18/0x21
Dec  7 18:43:29 krshina3 kernel: [  720.274439]  [<ffffffff810a528a>] ? filemap_fdatawait_range+0xff/0x144
Dec  7 18:43:29 krshina3 kernel: [  720.274443]  [<ffffffff81146b09>] ? writepages_unix_file+0x36e/0x3ce
Dec  7 18:43:29 krshina3 kernel: [  720.274447]  [<ffffffff81489dac>] ? __mutex_lock_slowpath+0xd0/0x116
Dec  7 18:43:29 krshina3 kernel: [  720.274450]  [<ffffffff8148a096>] ? mutex_lock+0x1a/0x2d
Dec  7 18:43:29 krshina3 kernel: [  720.274453]  [<ffffffff81143b73>] ? reiser4_sync_file_common+0x58/0xcd
Dec  7 18:43:29 krshina3 kernel: [  720.274457]  [<ffffffff810fa6d8>] ? do_fsync+0x29/0x47
Dec  7 18:43:29 krshina3 kernel: [  720.274460]  [<ffffffff810fa716>] ? sys_fdatasync+0xe/0x15
Dec  7 18:43:29 krshina3 kernel: [  720.274462]  [<ffffffff8148c3e2>] ? system_call_fastpath+0x16/0x1b
tail: unrecognized file system type 0x52345362 for '/var/log/messages'. please report this to bug-coreutils@gnu.org. reverting to polling
Dec  7 18:45:29 krshina3 kernel: [  840.266595] INFO: task qemu-system-x86:4156 blocked for more than 120 seconds.
Dec  7 18:45:29 krshina3 kernel: [  840.266599] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  7 18:45:29 krshina3 kernel: [  840.266601] qemu-system-x86 D 0000000000000001     0  4156   3654 0x00000000
Dec  7 18:45:29 krshina3 kernel: [  840.266605]  ffff880206dd7990 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
Dec  7 18:45:29 krshina3 kernel: [  840.266609]  0000000000011240 ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
Dec  7 18:45:29 krshina3 kernel: [  840.266613]  0000000000011240 ffff880206dd7990 0000000000011240 ffff8801def2e000
Dec  7 18:45:29 krshina3 kernel: [  840.266616] Call Trace:
Dec  7 18:45:29 krshina3 kernel: [  840.266625]  [<ffffffff810aec97>] ? pagevec_lookup_tag+0x18/0x21
Dec  7 18:45:29 krshina3 kernel: [  840.266629]  [<ffffffff810a528a>] ? filemap_fdatawait_range+0xff/0x144
Dec  7 18:45:29 krshina3 kernel: [  840.266634]  [<ffffffff81146b09>] ? writepages_unix_file+0x36e/0x3ce
Dec  7 18:45:29 krshina3 kernel: [  840.266638]  [<ffffffff810ab9f9>] ? global_dirtyable_memory+0xd/0x2c
Dec  7 18:45:29 krshina3 kernel: [  840.266643]  [<ffffffff81489dac>] ? __mutex_lock_slowpath+0xd0/0x116
Dec  7 18:45:29 krshina3 kernel: [  840.266646]  [<ffffffff8148a096>] ? mutex_lock+0x1a/0x2d
Dec  7 18:45:29 krshina3 kernel: [  840.266650]  [<ffffffff81143b73>] ? reiser4_sync_file_common+0x58/0xcd
Dec  7 18:45:29 krshina3 kernel: [  840.266654]  [<ffffffff81147a81>] ? write_unix_file+0x442/0x4b7
Dec  7 18:45:29 krshina3 kernel: [  840.266658]  [<ffffffff811498b9>] ? reiser4_write_careful+0xb8/0x450
Dec  7 18:45:29 krshina3 kernel: [  840.266664]  [<ffffffff810da90f>] ? vfs_write+0xaf/0x149
Dec  7 18:45:29 krshina3 kernel: [  840.266666]  [<ffffffff810dab49>] ? sys_pwrite64+0x53/0x71
Dec  7 18:45:29 krshina3 kernel: [  840.266670]  [<ffffffff8148c3e2>] ? system_call_fastpath+0x16/0x1b
Dec  7 18:45:29 krshina3 kernel: [  840.266672] INFO: task qemu-system-x86:4162 blocked for more than 120 seconds.
Dec  7 18:45:29 krshina3 kernel: [  840.266674] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec  7 18:45:29 krshina3 kernel: [  840.266675] qemu-system-x86 D 0000000000000000     0  4162   3654 0x00000000
Dec  7 18:45:29 krshina3 kernel: [  840.266678]  ffff88020f7cacf0 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
Dec  7 18:45:29 krshina3 kernel: [  840.266681]  0000000000011240 ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
Dec  7 18:45:29 krshina3 kernel: [  840.266684]  0000000000011240 ffff88020f7cacf0 0000000000011240 ffff8801e007e000
Dec  7 18:45:29 krshina3 kernel: [  840.266687] Call Trace:
Dec  7 18:45:29 krshina3 kernel: [  840.266691]  [<ffffffff810aec97>] ? pagevec_lookup_tag+0x18/0x21
Dec  7 18:45:29 krshina3 kernel: [  840.266694]  [<ffffffff810a528a>] ? filemap_fdatawait_range+0xff/0x144
Dec  7 18:45:29 krshina3 kernel: [  840.266698]  [<ffffffff81146b09>] ? writepages_unix_file+0x36e/0x3ce
Dec  7 18:45:29 krshina3 kernel: [  840.266702]  [<ffffffff81489dac>] ? __mutex_lock_slowpath+0xd0/0x116
Dec  7 18:45:29 krshina3 kernel: [  840.266705]  [<ffffffff8148a096>] ? mutex_lock+0x1a/0x2d
Dec  7 18:45:29 krshina3 kernel: [  840.266708]  [<ffffffff81143b73>] ? reiser4_sync_file_common+0x58/0xcd
Dec  7 18:45:29 krshina3 kernel: [  840.266712]  [<ffffffff810fa6d8>] ? do_fsync+0x29/0x47
Dec  7 18:45:29 krshina3 kernel: [  840.266714]  [<ffffffff810fa716>] ? sys_fdatasync+0xe/0x15
Dec  7 18:45:29 krshina3 kernel: [  840.266717]  [<ffffffff8148c3e2>] ? system_call_fastpath+0x16/0x1b


The file runs fine from a FAT32 partition.

If I can do anything, or if you need any more info, please tell me.

Thanks
Dushan


* Re: R4 problem started with 2.6.39 and still there with 3.6.6
From: Dušan Čolić @ 2012-12-07 18:34 UTC
  To: reiserfs-devel

OK, on the just-fscked partition I now get:

Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] reiser4[sshd(5058)]: find_cluster_item (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] WARNING: Expected item not found. Fsck?
Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] reiser4[sshd(5058)]: dc_check_checksum (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] WARNING: Bad disk cluster checksum 1869768224, (should be 950540942) Fsck?
Dec  7 19:31:43 krshina3 kernel: [ 2069.584104]
Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] reiser4[sshd(5058)]: reiser4_inflate_cluster (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] WARNING: Inode 14592305: disk cluster 0 looks corrupted
Dec  7 19:31:43 krshina3 sshd[5056]: Accepted keyboard-interactive/pam for root from 192.168.1.10 port 7531 ssh2
Dec  7 19:31:43 krshina3 sshd[5056]: pam_unix(sshd:session): session opened for user root by (uid=0)
Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] reiser4[bash(5066)]: find_cluster_item (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] WARNING: Expected item not found. Fsck?
Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] reiser4[bash(5066)]: dc_check_checksum (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] WARNING: Bad disk cluster checksum -1945338855, (should be 944271739) Fsck?
Dec  7 19:31:43 krshina3 kernel: [ 2069.637094]
Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] reiser4[bash(5066)]: reiser4_inflate_cluster (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] WARNING: Inode 15185444: disk cluster 0 looks corrupted
tail: unrecognized file system type 0x52345362 for '/var/log/messages'. please report this to bug-coreutils@gnu.org. reverting to polling

This is getting bad; I'm going back to 2.6.39 :D

On Fri, Dec 7, 2012 at 6:56 PM, Dušan Čolić <dusanc@gmail.com> wrote:
> Hello
>
> I'm using KVM for windows emulation and I have a ~3GB image file that
> I run it from.
> I started having problems with it lately on regular and ccreg40
> partitions (I tried same file on both)  using 3.6.6.
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: R4 problem started with 2.6.39 and still there with 3.6.6
From: Ivan Shapovalov @ 2012-12-09 12:36 UTC
  To: Dušan Čolić; +Cc: reiserfs-devel

On 07 December 2012 18:56:26 Dušan Čolić wrote:
> Hello
> 
> I'm using KVM for windows emulation and I have a ~3GB image file that
> I run it from.
> I started having problems with it lately on regular and ccreg40
> partitions (I tried same file on both)  using 3.6.6.

Did it start precisely with 2.6.39?
I've got the same, and if it did, then it definitely seems to be an effect of
the (in-kernel?) dcache corruption problem, the one that also results in
BUGs at 000000000000000e, the "nikita-2050" assertion in jnode.c, and so on...

BTW (@Edward): none of these problems can be reproduced on my VM with exactly
the same software configuration (copied root partition)...

Regards,
Ivan.
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: R4 problem started with 2.6.39 and still there with 3.6.6
@ 2012-12-09 14:47 Dušan Čolić
From: Dušan Čolić @ 2012-12-09 14:47 UTC
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On Sun, Dec 9, 2012 at 1:36 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
> Did it start precisely from 2.6.39?

It started like this:

I used 2.6.39 for a long time without problems.
Then I updated to 3.6.6, and errors started happening ONLY after that
big 3.2GB image file, windows.qcow2, was first accessed from QEMU. Then
I tried copying that file to another partition (plain R4) to save it,
and when I opened it from QEMU there, that partition started to
give me problems too.
I even tried it on a clean, new R4 partition and it crashed.
So now every R4 partition on which that file has been accessed from
QEMU gives me new problems with inconsistencies.
Using it from FAT32 works OK.

The only thing I changed recently was switching from the piix SATA
driver to ahci.


> I have the same issue, and if it did, then it definitely seems to be an
> effect of the problem with in-kernel(?) dcache corruption, the one which also
> results in BUGs at 000000000000000e, assertion "nikita-2050" in jnode.c and so on...
>
> BTW (@Edward): none of the problems can be reproduced on my VM with exactly
> the same software configuration (copied root partition)...
>
> Regards,
> Ivan.
Have a nice day

Dushan

* Re: R4 problem started with 2.6.39 and still there with 3.6.6
@ 2012-12-09 14:52 Dušan Čolić
From: Dušan Čolić @ 2012-12-09 14:52 UTC
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On Sun, Dec 9, 2012 at 3:47 PM, Dušan Čolić <dusanc@gmail.com> wrote:
> On Sun, Dec 9, 2012 at 1:36 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
>> Did it start precisely from 2.6.39?
>
> It started like this:
>
> I used 2.6.39 for a long time without problems.
> Then I updated to 3.6.6, and errors started happening ONLY after that
> big 3.2GB image file, windows.qcow2, was first accessed from QEMU. Then
> I tried copying that file to another partition (plain R4) to save it,
> and when I opened it from QEMU there, that partition started to
> give me problems too.
> I even tried it on a clean, new R4 partition and it crashed.
> So now every R4 partition on which that file has been accessed from
> QEMU gives me new problems with inconsistencies.

I forgot to emphasize this:
That file is backed up weekly to a fscked R4 partition, and I have never
had any trouble with that partition, ONLY with partitions where QEMU
accessed that file. Loopback trouble?

> Using it from FAT32 works OK.
>
> The only thing I changed recently was switching from the piix SATA
> driver to ahci.
>
>
>> I have the same issue, and if it did, then it definitely seems to be an
>> effect of the problem with in-kernel(?) dcache corruption, the one which also
>> results in BUGs at 000000000000000e, assertion "nikita-2050" in jnode.c and so on...
>>
>> BTW (@Edward): none of the problems can be reproduced on my VM with exactly
>> the same software configuration (copied root partition)...
>>
>> Regards,
>> Ivan.
> Have a nice day
>
> Dushan

* Re: R4 problem started with 2.6.39 and still there with 3.6.6
@ 2012-12-09 15:17 Ivan Shapovalov
From: Ivan Shapovalov @ 2012-12-09 15:17 UTC
  To: Dušan Čolić; +Cc: reiserfs-devel

On 07 December 2012 19:34:45 Dušan Čolić wrote:
> OK, on a freshly fscked partition I now get:
> 
> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] reiser4[sshd(5058)]:
> find_cluster_item
> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] WARNING: Expected item
> not found. Fsck?
> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] reiser4[sshd(5058)]:
> dc_check_checksum
> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] WARNING: Bad disk
> cluster checksum 1869768224, (should be 950540942) Fsck?
> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104]
> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] reiser4[sshd(5058)]:
> reiser4_inflate_cluster
> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] WARNING: Inode
> 14592305: disk cluster 0 looks corrupted
> Dec  7 19:31:43 krshina3 sshd[5056]: Accepted keyboard-interactive/pam
> for root from 192.168.1.10 port 7531 ssh2
> Dec  7 19:31:43 krshina3 sshd[5056]: pam_unix(sshd:session): session
> opened for user root by (uid=0)
> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] reiser4[bash(5066)]:
> find_cluster_item
> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] WARNING: Expected item
> not found. Fsck?
> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] reiser4[bash(5066)]:
> dc_check_checksum
> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] WARNING: Bad disk
> cluster checksum -1945338855, (should be 944271739) Fsck?
> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094]
> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] reiser4[bash(5066)]:
> reiser4_inflate_cluster
> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] WARNING: Inode
> 15185444: disk cluster 0 looks corrupted
> tail: unrecognized file system type 0x52345362 for
> '/var/log/messages'. please report this to bug-coreutils@gnu.org.
> reverting to polling
> 
> This is getting bad, I'm going back to 2.6.39 :D

This is exactly what I have here on 3.<anything>.<anything> with a plain KDE
desktop and a "bit of everything" workload. It does not seem to be related to
QEMU, loopback devices or anything like that; intensive random I/O alone is
what triggers this, and I haven't found any specific pattern so far.
Please tell me whether it stops happening on 2.6.39 (but remember that it may
stay silent for a while), so I can bisect precisely :)

Thanks,
Ivan.

> 
> On Fri, Dec 7, 2012 at 6:56 PM, Dušan Čolić <dusanc@gmail.com> wrote:
> > Hello
> > 
> > I'm using KVM for Windows emulation, running it from a ~3GB image
> > file.
> > I started having problems with it lately on regular and ccreg40
> > partitions (I tried the same file on both) using 3.6.6.
> > My logs were spammed with a lot of these:
> > 
> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] reiser4[find(5806)]:
> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] WARNING: Wrong level
> > found in node: 2 != 1
> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] reiser4[find(5806)]:
> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] WARNING: Error for
> > inode 17802378 (-5)
> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] reiser4[find(5806)]:
> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] WARNING: Wrong level
> > found in node: 2 != 1
> > 
> > 
> > Dec  7 17:26:23 krshina3 kernel: [38539.089191] reiser4[gdm(2676)]:
> > cbk_level_lookup (fs/reiser4/search.c:963)[vs-3533]:
> > Dec  7 17:26:23 krshina3 kernel: [38539.089194] reiser4[gdm(2676)]:
> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
> > Dec  7 17:26:23 krshina3 kernel: [38539.090837]
> > reiser4[gnome-screensav(3503)]: cbk_level_lookup
> > (fs/reiser4/search.c:963)[vs-3533]:
> > Dec  7 17:26:23 krshina3 kernel: [38539.090840]
> > reiser4[gnome-screensav(3503)]: key_warning
> > (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
> > 
> > I fscked the FSes, and some errors were found and corrected.
> > 
> > Now I started getting these and I can't kill the offending process:
> > 
> > Dec  7 18:43:29 krshina3 kernel: [  720.274343] INFO: task
> > qemu-system-x86:4156 blocked for more than 120 seconds.
> > Dec  7 18:43:29 krshina3 kernel: [  720.274347] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Dec  7 18:43:29 krshina3 kernel: [  720.274349] qemu-system-x86 D
> > 0000000000000001     0  4156   3654 0x00000000
> > Dec  7 18:43:29 krshina3 kernel: [  720.274354]  ffff880206dd7990
> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
> > Dec  7 18:43:29 krshina3 kernel: [  720.274357]  0000000000011240
> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
> > Dec  7 18:43:29 krshina3 kernel: [  720.274361]  0000000000011240
> > ffff880206dd7990 0000000000011240 ffff8801def2e000
> > Dec  7 18:43:29 krshina3 kernel: [  720.274364] Call Trace:
> > Dec  7 18:43:29 krshina3 kernel: [  720.274372]  [<ffffffff810aec97>]
> > ? pagevec_lookup_tag+0x18/0x21
> > Dec  7 18:43:29 krshina3 kernel: [  720.274375]  [<ffffffff810a528a>]
> > ? filemap_fdatawait_range+0xff/0x144
> > Dec  7 18:43:29 krshina3 kernel: [  720.274380]  [<ffffffff81146b09>]
> > ? writepages_unix_file+0x36e/0x3ce
> > Dec  7 18:43:29 krshina3 kernel: [  720.274384]  [<ffffffff810ab9f9>]
> > ? global_dirtyable_memory+0xd/0x2c
> > Dec  7 18:43:29 krshina3 kernel: [  720.274389]  [<ffffffff81489dac>]
> > ? __mutex_lock_slowpath+0xd0/0x116
> > Dec  7 18:43:29 krshina3 kernel: [  720.274392]  [<ffffffff8148a096>]
> > ? mutex_lock+0x1a/0x2d
> > Dec  7 18:43:29 krshina3 kernel: [  720.274396]  [<ffffffff81143b73>]
> > ? reiser4_sync_file_common+0x58/0xcd
> > Dec  7 18:43:29 krshina3 kernel: [  720.274400]  [<ffffffff81147a81>]
> > ? write_unix_file+0x442/0x4b7
> > Dec  7 18:43:29 krshina3 kernel: [  720.274404]  [<ffffffff811498b9>]
> > ? reiser4_write_careful+0xb8/0x450
> > Dec  7 18:43:29 krshina3 kernel: [  720.274409]  [<ffffffff810da90f>]
> > ? vfs_write+0xaf/0x149
> > Dec  7 18:43:29 krshina3 kernel: [  720.274412]  [<ffffffff810dab49>]
> > ? sys_pwrite64+0x53/0x71
> > Dec  7 18:43:29 krshina3 kernel: [  720.274415]  [<ffffffff8148c3e2>]
> > ? system_call_fastpath+0x16/0x1b
> > Dec  7 18:43:29 krshina3 kernel: [  720.274418] INFO: task
> > qemu-system-x86:4162 blocked for more than 120 seconds.
> > Dec  7 18:43:29 krshina3 kernel: [  720.274419] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Dec  7 18:43:29 krshina3 kernel: [  720.274420] qemu-system-x86 D
> > 0000000000000000     0  4162   3654 0x00000000
> > Dec  7 18:43:29 krshina3 kernel: [  720.274423]  ffff88020f7cacf0
> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
> > Dec  7 18:43:29 krshina3 kernel: [  720.274426]  0000000000011240
> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
> > Dec  7 18:43:29 krshina3 kernel: [  720.274429]  0000000000011240
> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
> > Dec  7 18:43:29 krshina3 kernel: [  720.274432] Call Trace:
> > Dec  7 18:43:29 krshina3 kernel: [  720.274437]  [<ffffffff810aec97>]
> > ? pagevec_lookup_tag+0x18/0x21
> > Dec  7 18:43:29 krshina3 kernel: [  720.274439]  [<ffffffff810a528a>]
> > ? filemap_fdatawait_range+0xff/0x144
> > Dec  7 18:43:29 krshina3 kernel: [  720.274443]  [<ffffffff81146b09>]
> > ? writepages_unix_file+0x36e/0x3ce
> > Dec  7 18:43:29 krshina3 kernel: [  720.274447]  [<ffffffff81489dac>]
> > ? __mutex_lock_slowpath+0xd0/0x116
> > Dec  7 18:43:29 krshina3 kernel: [  720.274450]  [<ffffffff8148a096>]
> > ? mutex_lock+0x1a/0x2d
> > Dec  7 18:43:29 krshina3 kernel: [  720.274453]  [<ffffffff81143b73>]
> > ? reiser4_sync_file_common+0x58/0xcd
> > Dec  7 18:43:29 krshina3 kernel: [  720.274457]  [<ffffffff810fa6d8>]
> > ? do_fsync+0x29/0x47
> > Dec  7 18:43:29 krshina3 kernel: [  720.274460]  [<ffffffff810fa716>]
> > ? sys_fdatasync+0xe/0x15
> > Dec  7 18:43:29 krshina3 kernel: [  720.274462]  [<ffffffff8148c3e2>]
> > ? system_call_fastpath+0x16/0x1b
> > tail: unrecognized file system type 0x52345362 for
> > '/var/log/messages'. please report this to bug-coreutils@gnu.org.
> > reverting to polling
> > Dec  7 18:45:29 krshina3 kernel: [  840.266595] INFO: task
> > qemu-system-x86:4156 blocked for more than 120 seconds.
> > Dec  7 18:45:29 krshina3 kernel: [  840.266599] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Dec  7 18:45:29 krshina3 kernel: [  840.266601] qemu-system-x86 D
> > 0000000000000001     0  4156   3654 0x00000000
> > Dec  7 18:45:29 krshina3 kernel: [  840.266605]  ffff880206dd7990
> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
> > Dec  7 18:45:29 krshina3 kernel: [  840.266609]  0000000000011240
> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
> > Dec  7 18:45:29 krshina3 kernel: [  840.266613]  0000000000011240
> > ffff880206dd7990 0000000000011240 ffff8801def2e000
> > Dec  7 18:45:29 krshina3 kernel: [  840.266616] Call Trace:
> > Dec  7 18:45:29 krshina3 kernel: [  840.266625]  [<ffffffff810aec97>]
> > ? pagevec_lookup_tag+0x18/0x21
> > Dec  7 18:45:29 krshina3 kernel: [  840.266629]  [<ffffffff810a528a>]
> > ? filemap_fdatawait_range+0xff/0x144
> > Dec  7 18:45:29 krshina3 kernel: [  840.266634]  [<ffffffff81146b09>]
> > ? writepages_unix_file+0x36e/0x3ce
> > Dec  7 18:45:29 krshina3 kernel: [  840.266638]  [<ffffffff810ab9f9>]
> > ? global_dirtyable_memory+0xd/0x2c
> > Dec  7 18:45:29 krshina3 kernel: [  840.266643]  [<ffffffff81489dac>]
> > ? __mutex_lock_slowpath+0xd0/0x116
> > Dec  7 18:45:29 krshina3 kernel: [  840.266646]  [<ffffffff8148a096>]
> > ? mutex_lock+0x1a/0x2d
> > Dec  7 18:45:29 krshina3 kernel: [  840.266650]  [<ffffffff81143b73>]
> > ? reiser4_sync_file_common+0x58/0xcd
> > Dec  7 18:45:29 krshina3 kernel: [  840.266654]  [<ffffffff81147a81>]
> > ? write_unix_file+0x442/0x4b7
> > Dec  7 18:45:29 krshina3 kernel: [  840.266658]  [<ffffffff811498b9>]
> > ? reiser4_write_careful+0xb8/0x450
> > Dec  7 18:45:29 krshina3 kernel: [  840.266664]  [<ffffffff810da90f>]
> > ? vfs_write+0xaf/0x149
> > Dec  7 18:45:29 krshina3 kernel: [  840.266666]  [<ffffffff810dab49>]
> > ? sys_pwrite64+0x53/0x71
> > Dec  7 18:45:29 krshina3 kernel: [  840.266670]  [<ffffffff8148c3e2>]
> > ? system_call_fastpath+0x16/0x1b
> > Dec  7 18:45:29 krshina3 kernel: [  840.266672] INFO: task
> > qemu-system-x86:4162 blocked for more than 120 seconds.
> > Dec  7 18:45:29 krshina3 kernel: [  840.266674] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Dec  7 18:45:29 krshina3 kernel: [  840.266675] qemu-system-x86 D
> > 0000000000000000     0  4162   3654 0x00000000
> > Dec  7 18:45:29 krshina3 kernel: [  840.266678]  ffff88020f7cacf0
> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
> > Dec  7 18:45:29 krshina3 kernel: [  840.266681]  0000000000011240
> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
> > Dec  7 18:45:29 krshina3 kernel: [  840.266684]  0000000000011240
> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
> > Dec  7 18:45:29 krshina3 kernel: [  840.266687] Call Trace:
> > Dec  7 18:45:29 krshina3 kernel: [  840.266691]  [<ffffffff810aec97>]
> > ? pagevec_lookup_tag+0x18/0x21
> > Dec  7 18:45:29 krshina3 kernel: [  840.266694]  [<ffffffff810a528a>]
> > ? filemap_fdatawait_range+0xff/0x144
> > Dec  7 18:45:29 krshina3 kernel: [  840.266698]  [<ffffffff81146b09>]
> > ? writepages_unix_file+0x36e/0x3ce
> > Dec  7 18:45:29 krshina3 kernel: [  840.266702]  [<ffffffff81489dac>]
> > ? __mutex_lock_slowpath+0xd0/0x116
> > Dec  7 18:45:29 krshina3 kernel: [  840.266705]  [<ffffffff8148a096>]
> > ? mutex_lock+0x1a/0x2d
> > Dec  7 18:45:29 krshina3 kernel: [  840.266708]  [<ffffffff81143b73>]
> > ? reiser4_sync_file_common+0x58/0xcd
> > Dec  7 18:45:29 krshina3 kernel: [  840.266712]  [<ffffffff810fa6d8>]
> > ? do_fsync+0x29/0x47
> > Dec  7 18:45:29 krshina3 kernel: [  840.266714]  [<ffffffff810fa716>]
> > ? sys_fdatasync+0xe/0x15
> > Dec  7 18:45:29 krshina3 kernel: [  840.266717]  [<ffffffff8148c3e2>]
> > ? system_call_fastpath+0x16/0x1b
> > 
> > 
> > File runs fine from FAT32 partition
> > 
> > If I can do something, or you need any info tell me please
> > 
> > Thanks
> > Dushan
> 
> --
> To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: R4 problem started with 2.6.39 and still there with 3.6.6
  2012-12-09 15:17   ` Ivan Shapovalov
@ 2012-12-09 16:19     ` Dušan Čolić
  2012-12-09 16:29       ` Dušan Čolić
  0 siblings, 1 reply; 28+ messages in thread
From: Dušan Čolić @ 2012-12-09 16:19 UTC
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On Sun, Dec 9, 2012 at 4:17 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
> On 07 December 2012 19:34:45 Dušan Čolić wrote:
>> Ok, on just fscked partition I now get:
>>
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] reiser4[sshd(5058)]:
>> find_cluster_item
>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] WARNING: Expected item
>> not found. Fsck?
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] reiser4[sshd(5058)]:
>> dc_check_checksum
>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] WARNING: Bad disk
>> cluster checksum 1869768224, (should be 950540942) Fsck?
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104]
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] reiser4[sshd(5058)]:
>> reiser4_inflate_cluster
>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] WARNING: Inode
>> 14592305: disk cluster 0 looks corrupted
>> Dec  7 19:31:43 krshina3 sshd[5056]: Accepted keyboard-interactive/pam
>> for root from 192.168.1.10 port 7531 ssh2
>> Dec  7 19:31:43 krshina3 sshd[5056]: pam_unix(sshd:session): session
>> opened for user root by (uid=0)
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] reiser4[bash(5066)]:
>> find_cluster_item
>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] WARNING: Expected item
>> not found. Fsck?
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] reiser4[bash(5066)]:
>> dc_check_checksum
>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] WARNING: Bad disk
>> cluster checksum -1945338855, (should be 944271739) Fsck?
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094]
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] reiser4[bash(5066)]:
>> reiser4_inflate_cluster
>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] WARNING: Inode
>> 15185444: disk cluster 0 looks corrupted
>> tail: unrecognized file system type 0x52345362 for
>> '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>> reverting to polling
>>
>> This is getting bad, I'm going back to 2.6.39 :D
>
> This is exactly what I have here on 3.<anything>.<anything> with a plain KDE
> desktop and "a bit of everything" workload. It seems not to be related to QEMU
> or loopbacks or something - just intensive random I/O is what triggers this,
> no specific patterns I've got so far.
> Please tell if it stops happening on 2.6.39 (but remember that it may be
> silent for a while) so I can bisect with precision :)
>
It happens with 2.6.39 now too, but with 2.6.39 the computer hangs, while
with 3.6.6 it still works after 24 hours.

If I reformat the partition and restore everything from backup, do you
think it would stop?


> Thanks,
> Ivan.
>
Thanks,
Dushan
>>
>> On Fri, Dec 7, 2012 at 6:56 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>> > Hello
>> >
>> > I'm using KVM for windows emulation and I have a ~3GB image file that
>> > I run it from.
>> > I started having problems with it lately on regular and ccreg40
>> > partitions (I tried same file on both)  using 3.6.6.
>> > Spammed output with a lot of these:
>> >
>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] reiser4[find(5806)]:
>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] WARNING: Wrong level
>> > found in node: 2 != 1
>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] reiser4[find(5806)]:
>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] WARNING: Error for
>> > inode 17802378 (-5)
>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] reiser4[find(5806)]:
>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] WARNING: Wrong level
>> > found in node: 2 != 1
>> >
>> >
>> > Dec  7 17:26:23 krshina3 kernel: [38539.089191] reiser4[gdm(2676)]:
>> > cbk_level_lookup (fs/reiser4/search.c:963)[vs-3533]:
>> > Dec  7 17:26:23 krshina3 kernel: [38539.089194] reiser4[gdm(2676)]:
>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>> > Dec  7 17:26:23 krshina3 kernel: [38539.090837]
>> > reiser4[gnome-screensav(3503)]: cbk_level_lookup
>> > (fs/reiser4/search.c:963)[vs-3533]:
>> > Dec  7 17:26:23 krshina3 kernel: [38539.090840]
>> > reiser4[gnome-screensav(3503)]: key_warning
>> > (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>> >
>> > I fscked the FSes and had some errors that were corrected.
>> >
>> > Now I started getting these and I can't kill the offending process:
>> >
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274343] INFO: task
>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274347] "echo 0 >
>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274349] qemu-system-x86 D
>> > 0000000000000001     0  4156   3654 0x00000000
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274354]  ffff880206dd7990
>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274357]  0000000000011240
>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274361]  0000000000011240
>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274364] Call Trace:
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274372]  [<ffffffff810aec97>]
>> > ? pagevec_lookup_tag+0x18/0x21
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274375]  [<ffffffff810a528a>]
>> > ? filemap_fdatawait_range+0xff/0x144
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274380]  [<ffffffff81146b09>]
>> > ? writepages_unix_file+0x36e/0x3ce
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274384]  [<ffffffff810ab9f9>]
>> > ? global_dirtyable_memory+0xd/0x2c
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274389]  [<ffffffff81489dac>]
>> > ? __mutex_lock_slowpath+0xd0/0x116
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274392]  [<ffffffff8148a096>]
>> > ? mutex_lock+0x1a/0x2d
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274396]  [<ffffffff81143b73>]
>> > ? reiser4_sync_file_common+0x58/0xcd
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274400]  [<ffffffff81147a81>]
>> > ? write_unix_file+0x442/0x4b7
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274404]  [<ffffffff811498b9>]
>> > ? reiser4_write_careful+0xb8/0x450
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274409]  [<ffffffff810da90f>]
>> > ? vfs_write+0xaf/0x149
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274412]  [<ffffffff810dab49>]
>> > ? sys_pwrite64+0x53/0x71
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274415]  [<ffffffff8148c3e2>]
>> > ? system_call_fastpath+0x16/0x1b
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274418] INFO: task
>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274419] "echo 0 >
>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274420] qemu-system-x86 D
>> > 0000000000000000     0  4162   3654 0x00000000
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274423]  ffff88020f7cacf0
>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274426]  0000000000011240
>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274429]  0000000000011240
>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274432] Call Trace:
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274437]  [<ffffffff810aec97>]
>> > ? pagevec_lookup_tag+0x18/0x21
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274439]  [<ffffffff810a528a>]
>> > ? filemap_fdatawait_range+0xff/0x144
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274443]  [<ffffffff81146b09>]
>> > ? writepages_unix_file+0x36e/0x3ce
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274447]  [<ffffffff81489dac>]
>> > ? __mutex_lock_slowpath+0xd0/0x116
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274450]  [<ffffffff8148a096>]
>> > ? mutex_lock+0x1a/0x2d
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274453]  [<ffffffff81143b73>]
>> > ? reiser4_sync_file_common+0x58/0xcd
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274457]  [<ffffffff810fa6d8>]
>> > ? do_fsync+0x29/0x47
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274460]  [<ffffffff810fa716>]
>> > ? sys_fdatasync+0xe/0x15
>> > Dec  7 18:43:29 krshina3 kernel: [  720.274462]  [<ffffffff8148c3e2>]
>> > ? system_call_fastpath+0x16/0x1b
>> > tail: unrecognized file system type 0x52345362 for
>> > '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>> > reverting to polling
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266595] INFO: task
>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266599] "echo 0 >
>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266601] qemu-system-x86 D
>> > 0000000000000001     0  4156   3654 0x00000000
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266605]  ffff880206dd7990
>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266609]  0000000000011240
>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266613]  0000000000011240
>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266616] Call Trace:
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266625]  [<ffffffff810aec97>]
>> > ? pagevec_lookup_tag+0x18/0x21
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266629]  [<ffffffff810a528a>]
>> > ? filemap_fdatawait_range+0xff/0x144
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266634]  [<ffffffff81146b09>]
>> > ? writepages_unix_file+0x36e/0x3ce
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266638]  [<ffffffff810ab9f9>]
>> > ? global_dirtyable_memory+0xd/0x2c
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266643]  [<ffffffff81489dac>]
>> > ? __mutex_lock_slowpath+0xd0/0x116
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266646]  [<ffffffff8148a096>]
>> > ? mutex_lock+0x1a/0x2d
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266650]  [<ffffffff81143b73>]
>> > ? reiser4_sync_file_common+0x58/0xcd
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266654]  [<ffffffff81147a81>]
>> > ? write_unix_file+0x442/0x4b7
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266658]  [<ffffffff811498b9>]
>> > ? reiser4_write_careful+0xb8/0x450
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266664]  [<ffffffff810da90f>]
>> > ? vfs_write+0xaf/0x149
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266666]  [<ffffffff810dab49>]
>> > ? sys_pwrite64+0x53/0x71
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266670]  [<ffffffff8148c3e2>]
>> > ? system_call_fastpath+0x16/0x1b
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266672] INFO: task
>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266674] "echo 0 >
>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266675] qemu-system-x86 D
>> > 0000000000000000     0  4162   3654 0x00000000
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266678]  ffff88020f7cacf0
>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266681]  0000000000011240
>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266684]  0000000000011240
>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266687] Call Trace:
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266691]  [<ffffffff810aec97>]
>> > ? pagevec_lookup_tag+0x18/0x21
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266694]  [<ffffffff810a528a>]
>> > ? filemap_fdatawait_range+0xff/0x144
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266698]  [<ffffffff81146b09>]
>> > ? writepages_unix_file+0x36e/0x3ce
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266702]  [<ffffffff81489dac>]
>> > ? __mutex_lock_slowpath+0xd0/0x116
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266705]  [<ffffffff8148a096>]
>> > ? mutex_lock+0x1a/0x2d
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266708]  [<ffffffff81143b73>]
>> > ? reiser4_sync_file_common+0x58/0xcd
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266712]  [<ffffffff810fa6d8>]
>> > ? do_fsync+0x29/0x47
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266714]  [<ffffffff810fa716>]
>> > ? sys_fdatasync+0xe/0x15
>> > Dec  7 18:45:29 krshina3 kernel: [  840.266717]  [<ffffffff8148c3e2>]
>> > ? system_call_fastpath+0x16/0x1b
>> >
>> >
>> > File runs fine from FAT32 partition
>> >
>> > If I can do something, or you need any info tell me please
>> >
>> > Thanks
>> > Dushan
>>

* Re: R4 problem started with 2.6.39 and still there with 3.6.6
  2012-12-09 16:19     ` Dušan Čolić
@ 2012-12-09 16:29       ` Dušan Čolić
  2012-12-09 16:38         ` Ivan Shapovalov
  0 siblings, 1 reply; 28+ messages in thread
From: Dušan Čolić @ 2012-12-09 16:29 UTC
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On Sun, Dec 9, 2012 at 5:19 PM, Dušan Čolić <dusanc@gmail.com> wrote:
> On Sun, Dec 9, 2012 at 4:17 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
>> On 07 December 2012 19:34:45 Dušan Čolić wrote:
>>> Ok, on just fscked partition I now get:
>>>
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] reiser4[sshd(5058)]:
>>> find_cluster_item
>>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] WARNING: Expected item
>>> not found. Fsck?
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] reiser4[sshd(5058)]:
>>> dc_check_checksum
>>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] WARNING: Bad disk
>>> cluster checksum 1869768224, (should be 950540942) Fsck?
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104]
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] reiser4[sshd(5058)]:
>>> reiser4_inflate_cluster
>>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] WARNING: Inode
>>> 14592305: disk cluster 0 looks corrupted
>>> Dec  7 19:31:43 krshina3 sshd[5056]: Accepted keyboard-interactive/pam
>>> for root from 192.168.1.10 port 7531 ssh2
>>> Dec  7 19:31:43 krshina3 sshd[5056]: pam_unix(sshd:session): session
>>> opened for user root by (uid=0)
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] reiser4[bash(5066)]:
>>> find_cluster_item
>>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] WARNING: Expected item
>>> not found. Fsck?
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] reiser4[bash(5066)]:
>>> dc_check_checksum
>>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] WARNING: Bad disk
>>> cluster checksum -1945338855, (should be 944271739) Fsck?
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094]
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] reiser4[bash(5066)]:
>>> reiser4_inflate_cluster
>>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] WARNING: Inode
>>> 15185444: disk cluster 0 looks corrupted
>>> tail: unrecognized file system type 0x52345362 for
>>> '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>>> reverting to polling
>>>
>>> This is getting bad, I'm going back to 2.6.39 :D
>>
>> This is exactly what I have here on 3.<anything>.<anything> with a plain KDE
>> desktop and "a bit of everything" workload. It seems not to be related to QEMU
>> or loopbacks or something - just intensive random I/O is what triggers this,
>> no specific patterns I've got so far.
>> Please tell if it stops happening on 2.6.39 (but remember that it may be
>> silent for a while) so I can bisect with precision :)
>>
> It happens with 2.6.39 now too, but with 2.6.39 the computer hangs, while
> with 3.6.6 it still works after 24 hours.
>

Sorry, I messed up: with 2.6.39 I now get the same errors on I/O, but the
files look OK. I only got files in lost+found when the system oopsed.
The funny thing is that I never had these errors until now, and I have been
using 2.6.39 since November 2011.

Mount options: noatime,onerror=remount-ro; partition type: ccreg40

> If I reformat the partition and restore everything from backup, do you
> think it would stop?
>
>
>> Thanks,
>> Ivan.
>>
> Thanks,
> Dushan
>>>
>>> On Fri, Dec 7, 2012 at 6:56 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>>> > Hello
>>> >
>>> > I'm using KVM for windows emulation and I have a ~3GB image file that
>>> > I run it from.
>>> > I started having problems with it lately on regular and ccreg40
>>> > partitions (I tried same file on both)  using 3.6.6.
>>> > Spammed output with a lot of these:
>>> >
>>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] reiser4[find(5806)]:
>>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] WARNING: Wrong level
>>> > found in node: 2 != 1
>>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] reiser4[find(5806)]:
>>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] WARNING: Error for
>>> > inode 17802378 (-5)
>>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] reiser4[find(5806)]:
>>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] WARNING: Wrong level
>>> > found in node: 2 != 1
>>> >
>>> >
>>> > Dec  7 17:26:23 krshina3 kernel: [38539.089191] reiser4[gdm(2676)]:
>>> > cbk_level_lookup (fs/reiser4/search.c:963)[vs-3533]:
>>> > Dec  7 17:26:23 krshina3 kernel: [38539.089194] reiser4[gdm(2676)]:
>>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>> > Dec  7 17:26:23 krshina3 kernel: [38539.090837]
>>> > reiser4[gnome-screensav(3503)]: cbk_level_lookup
>>> > (fs/reiser4/search.c:963)[vs-3533]:
>>> > Dec  7 17:26:23 krshina3 kernel: [38539.090840]
>>> > reiser4[gnome-screensav(3503)]: key_warning
>>> > (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>> >
>>> > I fscked the FSes and had some errors that were corrected.
>>> >
>>> > Now I started getting these and I can't kill the offending process:
>>> >
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274343] INFO: task
>>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274347] "echo 0 >
>>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274349] qemu-system-x86 D
>>> > 0000000000000001     0  4156   3654 0x00000000
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274354]  ffff880206dd7990
>>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274357]  0000000000011240
>>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274361]  0000000000011240
>>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274364] Call Trace:
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274372]  [<ffffffff810aec97>]
>>> > ? pagevec_lookup_tag+0x18/0x21
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274375]  [<ffffffff810a528a>]
>>> > ? filemap_fdatawait_range+0xff/0x144
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274380]  [<ffffffff81146b09>]
>>> > ? writepages_unix_file+0x36e/0x3ce
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274384]  [<ffffffff810ab9f9>]
>>> > ? global_dirtyable_memory+0xd/0x2c
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274389]  [<ffffffff81489dac>]
>>> > ? __mutex_lock_slowpath+0xd0/0x116
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274392]  [<ffffffff8148a096>]
>>> > ? mutex_lock+0x1a/0x2d
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274396]  [<ffffffff81143b73>]
>>> > ? reiser4_sync_file_common+0x58/0xcd
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274400]  [<ffffffff81147a81>]
>>> > ? write_unix_file+0x442/0x4b7
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274404]  [<ffffffff811498b9>]
>>> > ? reiser4_write_careful+0xb8/0x450
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274409]  [<ffffffff810da90f>]
>>> > ? vfs_write+0xaf/0x149
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274412]  [<ffffffff810dab49>]
>>> > ? sys_pwrite64+0x53/0x71
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274415]  [<ffffffff8148c3e2>]
>>> > ? system_call_fastpath+0x16/0x1b
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274418] INFO: task
>>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274419] "echo 0 >
>>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274420] qemu-system-x86 D
>>> > 0000000000000000     0  4162   3654 0x00000000
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274423]  ffff88020f7cacf0
>>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274426]  0000000000011240
>>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274429]  0000000000011240
>>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274432] Call Trace:
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274437]  [<ffffffff810aec97>]
>>> > ? pagevec_lookup_tag+0x18/0x21
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274439]  [<ffffffff810a528a>]
>>> > ? filemap_fdatawait_range+0xff/0x144
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274443]  [<ffffffff81146b09>]
>>> > ? writepages_unix_file+0x36e/0x3ce
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274447]  [<ffffffff81489dac>]
>>> > ? __mutex_lock_slowpath+0xd0/0x116
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274450]  [<ffffffff8148a096>]
>>> > ? mutex_lock+0x1a/0x2d
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274453]  [<ffffffff81143b73>]
>>> > ? reiser4_sync_file_common+0x58/0xcd
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274457]  [<ffffffff810fa6d8>]
>>> > ? do_fsync+0x29/0x47
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274460]  [<ffffffff810fa716>]
>>> > ? sys_fdatasync+0xe/0x15
>>> > Dec  7 18:43:29 krshina3 kernel: [  720.274462]  [<ffffffff8148c3e2>]
>>> > ? system_call_fastpath+0x16/0x1b
>>> > tail: unrecognized file system type 0x52345362 for
>>> > '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>>> > reverting to polling
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266595] INFO: task
>>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266599] "echo 0 >
>>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266601] qemu-system-x86 D
>>> > 0000000000000001     0  4156   3654 0x00000000
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266605]  ffff880206dd7990
>>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266609]  0000000000011240
>>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266613]  0000000000011240
>>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266616] Call Trace:
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266625]  [<ffffffff810aec97>]
>>> > ? pagevec_lookup_tag+0x18/0x21
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266629]  [<ffffffff810a528a>]
>>> > ? filemap_fdatawait_range+0xff/0x144
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266634]  [<ffffffff81146b09>]
>>> > ? writepages_unix_file+0x36e/0x3ce
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266638]  [<ffffffff810ab9f9>]
>>> > ? global_dirtyable_memory+0xd/0x2c
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266643]  [<ffffffff81489dac>]
>>> > ? __mutex_lock_slowpath+0xd0/0x116
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266646]  [<ffffffff8148a096>]
>>> > ? mutex_lock+0x1a/0x2d
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266650]  [<ffffffff81143b73>]
>>> > ? reiser4_sync_file_common+0x58/0xcd
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266654]  [<ffffffff81147a81>]
>>> > ? write_unix_file+0x442/0x4b7
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266658]  [<ffffffff811498b9>]
>>> > ? reiser4_write_careful+0xb8/0x450
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266664]  [<ffffffff810da90f>]
>>> > ? vfs_write+0xaf/0x149
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266666]  [<ffffffff810dab49>]
>>> > ? sys_pwrite64+0x53/0x71
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266670]  [<ffffffff8148c3e2>]
>>> > ? system_call_fastpath+0x16/0x1b
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266672] INFO: task
>>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266674] "echo 0 >
>>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266675] qemu-system-x86 D
>>> > 0000000000000000     0  4162   3654 0x00000000
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266678]  ffff88020f7cacf0
>>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266681]  0000000000011240
>>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266684]  0000000000011240
>>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266687] Call Trace:
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266691]  [<ffffffff810aec97>]
>>> > ? pagevec_lookup_tag+0x18/0x21
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266694]  [<ffffffff810a528a>]
>>> > ? filemap_fdatawait_range+0xff/0x144
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266698]  [<ffffffff81146b09>]
>>> > ? writepages_unix_file+0x36e/0x3ce
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266702]  [<ffffffff81489dac>]
>>> > ? __mutex_lock_slowpath+0xd0/0x116
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266705]  [<ffffffff8148a096>]
>>> > ? mutex_lock+0x1a/0x2d
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266708]  [<ffffffff81143b73>]
>>> > ? reiser4_sync_file_common+0x58/0xcd
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266712]  [<ffffffff810fa6d8>]
>>> > ? do_fsync+0x29/0x47
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266714]  [<ffffffff810fa716>]
>>> > ? sys_fdatasync+0xe/0x15
>>> > Dec  7 18:45:29 krshina3 kernel: [  840.266717]  [<ffffffff8148c3e2>]
>>> > ? system_call_fastpath+0x16/0x1b
>>> >
>>> >
>>> > File runs fine from FAT32 partition
>>> >
>>> > If I can do something, or you need any info tell me please
>>> >
>>> > Thanks
>>> > Dushan

* Re: R4 problem started with 2.6.39 and still there with 3.6.6
  2012-12-09 16:29       ` Dušan Čolić
@ 2012-12-09 16:38         ` Ivan Shapovalov
  2012-12-09 17:12           ` Dušan Čolić
  0 siblings, 1 reply; 28+ messages in thread
From: Ivan Shapovalov @ 2012-12-09 16:38 UTC (permalink / raw)
  To: Dušan Čolić; +Cc: reiserfs-devel

On 09 December 2012 17:29:58 Dušan Čolić wrote:
> On Sun, Dec 9, 2012 at 5:19 PM, Dušan Čolić <dusanc@gmail.com> wrote:
> > On Sun, Dec 9, 2012 at 4:17 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
> >> On 07 December 2012 19:34:45 Dušan Čolić wrote:
> >>> Ok, on just fscked partition I now get:
> >>> 
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] reiser4[sshd(5058)]:
> >>> find_cluster_item
> >>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] WARNING: Expected item
> >>> not found. Fsck?
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] reiser4[sshd(5058)]:
> >>> dc_check_checksum
> >>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] WARNING: Bad disk
> >>> cluster checksum 1869768224, (should be 950540942) Fsck?
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104]
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] reiser4[sshd(5058)]:
> >>> reiser4_inflate_cluster
> >>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] WARNING: Inode
> >>> 14592305: disk cluster 0 looks corrupted
> >>> Dec  7 19:31:43 krshina3 sshd[5056]: Accepted keyboard-interactive/pam
> >>> for root from 192.168.1.10 port 7531 ssh2
> >>> Dec  7 19:31:43 krshina3 sshd[5056]: pam_unix(sshd:session): session
> >>> opened for user root by (uid=0)
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] reiser4[bash(5066)]:
> >>> find_cluster_item
> >>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] WARNING: Expected item
> >>> not found. Fsck?
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] reiser4[bash(5066)]:
> >>> dc_check_checksum
> >>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] WARNING: Bad disk
> >>> cluster checksum -1945338855, (should be 944271739) Fsck?
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094]
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] reiser4[bash(5066)]:
> >>> reiser4_inflate_cluster
> >>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] WARNING: Inode
> >>> 15185444: disk cluster 0 looks corrupted
> >>> tail: unrecognized file system type 0x52345362 for
> >>> '/var/log/messages'. please report this to bug-coreutils@gnu.org.
> >>> reverting to polling
> >>> 
> >>> This is getting bad, I'm going back to 2.6.39 :D
> >> 
> >> This is exactly what I see here on 3.<anything>.<anything> with a plain
> >> KDE desktop and a "bit of everything" workload. It does not seem to be
> >> related to QEMU, loop devices or anything like that: intensive random I/O
> >> alone triggers it, and I haven't found any specific pattern so far.
> >> Please tell me whether it stops happening on 2.6.39 (but remember that it
> >> may stay silent for a while), so I can bisect precisely :)
> > 
> > It happens with 2.6.39 now too, but with 2.6.39 the computer hangs, while
> > with 3.6.6 it still works after 24 hrs.
> 
> Sorry, I messed up: with 2.6.39 I now get the same errors on I/O, but the
> files look OK. I only got files in lost+found when the system oopsed.
> The funny thing is that I never had these errors until now, and I have
> been using 2.6.39 since Nov 2011.

So what changed since Nov 2011? Did you have the QEMU image on an r4
partition back then?
And... try to revert to piix, maybe.

Thanks,
Ivan.

> 
> mount options noatime,onerror=remount-ro, partition type ccreg40
> 
> > If I reformat the partition and restore everything from backup, do you
> > think it would stop?
> > 
> >> Thanks,
> >> Ivan.
> > 
> > Thanks,
> > Dushan
> > 

* Re: R4 problem started with 2.6.39 and still there with 3.6.6
  2012-12-09 16:38         ` Ivan Shapovalov
@ 2012-12-09 17:12           ` Dušan Čolić
  2012-12-09 17:54             ` Dušan Čolić
  0 siblings, 1 reply; 28+ messages in thread
From: Dušan Čolić @ 2012-12-09 17:12 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On Sun, Dec 9, 2012 at 4:38 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
> On 09 December 2012 17:29:58 Dušan Čolić wrote:
>> On Sun, Dec 9, 2012 at 5:19 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>> > On Sun, Dec 9, 2012 at 4:17 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
>> >> On 07 December 2012 19:34:45 Dušan Čolić wrote:
>> >>> Ok, on just fscked partition I now get:
>> >>>
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] reiser4[sshd(5058)]:
>> >>> find_cluster_item
>> >>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] WARNING: Expected item
>> >>> not found. Fsck?
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] reiser4[sshd(5058)]:
>> >>> dc_check_checksum
>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] WARNING: Bad disk
>> >>> cluster checksum 1869768224, (should be 950540942) Fsck?
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104]
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] reiser4[sshd(5058)]:
>> >>> reiser4_inflate_cluster
>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] WARNING: Inode
>> >>> 14592305: disk cluster 0 looks corrupted
>> >>> Dec  7 19:31:43 krshina3 sshd[5056]: Accepted keyboard-interactive/pam
>> >>> for root from 192.168.1.10 port 7531 ssh2
>> >>> Dec  7 19:31:43 krshina3 sshd[5056]: pam_unix(sshd:session): session
>> >>> opened for user root by (uid=0)
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] reiser4[bash(5066)]:
>> >>> find_cluster_item
>> >>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] WARNING: Expected item
>> >>> not found. Fsck?
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] reiser4[bash(5066)]:
>> >>> dc_check_checksum
>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] WARNING: Bad disk
>> >>> cluster checksum -1945338855, (should be 944271739) Fsck?
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094]
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] reiser4[bash(5066)]:
>> >>> reiser4_inflate_cluster
>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] WARNING: Inode
>> >>> 15185444: disk cluster 0 looks corrupted
>> >>> tail: unrecognized file system type 0x52345362 for
>> >>> '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>> >>> reverting to polling
>> >>>
>> >>> This is getting bad, I'm going back to 2.6.39 :D
>> >>
>> >> This is exactly what I see here on 3.<anything>.<anything> with a plain
>> >> KDE desktop and a "bit of everything" workload. It does not seem to be
>> >> related to QEMU, loop devices or anything like that: intensive random I/O
>> >> alone triggers it, and I haven't found any specific pattern so far.
>> >> Please tell me whether it stops happening on 2.6.39 (but remember that it
>> >> may stay silent for a while), so I can bisect precisely :)
>> >
>> > It happens with 2.6.39 now too, but with 2.6.39 the computer hangs, while
>> > with 3.6.6 it still works after 24 hrs.
>>
>> Sorry, I messed up: with 2.6.39 I now get the same errors on I/O, but the
>> files look OK. I only got files in lost+found when the system oopsed.
>> The funny thing is that I never had these errors until now, and I have
>> been using 2.6.39 since Nov 2011.
>
> So what changed since Nov 2011? Did you have the QEMU image on an r4
> partition back then?
Yes, same usage scenario: a Gentoo machine with months of uptime, lots of
recompiling, and a QEMU-KVM virtual machine running WinXP.
I made a small change to the kernel config in July 2012; same kernel, 2.6.39.4.

> And... try to revert to piix, maybe.
>
I'll try.
I'm currently fscking the /fs and got:
FSCK: node.c: 108: repair_node_items_check: Node (3205815), items (27)
and (28): Wrong order of keys.
FSCK: filter.c: 407: repair_filter_update_traverse: Node (3205815):
the node is broken. Pointed from the node (2871633), item (12), unit
(0). The whole subtree is skipped.

--build-fs did:
FSCK: node.c: 108: repair_node_items_check: Node (3205815), items (27)
and (28): Wrong order of keys.
FSCK: filter.c: 407: repair_filter_update_traverse: Node (3205815):
the node is unrecoverable. Pointed from the node (2871633), item (12),
unit (0). Removed.
FSCK: obj40_repair.c: 146: obj40_check_bytes_report: Node (100919),
item (17), [12e38:76785f68775361:fd23c2] (stat40): wrong bytes
(634880), Fixed to (0).
FSCK: obj40_repair.c: 373: obj40_stat_lw_check: Node (100919), item
(17), [12e38:76785f68775361:fd23c2] (stat40): wrong size (633454),
Fixed to (0).
FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
[12e38:767465364b4534:1185752] (ccreg40), node [1795], item [0]: item
of the wrong cluster size (8192) found, Should be (65536). Fixed.
FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
[12e38:767465364d4534:1185753] (ccreg40), node [3211856], item [1]:
item of the wrong cluster size (2048) found, Should be (65536). Fixed.
FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
[12e38:76746538463156:118493e] (ccreg40), node [3211856], item [2]:
item of the wrong cluster size (268435456) found, Should be (65536).
Fixed.

That's the damage after using 2.6.39 with a light workload: compiling a
few small packages with -j3.

> Thanks,
> Ivan.
>
>>
>> mount options noatime,onerror=remount-ro, partition type ccreg40
>>
>> > If I reformat the partition and restore everything from backup, do you
>> > think it would stop?
>> >
>> >> Thanks,
>> >> Ivan.
>> >
>> > Thanks,
>> > Dushan
>> >
>> >>> On Fri, Dec 7, 2012 at 6:56 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>> >>> > Hello
>> >>> >
>> >>> > I'm using KVM for windows emulation and I have a ~3GB image file that
>> >>> > I run it from.
>> >>> > I started having problems with it lately on regular and ccreg40
>> >>> > partitions (I tried same file on both)  using 3.6.6.
>> >>> > Spammed output with a lot of these:
>> >>> >
>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] reiser4[find(5806)]:
>> >>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] WARNING: Wrong level
>> >>> > found in node: 2 != 1
>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] reiser4[find(5806)]:
>> >>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] WARNING: Error for
>> >>> > inode 17802378 (-5)
>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] reiser4[find(5806)]:
>> >>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] WARNING: Wrong level
>> >>> > found in node: 2 != 1
>> >>> >
>> >>> >
>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.089191] reiser4[gdm(2676)]:
>> >>> > cbk_level_lookup (fs/reiser4/search.c:963)[vs-3533]:
>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.089194] reiser4[gdm(2676)]:
>> >>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.090837]
>> >>> > reiser4[gnome-screensav(3503)]: cbk_level_lookup
>> >>> > (fs/reiser4/search.c:963)[vs-3533]:
>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.090840]
>> >>> > reiser4[gnome-screensav(3503)]: key_warning
>> >>> >
>> >>> > (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>> >>> >  I fscked the FSes and had some errors that were corrected.
>> >>> >
>> >>> > Now I started geting these and I can't kill the offending process:
>> >>> >
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274343] INFO: task
>> >>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274347] "echo 0 >
>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274349] qemu-system-x86 D
>> >>> > 0000000000000001     0  4156   3654 0x00000000
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274354]  ffff880206dd7990
>> >>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274357]  0000000000011240
>> >>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274361]  0000000000011240
>> >>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274364] Call Trace:
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274372]  [<ffffffff810aec97>]
>> >>> > ? pagevec_lookup_tag+0x18/0x21
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274375]  [<ffffffff810a528a>]
>> >>> > ? filemap_fdatawait_range+0xff/0x144
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274380]  [<ffffffff81146b09>]
>> >>> > ? writepages_unix_file+0x36e/0x3ce
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274384]  [<ffffffff810ab9f9>]
>> >>> > ? global_dirtyable_memory+0xd/0x2c
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274389]  [<ffffffff81489dac>]
>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274392]  [<ffffffff8148a096>]
>> >>> > ? mutex_lock+0x1a/0x2d
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274396]  [<ffffffff81143b73>]
>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274400]  [<ffffffff81147a81>]
>> >>> > ? write_unix_file+0x442/0x4b7
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274404]  [<ffffffff811498b9>]
>> >>> > ? reiser4_write_careful+0xb8/0x450
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274409]  [<ffffffff810da90f>]
>> >>> > ? vfs_write+0xaf/0x149
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274412]  [<ffffffff810dab49>]
>> >>> > ? sys_pwrite64+0x53/0x71
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274415]  [<ffffffff8148c3e2>]
>> >>> > ? system_call_fastpath+0x16/0x1b
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274418] INFO: task
>> >>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274419] "echo 0 >
>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274420] qemu-system-x86 D
>> >>> > 0000000000000000     0  4162   3654 0x00000000
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274423]  ffff88020f7cacf0
>> >>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274426]  0000000000011240
>> >>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274429]  0000000000011240
>> >>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274432] Call Trace:
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274437]  [<ffffffff810aec97>]
>> >>> > ? pagevec_lookup_tag+0x18/0x21
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274439]  [<ffffffff810a528a>]
>> >>> > ? filemap_fdatawait_range+0xff/0x144
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274443]  [<ffffffff81146b09>]
>> >>> > ? writepages_unix_file+0x36e/0x3ce
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274447]  [<ffffffff81489dac>]
>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274450]  [<ffffffff8148a096>]
>> >>> > ? mutex_lock+0x1a/0x2d
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274453]  [<ffffffff81143b73>]
>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274457]  [<ffffffff810fa6d8>]
>> >>> > ? do_fsync+0x29/0x47
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274460]  [<ffffffff810fa716>]
>> >>> > ? sys_fdatasync+0xe/0x15
>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274462]  [<ffffffff8148c3e2>]
>> >>> > ? system_call_fastpath+0x16/0x1b
>> >>> > tail: unrecognized file system type 0x52345362 for
>> >>> > '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>> >>> > reverting to polling
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266595] INFO: task
>> >>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266599] "echo 0 >
>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266601] qemu-system-x86 D
>> >>> > 0000000000000001     0  4156   3654 0x00000000
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266605]  ffff880206dd7990
>> >>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266609]  0000000000011240
>> >>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266613]  0000000000011240
>> >>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266616] Call Trace:
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266625]  [<ffffffff810aec97>]
>> >>> > ? pagevec_lookup_tag+0x18/0x21
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266629]  [<ffffffff810a528a>]
>> >>> > ? filemap_fdatawait_range+0xff/0x144
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266634]  [<ffffffff81146b09>]
>> >>> > ? writepages_unix_file+0x36e/0x3ce
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266638]  [<ffffffff810ab9f9>]
>> >>> > ? global_dirtyable_memory+0xd/0x2c
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266643]  [<ffffffff81489dac>]
>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266646]  [<ffffffff8148a096>]
>> >>> > ? mutex_lock+0x1a/0x2d
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266650]  [<ffffffff81143b73>]
>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266654]  [<ffffffff81147a81>]
>> >>> > ? write_unix_file+0x442/0x4b7
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266658]  [<ffffffff811498b9>]
>> >>> > ? reiser4_write_careful+0xb8/0x450
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266664]  [<ffffffff810da90f>]
>> >>> > ? vfs_write+0xaf/0x149
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266666]  [<ffffffff810dab49>]
>> >>> > ? sys_pwrite64+0x53/0x71
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266670]  [<ffffffff8148c3e2>]
>> >>> > ? system_call_fastpath+0x16/0x1b
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266672] INFO: task
>> >>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266674] "echo 0 >
>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266675] qemu-system-x86 D
>> >>> > 0000000000000000     0  4162   3654 0x00000000
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266678]  ffff88020f7cacf0
>> >>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266681]  0000000000011240
>> >>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266684]  0000000000011240
>> >>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266687] Call Trace:
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266691]  [<ffffffff810aec97>]
>> >>> > ? pagevec_lookup_tag+0x18/0x21
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266694]  [<ffffffff810a528a>]
>> >>> > ? filemap_fdatawait_range+0xff/0x144
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266698]  [<ffffffff81146b09>]
>> >>> > ? writepages_unix_file+0x36e/0x3ce
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266702]  [<ffffffff81489dac>]
>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266705]  [<ffffffff8148a096>]
>> >>> > ? mutex_lock+0x1a/0x2d
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266708]  [<ffffffff81143b73>]
>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266712]  [<ffffffff810fa6d8>]
>> >>> > ? do_fsync+0x29/0x47
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266714]  [<ffffffff810fa716>]
>> >>> > ? sys_fdatasync+0xe/0x15
>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266717]  [<ffffffff8148c3e2>]
>> >>> > ? system_call_fastpath+0x16/0x1b
>> >>> >
>> >>> >
>> >>> > The file runs fine from a FAT32 partition.
>> >>> >
>> >>> > If I can do anything, or if you need any info, please tell me.
>> >>> >
>> >>> > Thanks
>> >>> > Dushan
>> >>>
--
To unsubscribe from this list: send the line "unsubscribe reiserfs-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: R4 problem started with 2.6.39 and still there with 3.6.6
  2012-12-09 17:12           ` Dušan Čolić
@ 2012-12-09 17:54             ` Dušan Čolić
  2012-12-10 20:08               ` Dušan Čolić
  2012-12-11 15:08               ` Kernel config option which causes reiser4 to be instable Ivan Shapovalov
  0 siblings, 2 replies; 28+ messages in thread
From: Dušan Čolić @ 2012-12-09 17:54 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On Sun, Dec 9, 2012 at 6:12 PM, Dušan Čolić <dusanc@gmail.com> wrote:
> On Sun, Dec 9, 2012 at 4:38 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
>> On 09 December 2012 17:29:58 Dušan Čolić wrote:
>>> On Sun, Dec 9, 2012 at 5:19 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>>> > On Sun, Dec 9, 2012 at 4:17 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
>>> >> On 07 December 2012 19:34:45 Dušan Čolić wrote:
>>> >>> OK, on the just-fscked partition I now get:
>>> >>>
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] reiser4[sshd(5058)]:
>>> >>> find_cluster_item
>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] WARNING: Expected item
>>> >>> not found. Fsck?
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] reiser4[sshd(5058)]:
>>> >>> dc_check_checksum
>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] WARNING: Bad disk
>>> >>> cluster checksum 1869768224, (should be 950540942) Fsck?
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104]
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] reiser4[sshd(5058)]:
>>> >>> reiser4_inflate_cluster
>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] WARNING: Inode
>>> >>> 14592305: disk cluster 0 looks corrupted
>>> >>> Dec  7 19:31:43 krshina3 sshd[5056]: Accepted keyboard-interactive/pam
>>> >>> for root from 192.168.1.10 port 7531 ssh2
>>> >>> Dec  7 19:31:43 krshina3 sshd[5056]: pam_unix(sshd:session): session
>>> >>> opened for user root by (uid=0)
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] reiser4[bash(5066)]:
>>> >>> find_cluster_item
>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] WARNING: Expected item
>>> >>> not found. Fsck?
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] reiser4[bash(5066)]:
>>> >>> dc_check_checksum
>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] WARNING: Bad disk
>>> >>> cluster checksum -1945338855, (should be 944271739) Fsck?
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094]
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] reiser4[bash(5066)]:
>>> >>> reiser4_inflate_cluster
>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] WARNING: Inode
>>> >>> 15185444: disk cluster 0 looks corrupted
>>> >>> tail: unrecognized file system type 0x52345362 for
>>> >>> '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>>> >>> reverting to polling
>>> >>>
>>> >>> This is getting bad; I'm going back to 2.6.39 :D
>>> >>
>>> >> This is exactly what I have here on 3.<anything>.<anything> with a plain
>>> >> KDE desktop and a "bit of everything" workload. It does not seem to be
>>> >> related to QEMU, loopback devices or the like; just intensive random I/O
>>> >> triggers it, and I haven't found any specific pattern so far.
>>> >> Please tell me whether it stops happening on 2.6.39 (but remember that it
>>> >> may stay silent for a while), so I can bisect precisely :)
>>> >
>>> > It happens with 2.6.39 now too, but with 2.6.39 the computer hangs,
>>> > while with 3.6.6 it still works after 24 hours.
>>>
>>> Sorry, I messed up: with 2.6.39 I now get the same errors on I/O, but
>>> the files look OK. I only got files in lost+found when the system oopsed.
>>> The funny thing is that I never had these errors until now, and I have
>>> been using 2.6.39 since Nov 2011.
>>
>> So what changed since Nov 2011? Did you already have the QEMU image on an
>> r4 partition back then?
> Yeah, same usage scenario: a Gentoo machine with months of uptime, lots
> of recompiling and a QEMU-KVM virtual machine running WinXP.
> I made a small change to the kernel config in July 2012, same kernel
> 2.6.39.4.
>
>> And... try to revert to piix, maybe.
>>

OK, I reverted to the old kernel that worked (2.6.39 from July), and I am
currently compiling GCC and using QEMU; everything works fine so far.
I looked at the configs, and it looks like I changed more than just
switching from piix to ahci. Look for yourself and see whether anything
jumps out as buggy:

 diff -uprN config.old config.new
--- config.old	2012-12-09 18:41:04.372679078 +0100
+++ config.new	2012-12-09 18:41:57.359007631 +0100
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux/x86_64 2.6.39.4 Kernel Configuration
-# Sat Jun 23 20:57:39 2012
+# Tue Dec  4 16:56:13 2012
 #
 CONFIG_64BIT=y
 # CONFIG_X86_32 is not set
@@ -284,9 +284,9 @@ CONFIG_SWIOTLB=y
 CONFIG_IOMMU_HELPER=y
 # CONFIG_IOMMU_API is not set
 # CONFIG_MAXSMP is not set
-CONFIG_NR_CPUS=2
-# CONFIG_SCHED_SMT is not set
-# CONFIG_SCHED_MC is not set
+CONFIG_NR_CPUS=4
+CONFIG_SCHED_SMT=y
+CONFIG_SCHED_MC=y
 # CONFIG_IRQ_TIME_ACCOUNTING is not set
 # CONFIG_PREEMPT_NONE is not set
 CONFIG_PREEMPT_VOLUNTARY=y
@@ -324,7 +324,8 @@ CONFIG_HAVE_MEMBLOCK=y
 # CONFIG_MEMORY_HOTPLUG is not set
 CONFIG_PAGEFLAGS_EXTENDED=y
 CONFIG_SPLIT_PTLOCK_CPUS=4
-# CONFIG_COMPACTION is not set
+CONFIG_COMPACTION=y
+CONFIG_MIGRATION=y
 CONFIG_PHYS_ADDR_T_64BIT=y
 CONFIG_ZONE_DMA_FLAG=1
 CONFIG_BOUNCE=y
@@ -334,7 +335,9 @@ CONFIG_MMU_NOTIFIER=y
 CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
 CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
 # CONFIG_MEMORY_FAILURE is not set
-# CONFIG_TRANSPARENT_HUGEPAGE is not set
+CONFIG_TRANSPARENT_HUGEPAGE=y
+CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
+# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
 # CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
 CONFIG_X86_RESERVE_LOW=64
 CONFIG_MTRR=y
@@ -840,97 +843,18 @@ CONFIG_SCSI_CONSTANTS=y
 CONFIG_ATA=y
 # CONFIG_ATA_NONSTANDARD is not set
 CONFIG_ATA_VERBOSE_ERROR=y
-# CONFIG_ATA_ACPI is not set
+CONFIG_ATA_ACPI=y
 # CONFIG_SATA_PMP is not set

 #
 # Controllers with non-SFF native interface
 #
-# CONFIG_SATA_AHCI is not set
+CONFIG_SATA_AHCI=y
 # CONFIG_SATA_AHCI_PLATFORM is not set
 # CONFIG_SATA_INIC162X is not set
 # CONFIG_SATA_ACARD_AHCI is not set
 # CONFIG_SATA_SIL24 is not set
-CONFIG_ATA_SFF=y
-
-#
-# SFF controllers with custom DMA interface
-#
-# CONFIG_PDC_ADMA is not set
-# CONFIG_SATA_QSTOR is not set
-# CONFIG_SATA_SX4 is not set
-CONFIG_ATA_BMDMA=y
-
-#
-# SATA SFF controllers with BMDMA
-#
-CONFIG_ATA_PIIX=y
-# CONFIG_SATA_MV is not set
-# CONFIG_SATA_NV is not set
-# CONFIG_SATA_PROMISE is not set
-# CONFIG_SATA_SIL is not set
-# CONFIG_SATA_SIS is not set
-# CONFIG_SATA_SVW is not set
-# CONFIG_SATA_ULI is not set
-# CONFIG_SATA_VIA is not set
-# CONFIG_SATA_VITESSE is not set
-
-#
-# PATA SFF controllers with BMDMA
-#
-# CONFIG_PATA_ALI is not set
-# CONFIG_PATA_AMD is not set
-# CONFIG_PATA_ARASAN_CF is not set
-# CONFIG_PATA_ARTOP is not set
-# CONFIG_PATA_ATIIXP is not set
-# CONFIG_PATA_ATP867X is not set
-# CONFIG_PATA_CMD64X is not set
-# CONFIG_PATA_CS5520 is not set
-# CONFIG_PATA_CS5530 is not set
-# CONFIG_PATA_CS5536 is not set
-# CONFIG_PATA_CYPRESS is not set
-# CONFIG_PATA_EFAR is not set
-# CONFIG_PATA_HPT366 is not set
-# CONFIG_PATA_HPT37X is not set
-# CONFIG_PATA_HPT3X2N is not set
-# CONFIG_PATA_HPT3X3 is not set
-# CONFIG_PATA_IT8213 is not set
-# CONFIG_PATA_IT821X is not set
-# CONFIG_PATA_JMICRON is not set
-# CONFIG_PATA_MARVELL is not set
-# CONFIG_PATA_NETCELL is not set
-# CONFIG_PATA_NINJA32 is not set
-# CONFIG_PATA_NS87415 is not set
-# CONFIG_PATA_OLDPIIX is not set
-# CONFIG_PATA_OPTIDMA is not set
-# CONFIG_PATA_PDC2027X is not set
-# CONFIG_PATA_PDC_OLD is not set
-# CONFIG_PATA_RADISYS is not set
-# CONFIG_PATA_RDC is not set
-# CONFIG_PATA_SC1200 is not set
-# CONFIG_PATA_SCH is not set
-# CONFIG_PATA_SERVERWORKS is not set
-# CONFIG_PATA_SIL680 is not set
-# CONFIG_PATA_SIS is not set
-# CONFIG_PATA_TOSHIBA is not set
-# CONFIG_PATA_TRIFLEX is not set
-# CONFIG_PATA_VIA is not set
-# CONFIG_PATA_WINBOND is not set
-
-#
-# PIO-only SFF controllers
-#
-# CONFIG_PATA_CMD640_PCI is not set
-# CONFIG_PATA_MPIIX is not set
-# CONFIG_PATA_NS87410 is not set
-# CONFIG_PATA_OPTI is not set
-# CONFIG_PATA_RZ1000 is not set
-
-#
-# Generic fallback / legacy drivers
-#
-# CONFIG_ATA_GENERIC is not set
-# CONFIG_PATA_LEGACY is not set
+# CONFIG_ATA_SFF is not set
 CONFIG_MD=y
 CONFIG_BLK_DEV_MD=y
 CONFIG_MD_AUTODETECT=y
@@ -985,9 +909,9 @@ CONFIG_R8169=y
 # CONFIG_BNX2 is not set
 # CONFIG_CNIC is not set
 # CONFIG_QLA3XXX is not set
-# CONFIG_ATL1 is not set
-# CONFIG_ATL1E is not set
-# CONFIG_ATL1C is not set
+CONFIG_ATL1=y
+CONFIG_ATL1E=y
+CONFIG_ATL1C=y
 # CONFIG_JME is not set
 # CONFIG_STMMAC_ETH is not set
 # CONFIG_PCH_GBE is not set
@@ -1126,6 +1050,8 @@ CONFIG_SERIAL_8250_RUNTIME_UARTS=4
 #
 # Non-8250 serial port support
 #
+# CONFIG_SERIAL_MAX3100 is not set
+# CONFIG_SERIAL_MAX3107 is not set
 # CONFIG_SERIAL_MFD_HSU is not set
 CONFIG_SERIAL_CORE=y
 CONFIG_SERIAL_CORE_CONSOLE=y
@@ -1208,7 +1134,25 @@ CONFIG_I2C_PIIX4=y
 # CONFIG_I2C_DEBUG_CORE is not set
 # CONFIG_I2C_DEBUG_ALGO is not set
 # CONFIG_I2C_DEBUG_BUS is not set
-# CONFIG_SPI is not set
+CONFIG_SPI=y
+# CONFIG_SPI_DEBUG is not set
+CONFIG_SPI_MASTER=y
+
+#
+# SPI Master Controller Drivers
+#
+# CONFIG_SPI_ALTERA is not set
+# CONFIG_SPI_BITBANG is not set
+# CONFIG_SPI_PXA2XX_PCI is not set
+# CONFIG_SPI_TOPCLIFF_PCH is not set
+# CONFIG_SPI_XILINX is not set
+# CONFIG_SPI_DESIGNWARE is not set
+
+#
+# SPI Protocol Masters
+#
+# CONFIG_SPI_SPIDEV is not set
+# CONFIG_SPI_TLE62X0 is not set

 #
 # PPS support
@@ -1241,6 +1185,7 @@ CONFIG_HWMON=y
 # CONFIG_SENSORS_ABITUGURU3 is not set
 # CONFIG_SENSORS_AD7414 is not set
 # CONFIG_SENSORS_AD7418 is not set
+# CONFIG_SENSORS_ADCXX is not set
 # CONFIG_SENSORS_ADM1021 is not set
 # CONFIG_SENSORS_ADM1025 is not set
 # CONFIG_SENSORS_ADM1026 is not set
@@ -1272,6 +1217,7 @@ CONFIG_HWMON=y
 # CONFIG_SENSORS_JC42 is not set
 # CONFIG_SENSORS_LINEAGE is not set
 # CONFIG_SENSORS_LM63 is not set
+# CONFIG_SENSORS_LM70 is not set
 # CONFIG_SENSORS_LM73 is not set
 # CONFIG_SENSORS_LM75 is not set
 # CONFIG_SENSORS_LM77 is not set
@@ -1288,6 +1234,7 @@ CONFIG_HWMON=y
 # CONFIG_SENSORS_LTC4245 is not set
 # CONFIG_SENSORS_LTC4261 is not set
 # CONFIG_SENSORS_LM95241 is not set
+# CONFIG_SENSORS_MAX1111 is not set
 # CONFIG_SENSORS_MAX1619 is not set
 # CONFIG_SENSORS_MAX6639 is not set
 # CONFIG_SENSORS_MAX6650 is not set
@@ -1307,6 +1254,7 @@ CONFIG_HWMON=y
 # CONFIG_SENSORS_SCH5627 is not set
 # CONFIG_SENSORS_ADS1015 is not set
 # CONFIG_SENSORS_ADS7828 is not set
+# CONFIG_SENSORS_ADS7871 is not set
 # CONFIG_SENSORS_AMC6821 is not set
 # CONFIG_SENSORS_THMC50 is not set
 # CONFIG_SENSORS_TMP102 is not set
@@ -1353,7 +1301,7 @@ CONFIG_AGP=y
 # CONFIG_AGP_SIS is not set
 # CONFIG_AGP_VIA is not set
 CONFIG_VGA_ARB=y
-CONFIG_VGA_ARB_MAX_GPUS=2
+CONFIG_VGA_ARB_MAX_GPUS=3
 # CONFIG_VGA_SWITCHEROO is not set
 CONFIG_DRM=y
 CONFIG_DRM_KMS_HELPER=y
@@ -1431,7 +1379,12 @@ CONFIG_FB_CFB_IMAGEBLIT=y
 # CONFIG_FB_BROADSHEET is not set
 CONFIG_BACKLIGHT_LCD_SUPPORT=y
 CONFIG_LCD_CLASS_DEVICE=y
+# CONFIG_LCD_LTV350QV is not set
+# CONFIG_LCD_TDO24M is not set
+# CONFIG_LCD_VGG2432A4 is not set
 # CONFIG_LCD_PLATFORM is not set
+# CONFIG_LCD_S6E63M0 is not set
+# CONFIG_LCD_LD9040 is not set
 CONFIG_BACKLIGHT_CLASS_DEVICE=y
 CONFIG_BACKLIGHT_GENERIC=y
 # CONFIG_BACKLIGHT_PROGEAR is not set
@@ -1571,6 +1524,7 @@ CONFIG_SND_HDA_GENERIC=y
 # CONFIG_SND_VIRTUOSO is not set
 # CONFIG_SND_VX222 is not set
 # CONFIG_SND_YMFPCI is not set
+CONFIG_SND_SPI=y
 CONFIG_SND_USB=y
 CONFIG_SND_USB_AUDIO=y
 # CONFIG_SND_USB_UA101 is not set
@@ -1769,7 +1723,26 @@ CONFIG_USB_STORAGE=y
 # CONFIG_NFC_DEVICES is not set
 # CONFIG_ACCESSIBILITY is not set
 # CONFIG_INFINIBAND is not set
-# CONFIG_EDAC is not set
+CONFIG_EDAC=y
+
+#
+# Reporting subsystems
+#
+# CONFIG_EDAC_DEBUG is not set
+CONFIG_EDAC_DECODE_MCE=y
+# CONFIG_EDAC_MCE_INJ is not set
+CONFIG_EDAC_MM_EDAC=y
+# CONFIG_EDAC_AMD64 is not set
+# CONFIG_EDAC_E752X is not set
+# CONFIG_EDAC_I82975X is not set
+# CONFIG_EDAC_I3000 is not set
+# CONFIG_EDAC_I3200 is not set
+# CONFIG_EDAC_X38 is not set
+# CONFIG_EDAC_I5400 is not set
+# CONFIG_EDAC_I7CORE is not set
+# CONFIG_EDAC_I5000 is not set
+# CONFIG_EDAC_I5100 is not set
+# CONFIG_EDAC_I7300 is not set
 CONFIG_RTC_LIB=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_HCTOSYS=y
@@ -1809,6 +1782,14 @@ CONFIG_RTC_INTF_DEV=y
 #
 # SPI RTC drivers
 #
+# CONFIG_RTC_DRV_M41T94 is not set
+# CONFIG_RTC_DRV_DS1305 is not set
+# CONFIG_RTC_DRV_DS1390 is not set
+# CONFIG_RTC_DRV_MAX6902 is not set
+# CONFIG_RTC_DRV_R9701 is not set
+# CONFIG_RTC_DRV_RS5C348 is not set
+# CONFIG_RTC_DRV_DS3234 is not set
+# CONFIG_RTC_DRV_PCF2123 is not set

 #
 # Platform RTC drivers
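
Much of the diff above is Kconfig menu noise: whole submenus (the PATA list, SPI, EDAC, SPI RTC drivers) appear or disappear when a parent option flips, even though most of their entries stay "not set". Before bisecting the config it may help to reduce the two files to the options whose effective value actually changed. Below is a minimal Python sketch, assuming only the standard .config line formats `CONFIG_FOO=value` and `# CONFIG_FOO is not set`; the script and the function names `parse_config` and `changed_options` are hypothetical, not part of the kernel tree.

```python
import re
import sys

# Matches "CONFIG_FOO=value" lines and "# CONFIG_FOO is not set" lines.
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map every CONFIG_ symbol in a .config text to its value ("n" if unset)."""
    options = {}
    for line in text.splitlines():
        m = SET_RE.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = UNSET_RE.match(line)
        if m:
            options[m.group(1)] = "n"
        # Menu comments such as "# SPI Master Controller Drivers" match neither.
    return options


def changed_options(old_text, new_text):
    """Return sorted (symbol, old, new) tuples for options whose value differs.

    A symbol absent from one file is treated as "n", so options that merely
    became visible (or hidden) without being enabled are not reported.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    changes = []
    for sym in sorted(old.keys() | new.keys()):
        before, after = old.get(sym, "n"), new.get(sym, "n")
        if before != after:
            changes.append((sym, before, after))
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python changed_options.py config.old config.new
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for sym, before, after in changed_options(f_old.read(), f_new.read()):
            print(f"{sym}: {before} -> {after}")
```

Run against the two configs above, this should report a short list including CONFIG_NR_CPUS (2 -> 4), CONFIG_SCHED_SMT, CONFIG_COMPACTION, CONFIG_TRANSPARENT_HUGEPAGE, CONFIG_SATA_AHCI and CONFIG_ATA_PIIX (y -> n), i.e. the candidates to flip back one at a time. Treating a missing symbol as "n" is a simplification: it hides options that only changed visibility, which is usually what you want when hunting for the change that matters.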

Thanks a lot for the help

Dushan

> I'll try.
> I'm currently fscking the /fs and got:
> FSCK: node.c: 108: repair_node_items_check: Node (3205815), items (27)
> and (28): Wrong order of keys.
> FSCK: filter.c: 407: repair_filter_update_traverse: Node (3205815):
> the node is broken. Pointed from the node (2871633), item (12), unit
> (0). The whole subtree is skipped.
>
> --build-fs did:
> FSCK: node.c: 108: repair_node_items_check: Node (3205815), items (27)
> and (28): Wrong order of keys.
> FSCK: filter.c: 407: repair_filter_update_traverse: Node (3205815):
> the node is unrecoverable. Pointed from the node (2871633), item (12),
> unit (0). Removed.
> FSCK: obj40_repair.c: 146: obj40_check_bytes_report: Node (100919),
> item (17), [12e38:76785f68775361:fd23c2] (stat40): wrong bytes
> (634880), Fixed to (0).
> FSCK: obj40_repair.c: 373: obj40_stat_lw_check: Node (100919), item
> (17), [12e38:76785f68775361:fd23c2] (stat40): wrong size (633454),
> Fixed to (0).
> FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
> [12e38:767465364b4534:1185752] (ccreg40), node [1795], item [0]: item
> of the wrong cluster size (8192) found, Should be (65536). Fixed.
> FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
> [12e38:767465364d4534:1185753] (ccreg40), node [3211856], item [1]:
> item of the wrong cluster size (2048) found, Should be (65536). Fixed.
> FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
> [12e38:76746538463156:118493e] (ccreg40), node [3211856], item [2]:
> item of the wrong cluster size (268435456) found, Should be (65536).
> Fixed.
>
> That's the damage after using 2.6.39 with a light workload: compiling a
> few small packages with -j3.
>
>> Thanks,
>> Ivan.
>>
>>>
>>> Mount options: noatime,onerror=remount-ro; partition type: ccreg40.
>>>
>>> > If I reformat the partition and restore everything from backup, do you
>>> > think it would stop?
>>> >
>>> >> Thanks,
>>> >> Ivan.
>>> >
>>> > Thanks,
>>> > Dushan
>>> >
>>> >>> On Fri, Dec 7, 2012 at 6:56 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>>> >>> > Hello
>>> >>> >
>>> >>> > I'm using KVM for Windows emulation, and I run it from a ~3GB image
>>> >>> > file.
>>> >>> > Lately I started having problems with it on both regular and ccreg40
>>> >>> > partitions (I tried the same file on both) using 3.6.6.
>>> >>> > It spammed the log with a lot of these:
>>> >>> >
>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] reiser4[find(5806)]:
>>> >>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] WARNING: Wrong level
>>> >>> > found in node: 2 != 1
>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] reiser4[find(5806)]:
>>> >>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] WARNING: Error for
>>> >>> > inode 17802378 (-5)
>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] reiser4[find(5806)]:
>>> >>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] WARNING: Wrong level
>>> >>> > found in node: 2 != 1
>>> >>> >
>>> >>> >
>>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.089191] reiser4[gdm(2676)]:
>>> >>> > cbk_level_lookup (fs/reiser4/search.c:963)[vs-3533]:
>>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.089194] reiser4[gdm(2676)]:
>>> >>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.090837]
>>> >>> > reiser4[gnome-screensav(3503)]: cbk_level_lookup
>>> >>> > (fs/reiser4/search.c:963)[vs-3533]:
>>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.090840]
>>> >>> > reiser4[gnome-screensav(3503)]: key_warning
>>> >>> > (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>> >>> >
>>> >>> > I fscked the FSes and had some errors that were corrected.
>>> >>> >
>>> >>> > Now I started getting these, and I can't kill the offending process:
>>> >>> >
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274343] INFO: task
>>> >>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274347] "echo 0 >
>>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274349] qemu-system-x86 D
>>> >>> > 0000000000000001     0  4156   3654 0x00000000
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274354]  ffff880206dd7990
>>> >>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274357]  0000000000011240
>>> >>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274361]  0000000000011240
>>> >>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274364] Call Trace:
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274372]  [<ffffffff810aec97>]
>>> >>> > ? pagevec_lookup_tag+0x18/0x21
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274375]  [<ffffffff810a528a>]
>>> >>> > ? filemap_fdatawait_range+0xff/0x144
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274380]  [<ffffffff81146b09>]
>>> >>> > ? writepages_unix_file+0x36e/0x3ce
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274384]  [<ffffffff810ab9f9>]
>>> >>> > ? global_dirtyable_memory+0xd/0x2c
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274389]  [<ffffffff81489dac>]
>>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274392]  [<ffffffff8148a096>]
>>> >>> > ? mutex_lock+0x1a/0x2d
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274396]  [<ffffffff81143b73>]
>>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274400]  [<ffffffff81147a81>]
>>> >>> > ? write_unix_file+0x442/0x4b7
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274404]  [<ffffffff811498b9>]
>>> >>> > ? reiser4_write_careful+0xb8/0x450
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274409]  [<ffffffff810da90f>]
>>> >>> > ? vfs_write+0xaf/0x149
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274412]  [<ffffffff810dab49>]
>>> >>> > ? sys_pwrite64+0x53/0x71
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274415]  [<ffffffff8148c3e2>]
>>> >>> > ? system_call_fastpath+0x16/0x1b
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274418] INFO: task
>>> >>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274419] "echo 0 >
>>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274420] qemu-system-x86 D
>>> >>> > 0000000000000000     0  4162   3654 0x00000000
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274423]  ffff88020f7cacf0
>>> >>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274426]  0000000000011240
>>> >>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274429]  0000000000011240
>>> >>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274432] Call Trace:
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274437]  [<ffffffff810aec97>]
>>> >>> > ? pagevec_lookup_tag+0x18/0x21
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274439]  [<ffffffff810a528a>]
>>> >>> > ? filemap_fdatawait_range+0xff/0x144
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274443]  [<ffffffff81146b09>]
>>> >>> > ? writepages_unix_file+0x36e/0x3ce
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274447]  [<ffffffff81489dac>]
>>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274450]  [<ffffffff8148a096>]
>>> >>> > ? mutex_lock+0x1a/0x2d
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274453]  [<ffffffff81143b73>]
>>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274457]  [<ffffffff810fa6d8>]
>>> >>> > ? do_fsync+0x29/0x47
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274460]  [<ffffffff810fa716>]
>>> >>> > ? sys_fdatasync+0xe/0x15
>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274462]  [<ffffffff8148c3e2>]
>>> >>> > ? system_call_fastpath+0x16/0x1b
>>> >>> > tail: unrecognized file system type 0x52345362 for
>>> >>> > '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>>> >>> > reverting to polling
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266595] INFO: task
>>> >>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266599] "echo 0 >
>>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266601] qemu-system-x86 D
>>> >>> > 0000000000000001     0  4156   3654 0x00000000
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266605]  ffff880206dd7990
>>> >>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266609]  0000000000011240
>>> >>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266613]  0000000000011240
>>> >>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266616] Call Trace:
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266625]  [<ffffffff810aec97>]
>>> >>> > ? pagevec_lookup_tag+0x18/0x21
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266629]  [<ffffffff810a528a>]
>>> >>> > ? filemap_fdatawait_range+0xff/0x144
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266634]  [<ffffffff81146b09>]
>>> >>> > ? writepages_unix_file+0x36e/0x3ce
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266638]  [<ffffffff810ab9f9>]
>>> >>> > ? global_dirtyable_memory+0xd/0x2c
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266643]  [<ffffffff81489dac>]
>>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266646]  [<ffffffff8148a096>]
>>> >>> > ? mutex_lock+0x1a/0x2d
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266650]  [<ffffffff81143b73>]
>>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266654]  [<ffffffff81147a81>]
>>> >>> > ? write_unix_file+0x442/0x4b7
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266658]  [<ffffffff811498b9>]
>>> >>> > ? reiser4_write_careful+0xb8/0x450
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266664]  [<ffffffff810da90f>]
>>> >>> > ? vfs_write+0xaf/0x149
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266666]  [<ffffffff810dab49>]
>>> >>> > ? sys_pwrite64+0x53/0x71
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266670]  [<ffffffff8148c3e2>]
>>> >>> > ? system_call_fastpath+0x16/0x1b
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266672] INFO: task
>>> >>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266674] "echo 0 >
>>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266675] qemu-system-x86 D
>>> >>> > 0000000000000000     0  4162   3654 0x00000000
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266678]  ffff88020f7cacf0
>>> >>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266681]  0000000000011240
>>> >>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266684]  0000000000011240
>>> >>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266687] Call Trace:
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266691]  [<ffffffff810aec97>]
>>> >>> > ? pagevec_lookup_tag+0x18/0x21
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266694]  [<ffffffff810a528a>]
>>> >>> > ? filemap_fdatawait_range+0xff/0x144
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266698]  [<ffffffff81146b09>]
>>> >>> > ? writepages_unix_file+0x36e/0x3ce
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266702]  [<ffffffff81489dac>]
>>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266705]  [<ffffffff8148a096>]
>>> >>> > ? mutex_lock+0x1a/0x2d
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266708]  [<ffffffff81143b73>]
>>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266712]  [<ffffffff810fa6d8>]
>>> >>> > ? do_fsync+0x29/0x47
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266714]  [<ffffffff810fa716>]
>>> >>> > ? sys_fdatasync+0xe/0x15
>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266717]  [<ffffffff8148c3e2>]
>>> >>> > ? system_call_fastpath+0x16/0x1b
>>> >>> >
>>> >>> >
>>> >>> > The file runs fine from a FAT32 partition.
>>> >>> >
>>> >>> > If I can do anything, or if you need any info, please tell me.
>>> >>> >
>>> >>> > Thanks
>>> >>> > Dushan
>>> >>>


* Re: R4 problem started with 2.6.39 and still there with 3.6.6
  2012-12-09 17:54             ` Dušan Čolić
@ 2012-12-10 20:08               ` Dušan Čolić
  2012-12-11 15:08               ` Kernel config option which causes reiser4 to be instable Ivan Shapovalov
  1 sibling, 0 replies; 28+ messages in thread
From: Dušan Čolić @ 2012-12-10 20:08 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel

On Sun, Dec 9, 2012 at 6:54 PM, Dušan Čolić <dusanc@gmail.com> wrote:
> On Sun, Dec 9, 2012 at 6:12 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>> On Sun, Dec 9, 2012 at 4:38 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
>>> On 09 December 2012 17:29:58 Dušan Čolić wrote:
>>>> On Sun, Dec 9, 2012 at 5:19 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>>>> > On Sun, Dec 9, 2012 at 4:17 PM, Ivan Shapovalov <intelfx100@gmail.com> wrote:
>>>> >> On 07 December 2012 19:34:45 Dušan Čolić wrote:
>>>> >>> OK, on the just-fscked partition I now get:
>>>> >>>
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] reiser4[sshd(5058)]:
>>>> >>> find_cluster_item
>>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584096] WARNING: Expected item
>>>> >>> not found. Fsck?
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] reiser4[sshd(5058)]:
>>>> >>> dc_check_checksum
>>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104] WARNING: Bad disk
>>>> >>> cluster checksum 1869768224, (should be 950540942) Fsck?
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584104]
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] reiser4[sshd(5058)]:
>>>> >>> reiser4_inflate_cluster
>>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.584109] WARNING: Inode
>>>> >>> 14592305: disk cluster 0 looks corrupted
>>>> >>> Dec  7 19:31:43 krshina3 sshd[5056]: Accepted keyboard-interactive/pam
>>>> >>> for root from 192.168.1.10 port 7531 ssh2
>>>> >>> Dec  7 19:31:43 krshina3 sshd[5056]: pam_unix(sshd:session): session
>>>> >>> opened for user root by (uid=0)
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] reiser4[bash(5066)]:
>>>> >>> find_cluster_item
>>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:781)[edward-1608]:
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637085] WARNING: Expected item
>>>> >>> not found. Fsck?
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] reiser4[bash(5066)]:
>>>> >>> dc_check_checksum
>>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1023)[edward-156]:
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094] WARNING: Bad disk
>>>> >>> cluster checksum -1945338855, (should be 944271739) Fsck?
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637094]
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] reiser4[bash(5066)]:
>>>> >>> reiser4_inflate_cluster
>>>> >>> (fs/reiser4/plugin/file/cryptcompress.c:1190)[edward-1460]:
>>>> >>> Dec  7 19:31:43 krshina3 kernel: [ 2069.637098] WARNING: Inode
>>>> >>> 15185444: disk cluster 0 looks corrupted
>>>> >>> tail: unrecognized file system type 0x52345362 for
>>>> >>> '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>>>> >>> reverting to polling
>>>> >>>
>>>> >>> This is getting bad, I'm going back to 2.6.39 :D
>>>> >>
>>>> >> This is exactly what I have here on 3.<anything>.<anything> with a plain
>>>> >> KDE desktop and a "bit of everything" workload. It doesn't seem to be
>>>> >> related to QEMU, loopbacks or anything like that: intensive random I/O is
>>>> >> what triggers it, and I haven't found any specific pattern so far.
>>>> >> Please tell me if it stops happening on 2.6.39 (but remember that it may
>>>> >> stay silent for a while) so I can bisect with precision :)
>>>> >
>>>> > It happens with 2.6.39 now too, but with 2.6.39 the computer hangs,
>>>> > while with 3.6.6 it still works after 24 hrs.
>>>>
>>>> Sorry, I messed up: with 2.6.39 I now get the same errors on I/O, but
>>>> the files look OK. I only got files in lost+found when the system oopsed.
>>>> The funny thing is that I never had these errors until now, and I've used
>>>> 2.6.39 since Nov 2011.
>>>
>>> So what has changed since Nov 2011? Did you already have the QEMU image
>>> on an r4 partition back then?
>> Yeah, same usage scenario: a Gentoo machine with months of uptime, lots of
>> recompiling and a QEMU-KVM virtual machine running WinXP.
>> I made a small change to the kernel config in July 2012, same kernel 2.6.39.4.
>>
>>> And... try to revert to piix, maybe.
>>>
>
> OK, I reverted to the old kernel that worked, 2.6.39 from July, and am
> currently compiling GCC and using QEMU, and everything works fine so
> far.

More than 24 hrs later, with the whole system compiled twice (2x280 packages
like gcc, glibc etc.) plus regular usage, there are no problems with kernel
2.6.39.4 and the old config.
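Narrowing the problem down means bisecting the config diff quoted below. The following is a minimal shell sketch, not part of the original thread, that lists only the options whose value changed between two .config files. The function names `normalize_config` and `changed_opts` are hypothetical. The kernel tree's own `scripts/diffconfig`, where present, serves a similar purpose.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any kernel tool.

# normalize_config FILE: print "NAME VALUE" for every option in a kernel
# .config, treating "# CONFIG_FOO is not set" as the value "n".
normalize_config() {
    sed -n \
        -e 's/^\(CONFIG_[A-Za-z0-9_]*\)=\(.*\)$/\1 \2/p' \
        -e 's/^# \(CONFIG_[A-Za-z0-9_]*\) is not set$/\1 n/p' \
        "$1" | LC_ALL=C sort -k1,1
}

# changed_opts OLD NEW: print "NAME: old -> new" for every option whose value
# differs; "missing" marks an option present in only one of the two files.
# Limitation: values containing spaces (quoted strings) are not handled.
changed_opts() {
    old_tmp=$(mktemp) || return 1
    new_tmp=$(mktemp) || { rm -f "$old_tmp"; return 1; }
    normalize_config "$1" > "$old_tmp"
    normalize_config "$2" > "$new_tmp"
    # join needs both inputs sorted on the same key in the same locale
    LC_ALL=C join -a 1 -a 2 -e missing -o 0,1.2,2.2 "$old_tmp" "$new_tmp" |
        awk '$2 != $3 { print $1 ": " $2 " -> " $3 }'
    rm -f "$old_tmp" "$new_tmp"
}
```

Run as `changed_opts config.old config.new` on the files from the diff, it would report lines such as `CONFIG_TRANSPARENT_HUGEPAGE: n -> y` and `CONFIG_NR_CPUS: 2 -> 4`. Each listed option can then be reverted one at a time.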


> I looked at the configs and it looks like I did more messing around than
> just switching from piix to ahci; look for yourself and see if anything
> jumps out as buggy:
>
>  diff -uprN config.old config.new
> --- config.old  2012-12-09 18:41:04.372679078 +0100
> +++ config.new  2012-12-09 18:41:57.359007631 +0100
> @@ -1,7 +1,7 @@
>  #
>  # Automatically generated make config: don't edit
>  # Linux/x86_64 2.6.39.4 Kernel Configuration
> -# Sat Jun 23 20:57:39 2012
> +# Tue Dec  4 16:56:13 2012
>  #
>  CONFIG_64BIT=y
>  # CONFIG_X86_32 is not set
> @@ -284,9 +284,9 @@ CONFIG_SWIOTLB=y
>  CONFIG_IOMMU_HELPER=y
>  # CONFIG_IOMMU_API is not set
>  # CONFIG_MAXSMP is not set
> -CONFIG_NR_CPUS=2
> -# CONFIG_SCHED_SMT is not set
> -# CONFIG_SCHED_MC is not set
> +CONFIG_NR_CPUS=4
> +CONFIG_SCHED_SMT=y
> +CONFIG_SCHED_MC=y
>  # CONFIG_IRQ_TIME_ACCOUNTING is not set
>  # CONFIG_PREEMPT_NONE is not set
>  CONFIG_PREEMPT_VOLUNTARY=y
> @@ -324,7 +324,8 @@ CONFIG_HAVE_MEMBLOCK=y
>  # CONFIG_MEMORY_HOTPLUG is not set
>  CONFIG_PAGEFLAGS_EXTENDED=y
>  CONFIG_SPLIT_PTLOCK_CPUS=4
> -# CONFIG_COMPACTION is not set
> +CONFIG_COMPACTION=y
> +CONFIG_MIGRATION=y
>  CONFIG_PHYS_ADDR_T_64BIT=y
>  CONFIG_ZONE_DMA_FLAG=1
>  CONFIG_BOUNCE=y
> @@ -334,7 +335,9 @@ CONFIG_MMU_NOTIFIER=y
>  CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
>  CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y
>  # CONFIG_MEMORY_FAILURE is not set
> -# CONFIG_TRANSPARENT_HUGEPAGE is not set
> +CONFIG_TRANSPARENT_HUGEPAGE=y
> +CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
> +# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
>  # CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
>  CONFIG_X86_RESERVE_LOW=64
>  CONFIG_MTRR=y
> @@ -840,97 +843,18 @@ CONFIG_SCSI_CONSTANTS=y
>  CONFIG_ATA=y
>  # CONFIG_ATA_NONSTANDARD is not set
>  CONFIG_ATA_VERBOSE_ERROR=y
> -# CONFIG_ATA_ACPI is not set
> +CONFIG_ATA_ACPI=y
>  # CONFIG_SATA_PMP is not set
>
>  #
>  # Controllers with non-SFF native interface
>  #
> -# CONFIG_SATA_AHCI is not set
> +CONFIG_SATA_AHCI=y
>  # CONFIG_SATA_AHCI_PLATFORM is not set
>  # CONFIG_SATA_INIC162X is not set
>  # CONFIG_SATA_ACARD_AHCI is not set
>  # CONFIG_SATA_SIL24 is not set
> -CONFIG_ATA_SFF=y
> -
> -#
> -# SFF controllers with custom DMA interface
> -#
> -# CONFIG_PDC_ADMA is not set
> -# CONFIG_SATA_QSTOR is not set
> -# CONFIG_SATA_SX4 is not set
> -CONFIG_ATA_BMDMA=y
> -
> -#
> -# SATA SFF controllers with BMDMA
> -#
> -CONFIG_ATA_PIIX=y
> -# CONFIG_SATA_MV is not set
> -# CONFIG_SATA_NV is not set
> -# CONFIG_SATA_PROMISE is not set
> -# CONFIG_SATA_SIL is not set
> -# CONFIG_SATA_SIS is not set
> -# CONFIG_SATA_SVW is not set
> -# CONFIG_SATA_ULI is not set
> -# CONFIG_SATA_VIA is not set
> -# CONFIG_SATA_VITESSE is not set
> -
> -#
> -# PATA SFF controllers with BMDMA
> -#
> -# CONFIG_PATA_ALI is not set
> -# CONFIG_PATA_AMD is not set
> -# CONFIG_PATA_ARASAN_CF is not set
> -# CONFIG_PATA_ARTOP is not set
> -# CONFIG_PATA_ATIIXP is not set
> -# CONFIG_PATA_ATP867X is not set
> -# CONFIG_PATA_CMD64X is not set
> -# CONFIG_PATA_CS5520 is not set
> -# CONFIG_PATA_CS5530 is not set
> -# CONFIG_PATA_CS5536 is not set
> -# CONFIG_PATA_CYPRESS is not set
> -# CONFIG_PATA_EFAR is not set
> -# CONFIG_PATA_HPT366 is not set
> -# CONFIG_PATA_HPT37X is not set
> -# CONFIG_PATA_HPT3X2N is not set
> -# CONFIG_PATA_HPT3X3 is not set
> -# CONFIG_PATA_IT8213 is not set
> -# CONFIG_PATA_IT821X is not set
> -# CONFIG_PATA_JMICRON is not set
> -# CONFIG_PATA_MARVELL is not set
> -# CONFIG_PATA_NETCELL is not set
> -# CONFIG_PATA_NINJA32 is not set
> -# CONFIG_PATA_NS87415 is not set
> -# CONFIG_PATA_OLDPIIX is not set
> -# CONFIG_PATA_OPTIDMA is not set
> -# CONFIG_PATA_PDC2027X is not set
> -# CONFIG_PATA_PDC_OLD is not set
> -# CONFIG_PATA_RADISYS is not set
> -# CONFIG_PATA_RDC is not set
> -# CONFIG_PATA_SC1200 is not set
> -# CONFIG_PATA_SCH is not set
> -# CONFIG_PATA_SERVERWORKS is not set
> -# CONFIG_PATA_SIL680 is not set
> -# CONFIG_PATA_SIS is not set
> -# CONFIG_PATA_TOSHIBA is not set
> -# CONFIG_PATA_TRIFLEX is not set
> -# CONFIG_PATA_VIA is not set
> -# CONFIG_PATA_WINBOND is not set
> -
> -#
> -# PIO-only SFF controllers
> -#
> -# CONFIG_PATA_CMD640_PCI is not set
> -# CONFIG_PATA_MPIIX is not set
> -# CONFIG_PATA_NS87410 is not set
> -# CONFIG_PATA_OPTI is not set
> -# CONFIG_PATA_RZ1000 is not set
> -
> -#
> -# Generic fallback / legacy drivers
> -#
> -# CONFIG_ATA_GENERIC is not set
> -# CONFIG_PATA_LEGACY is not set
> +# CONFIG_ATA_SFF is not set
>  CONFIG_MD=y
>  CONFIG_BLK_DEV_MD=y
>  CONFIG_MD_AUTODETECT=y
> @@ -985,9 +909,9 @@ CONFIG_R8169=y
>  # CONFIG_BNX2 is not set
>  # CONFIG_CNIC is not set
>  # CONFIG_QLA3XXX is not set
> -# CONFIG_ATL1 is not set
> -# CONFIG_ATL1E is not set
> -# CONFIG_ATL1C is not set
> +CONFIG_ATL1=y
> +CONFIG_ATL1E=y
> +CONFIG_ATL1C=y
>  # CONFIG_JME is not set
>  # CONFIG_STMMAC_ETH is not set
>  # CONFIG_PCH_GBE is not set
> @@ -1126,6 +1050,8 @@ CONFIG_SERIAL_8250_RUNTIME_UARTS=4
>  #
>  # Non-8250 serial port support
>  #
> +# CONFIG_SERIAL_MAX3100 is not set
> +# CONFIG_SERIAL_MAX3107 is not set
>  # CONFIG_SERIAL_MFD_HSU is not set
>  CONFIG_SERIAL_CORE=y
>  CONFIG_SERIAL_CORE_CONSOLE=y
> @@ -1208,7 +1134,25 @@ CONFIG_I2C_PIIX4=y
>  # CONFIG_I2C_DEBUG_CORE is not set
>  # CONFIG_I2C_DEBUG_ALGO is not set
>  # CONFIG_I2C_DEBUG_BUS is not set
> -# CONFIG_SPI is not set
> +CONFIG_SPI=y
> +# CONFIG_SPI_DEBUG is not set
> +CONFIG_SPI_MASTER=y
> +
> +#
> +# SPI Master Controller Drivers
> +#
> +# CONFIG_SPI_ALTERA is not set
> +# CONFIG_SPI_BITBANG is not set
> +# CONFIG_SPI_PXA2XX_PCI is not set
> +# CONFIG_SPI_TOPCLIFF_PCH is not set
> +# CONFIG_SPI_XILINX is not set
> +# CONFIG_SPI_DESIGNWARE is not set
> +
> +#
> +# SPI Protocol Masters
> +#
> +# CONFIG_SPI_SPIDEV is not set
> +# CONFIG_SPI_TLE62X0 is not set
>
>  #
>  # PPS support
> @@ -1241,6 +1185,7 @@ CONFIG_HWMON=y
>  # CONFIG_SENSORS_ABITUGURU3 is not set
>  # CONFIG_SENSORS_AD7414 is not set
>  # CONFIG_SENSORS_AD7418 is not set
> +# CONFIG_SENSORS_ADCXX is not set
>  # CONFIG_SENSORS_ADM1021 is not set
>  # CONFIG_SENSORS_ADM1025 is not set
>  # CONFIG_SENSORS_ADM1026 is not set
> @@ -1272,6 +1217,7 @@ CONFIG_HWMON=y
>  # CONFIG_SENSORS_JC42 is not set
>  # CONFIG_SENSORS_LINEAGE is not set
>  # CONFIG_SENSORS_LM63 is not set
> +# CONFIG_SENSORS_LM70 is not set
>  # CONFIG_SENSORS_LM73 is not set
>  # CONFIG_SENSORS_LM75 is not set
>  # CONFIG_SENSORS_LM77 is not set
> @@ -1288,6 +1234,7 @@ CONFIG_HWMON=y
>  # CONFIG_SENSORS_LTC4245 is not set
>  # CONFIG_SENSORS_LTC4261 is not set
>  # CONFIG_SENSORS_LM95241 is not set
> +# CONFIG_SENSORS_MAX1111 is not set
>  # CONFIG_SENSORS_MAX1619 is not set
>  # CONFIG_SENSORS_MAX6639 is not set
>  # CONFIG_SENSORS_MAX6650 is not set
> @@ -1307,6 +1254,7 @@ CONFIG_HWMON=y
>  # CONFIG_SENSORS_SCH5627 is not set
>  # CONFIG_SENSORS_ADS1015 is not set
>  # CONFIG_SENSORS_ADS7828 is not set
> +# CONFIG_SENSORS_ADS7871 is not set
>  # CONFIG_SENSORS_AMC6821 is not set
>  # CONFIG_SENSORS_THMC50 is not set
>  # CONFIG_SENSORS_TMP102 is not set
> @@ -1353,7 +1301,7 @@ CONFIG_AGP=y
>  # CONFIG_AGP_SIS is not set
>  # CONFIG_AGP_VIA is not set
>  CONFIG_VGA_ARB=y
> -CONFIG_VGA_ARB_MAX_GPUS=2
> +CONFIG_VGA_ARB_MAX_GPUS=3
>  # CONFIG_VGA_SWITCHEROO is not set
>  CONFIG_DRM=y
>  CONFIG_DRM_KMS_HELPER=y
> @@ -1431,7 +1379,12 @@ CONFIG_FB_CFB_IMAGEBLIT=y
>  # CONFIG_FB_BROADSHEET is not set
>  CONFIG_BACKLIGHT_LCD_SUPPORT=y
>  CONFIG_LCD_CLASS_DEVICE=y
> +# CONFIG_LCD_LTV350QV is not set
> +# CONFIG_LCD_TDO24M is not set
> +# CONFIG_LCD_VGG2432A4 is not set
>  # CONFIG_LCD_PLATFORM is not set
> +# CONFIG_LCD_S6E63M0 is not set
> +# CONFIG_LCD_LD9040 is not set
>  CONFIG_BACKLIGHT_CLASS_DEVICE=y
>  CONFIG_BACKLIGHT_GENERIC=y
>  # CONFIG_BACKLIGHT_PROGEAR is not set
> @@ -1571,6 +1524,7 @@ CONFIG_SND_HDA_GENERIC=y
>  # CONFIG_SND_VIRTUOSO is not set
>  # CONFIG_SND_VX222 is not set
>  # CONFIG_SND_YMFPCI is not set
> +CONFIG_SND_SPI=y
>  CONFIG_SND_USB=y
>  CONFIG_SND_USB_AUDIO=y
>  # CONFIG_SND_USB_UA101 is not set
> @@ -1769,7 +1723,26 @@ CONFIG_USB_STORAGE=y
>  # CONFIG_NFC_DEVICES is not set
>  # CONFIG_ACCESSIBILITY is not set
>  # CONFIG_INFINIBAND is not set
> -# CONFIG_EDAC is not set
> +CONFIG_EDAC=y
> +
> +#
> +# Reporting subsystems
> +#
> +# CONFIG_EDAC_DEBUG is not set
> +CONFIG_EDAC_DECODE_MCE=y
> +# CONFIG_EDAC_MCE_INJ is not set
> +CONFIG_EDAC_MM_EDAC=y
> +# CONFIG_EDAC_AMD64 is not set
> +# CONFIG_EDAC_E752X is not set
> +# CONFIG_EDAC_I82975X is not set
> +# CONFIG_EDAC_I3000 is not set
> +# CONFIG_EDAC_I3200 is not set
> +# CONFIG_EDAC_X38 is not set
> +# CONFIG_EDAC_I5400 is not set
> +# CONFIG_EDAC_I7CORE is not set
> +# CONFIG_EDAC_I5000 is not set
> +# CONFIG_EDAC_I5100 is not set
> +# CONFIG_EDAC_I7300 is not set
>  CONFIG_RTC_LIB=y
>  CONFIG_RTC_CLASS=y
>  CONFIG_RTC_HCTOSYS=y
> @@ -1809,6 +1782,14 @@ CONFIG_RTC_INTF_DEV=y
>  #
>  # SPI RTC drivers
>  #
> +# CONFIG_RTC_DRV_M41T94 is not set
> +# CONFIG_RTC_DRV_DS1305 is not set
> +# CONFIG_RTC_DRV_DS1390 is not set
> +# CONFIG_RTC_DRV_MAX6902 is not set
> +# CONFIG_RTC_DRV_R9701 is not set
> +# CONFIG_RTC_DRV_RS5C348 is not set
> +# CONFIG_RTC_DRV_DS3234 is not set
> +# CONFIG_RTC_DRV_PCF2123 is not set
>
>  #
>  # Platform RTC drivers
>
> Thanks a lot for the help
>
> Dushan
>
>> I'll try.
>> I'm currently fscking the /fs and I got:
>> FSCK: node.c: 108: repair_node_items_check: Node (3205815), items (27)
>> and (28): Wrong order of keys.
>> FSCK: filter.c: 407: repair_filter_update_traverse: Node (3205815):
>> the node is broken. Pointed from the node (2871633), item (12), unit
>> (0). The whole subtree is skipped.
>>
>> --build-fs did:
>> FSCK: node.c: 108: repair_node_items_check: Node (3205815), items (27)
>> and (28): Wrong order of keys.
>> FSCK: filter.c: 407: repair_filter_update_traverse: Node (3205815):
>> the node is unrecoverable. Pointed from the node (2871633), item (12),
>> unit (0). Removed.
>> FSCK: obj40_repair.c: 146: obj40_check_bytes_report: Node (100919),
>> item (17), [12e38:76785f68775361:fd23c2] (stat40): wrong bytes
>> (634880), Fixed to (0).
>> FSCK: obj40_repair.c: 373: obj40_stat_lw_check: Node (100919), item
>> (17), [12e38:76785f68775361:fd23c2] (stat40): wrong size (633454),
>> Fixed to (0).
>> FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
>> [12e38:767465364b4534:1185752] (ccreg40), node [1795], item [0]: item
>> of the wrong cluster size (8192) found, Should be (65536). Fixed.
>> FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
>> [12e38:767465364d4534:1185753] (ccreg40), node [3211856], item [1]:
>> item of the wrong cluster size (2048) found, Should be (65536). Fixed.
>> FSCK: ccreg40_repair.c: 77: ccreg40_check_item: The file
>> [12e38:76746538463156:118493e] (ccreg40), node [3211856], item [2]:
>> item of the wrong cluster size (268435456) found, Should be (65536).
>> Fixed.
>>
>> That's the damage after using 2.6.39 with a light workload: compiling
>> a few small packages with -j3.
>>
>>> Thanks,
>>> Ivan.
>>>
>>>>
>>>> mount options noatime,onerror=remount-ro, partition type ccreg40
>>>>
>>>> > If I reformat the partition and restore everything from backup, do you
>>>> > think it would stop?
>>>> >
>>>> >> Thanks,
>>>> >> Ivan.
>>>> >
>>>> > Thanks,
>>>> > Dushan
>>>> >
>>>> >>> On Fri, Dec 7, 2012 at 6:56 PM, Dušan Čolić <dusanc@gmail.com> wrote:
>>>> >>> > Hello
>>>> >>> >
>>>> >>> > I'm using KVM for Windows emulation, running it from a ~3GB image
>>>> >>> > file.
>>>> >>> > Lately I've started having problems with it on both regular and ccreg40
>>>> >>> > partitions (I tried the same file on both) using 3.6.6.
>>>> >>> > The output was spammed with a lot of these:
>>>> >>> >
>>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] reiser4[find(5806)]:
>>>> >>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133047] WARNING: Wrong level
>>>> >>> > found in node: 2 != 1
>>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] reiser4[find(5806)]:
>>>> >>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133048] WARNING: Error for
>>>> >>> > inode 17802378 (-5)
>>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] reiser4[find(5806)]:
>>>> >>> > parse_node40 (fs/reiser4/plugin/node/node40.c:672)[nikita-494]:
>>>> >>> > Dec  7 03:30:02 krshina3 kernel: [15135.133056] WARNING: Wrong level
>>>> >>> > found in node: 2 != 1
>>>> >>> >
>>>> >>> >
>>>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.089191] reiser4[gdm(2676)]:
>>>> >>> > cbk_level_lookup (fs/reiser4/search.c:963)[vs-3533]:
>>>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.089194] reiser4[gdm(2676)]:
>>>> >>> > key_warning (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.090837]
>>>> >>> > reiser4[gnome-screensav(3503)]: cbk_level_lookup
>>>> >>> > (fs/reiser4/search.c:963)[vs-3533]:
>>>> >>> > Dec  7 17:26:23 krshina3 kernel: [38539.090840]
>>>> >>> > reiser4[gnome-screensav(3503)]: key_warning
>>>> >>> > (fs/reiser4/plugin/file_plugin_common.c:512)[nikita-717]:
>>>> >>> >
>>>> >>> > I fscked the FSes, and some errors were found and corrected.
>>>> >>> >
>>>> >>> > Now I've started getting these, and I can't kill the offending process:
>>>> >>> >
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274343] INFO: task
>>>> >>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274347] "echo 0 >
>>>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274349] qemu-system-x86 D
>>>> >>> > 0000000000000001     0  4156   3654 0x00000000
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274354]  ffff880206dd7990
>>>> >>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274357]  0000000000011240
>>>> >>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274361]  0000000000011240
>>>> >>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274364] Call Trace:
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274372]  [<ffffffff810aec97>]
>>>> >>> > ? pagevec_lookup_tag+0x18/0x21
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274375]  [<ffffffff810a528a>]
>>>> >>> > ? filemap_fdatawait_range+0xff/0x144
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274380]  [<ffffffff81146b09>]
>>>> >>> > ? writepages_unix_file+0x36e/0x3ce
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274384]  [<ffffffff810ab9f9>]
>>>> >>> > ? global_dirtyable_memory+0xd/0x2c
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274389]  [<ffffffff81489dac>]
>>>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274392]  [<ffffffff8148a096>]
>>>> >>> > ? mutex_lock+0x1a/0x2d
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274396]  [<ffffffff81143b73>]
>>>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274400]  [<ffffffff81147a81>]
>>>> >>> > ? write_unix_file+0x442/0x4b7
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274404]  [<ffffffff811498b9>]
>>>> >>> > ? reiser4_write_careful+0xb8/0x450
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274409]  [<ffffffff810da90f>]
>>>> >>> > ? vfs_write+0xaf/0x149
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274412]  [<ffffffff810dab49>]
>>>> >>> > ? sys_pwrite64+0x53/0x71
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274415]  [<ffffffff8148c3e2>]
>>>> >>> > ? system_call_fastpath+0x16/0x1b
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274418] INFO: task
>>>> >>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274419] "echo 0 >
>>>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274420] qemu-system-x86 D
>>>> >>> > 0000000000000000     0  4162   3654 0x00000000
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274423]  ffff88020f7cacf0
>>>> >>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274426]  0000000000011240
>>>> >>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274429]  0000000000011240
>>>> >>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274432] Call Trace:
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274437]  [<ffffffff810aec97>]
>>>> >>> > ? pagevec_lookup_tag+0x18/0x21
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274439]  [<ffffffff810a528a>]
>>>> >>> > ? filemap_fdatawait_range+0xff/0x144
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274443]  [<ffffffff81146b09>]
>>>> >>> > ? writepages_unix_file+0x36e/0x3ce
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274447]  [<ffffffff81489dac>]
>>>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274450]  [<ffffffff8148a096>]
>>>> >>> > ? mutex_lock+0x1a/0x2d
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274453]  [<ffffffff81143b73>]
>>>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274457]  [<ffffffff810fa6d8>]
>>>> >>> > ? do_fsync+0x29/0x47
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274460]  [<ffffffff810fa716>]
>>>> >>> > ? sys_fdatasync+0xe/0x15
>>>> >>> > Dec  7 18:43:29 krshina3 kernel: [  720.274462]  [<ffffffff8148c3e2>]
>>>> >>> > ? system_call_fastpath+0x16/0x1b
>>>> >>> > tail: unrecognized file system type 0x52345362 for
>>>> >>> > '/var/log/messages'. please report this to bug-coreutils@gnu.org.
>>>> >>> > reverting to polling
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266595] INFO: task
>>>> >>> > qemu-system-x86:4156 blocked for more than 120 seconds.
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266599] "echo 0 >
>>>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266601] qemu-system-x86 D
>>>> >>> > 0000000000000001     0  4156   3654 0x00000000
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266605]  ffff880206dd7990
>>>> >>> > 0000000000000086 ffff8801def2fc38 ffff88022ca38cf0
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266609]  0000000000011240
>>>> >>> > ffff8801def2ffd8 0000000000004000 ffff8801def2ffd8
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266613]  0000000000011240
>>>> >>> > ffff880206dd7990 0000000000011240 ffff8801def2e000
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266616] Call Trace:
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266625]  [<ffffffff810aec97>]
>>>> >>> > ? pagevec_lookup_tag+0x18/0x21
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266629]  [<ffffffff810a528a>]
>>>> >>> > ? filemap_fdatawait_range+0xff/0x144
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266634]  [<ffffffff81146b09>]
>>>> >>> > ? writepages_unix_file+0x36e/0x3ce
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266638]  [<ffffffff810ab9f9>]
>>>> >>> > ? global_dirtyable_memory+0xd/0x2c
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266643]  [<ffffffff81489dac>]
>>>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266646]  [<ffffffff8148a096>]
>>>> >>> > ? mutex_lock+0x1a/0x2d
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266650]  [<ffffffff81143b73>]
>>>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266654]  [<ffffffff81147a81>]
>>>> >>> > ? write_unix_file+0x442/0x4b7
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266658]  [<ffffffff811498b9>]
>>>> >>> > ? reiser4_write_careful+0xb8/0x450
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266664]  [<ffffffff810da90f>]
>>>> >>> > ? vfs_write+0xaf/0x149
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266666]  [<ffffffff810dab49>]
>>>> >>> > ? sys_pwrite64+0x53/0x71
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266670]  [<ffffffff8148c3e2>]
>>>> >>> > ? system_call_fastpath+0x16/0x1b
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266672] INFO: task
>>>> >>> > qemu-system-x86:4162 blocked for more than 120 seconds.
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266674] "echo 0 >
>>>> >>> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266675] qemu-system-x86 D
>>>> >>> > 0000000000000000     0  4162   3654 0x00000000
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266678]  ffff88020f7cacf0
>>>> >>> > 0000000000000086 ffff8801e007fe18 ffffffff816ab3f0
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266681]  0000000000011240
>>>> >>> > ffff8801e007ffd8 0000000000004000 ffff8801e007ffd8
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266684]  0000000000011240
>>>> >>> > ffff88020f7cacf0 0000000000011240 ffff8801e007e000
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266687] Call Trace:
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266691]  [<ffffffff810aec97>]
>>>> >>> > ? pagevec_lookup_tag+0x18/0x21
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266694]  [<ffffffff810a528a>]
>>>> >>> > ? filemap_fdatawait_range+0xff/0x144
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266698]  [<ffffffff81146b09>]
>>>> >>> > ? writepages_unix_file+0x36e/0x3ce
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266702]  [<ffffffff81489dac>]
>>>> >>> > ? __mutex_lock_slowpath+0xd0/0x116
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266705]  [<ffffffff8148a096>]
>>>> >>> > ? mutex_lock+0x1a/0x2d
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266708]  [<ffffffff81143b73>]
>>>> >>> > ? reiser4_sync_file_common+0x58/0xcd
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266712]  [<ffffffff810fa6d8>]
>>>> >>> > ? do_fsync+0x29/0x47
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266714]  [<ffffffff810fa716>]
>>>> >>> > ? sys_fdatasync+0xe/0x15
>>>> >>> > Dec  7 18:45:29 krshina3 kernel: [  840.266717]  [<ffffffff8148c3e2>]
>>>> >>> > ? system_call_fastpath+0x16/0x1b
>>>> >>> >
>>>> >>> >
>>>> >>> > The file runs fine from a FAT32 partition.
>>>> >>> >
>>>> >>> > If I can do anything, or if you need any info, please tell me.
>>>> >>> >
>>>> >>> > Thanks
>>>> >>> > Dushan
>>>> >>>
>>>> >>> --
>>>> >>> To unsubscribe from this list: send the line "unsubscribe
>>>> >>> reiserfs-devel" in the body of a message to majordomo@vger.kernel.org
>>>> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Kernel config option which causes reiser4 to be instable
  2012-12-09 17:54             ` Dušan Čolić
  2012-12-10 20:08               ` Dušan Čolić
@ 2012-12-11 15:08               ` Ivan Shapovalov
  2012-12-11 18:33                 ` Edward Shishkin
  1 sibling, 1 reply; 28+ messages in thread
From: Ivan Shapovalov @ 2012-12-11 15:08 UTC (permalink / raw)
  To: reiserfs-devel; +Cc: Dušan Čolić, edward.shishkin

Hello!

With the help of Dušan Čolić <dusanc@gmail.com>, who provided his kernel config
diff, I've found a kernel option which, when disabled, greatly reduces
(hopefully to zero, but I need time to verify it) the corruption rate in reiser4.

It's CONFIG_TRANSPARENT_HUGEPAGE (or something it uses, like
CONFIG_COMPACTION or CONFIG_MIGRATION).
For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled on kernel
3.6.10, and everything seems to be OK so far (so the workaround is
version-agnostic).
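The following is a minimal shell sketch, not from the thread, for checking which of these options a given build has. It assumes the usual .config format, and the function names are hypothetical. It also reads the running kernel's active THP mode from /sys/kernel/mm/transparent_hugepage/enabled, the standard sysfs knob when THP is compiled in. Writing `never` there as root stops new transparent hugepage allocations without a rebuild. However, it leaves compaction and migration available, and nobody in this thread has tested whether it avoids the reiser4 corruption.

```shell
#!/bin/sh
# Hypothetical helpers for this thread; not part of any kernel tool.

# mm_opts_state CONFIG: print NAME=y|m|n for the three options suspected here
# ("n" when the option is unset or absent from the file).
mm_opts_state() {
    for opt in TRANSPARENT_HUGEPAGE COMPACTION MIGRATION; do
        # Anchoring on "=" keeps e.g. CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS out.
        val=$(sed -n 's/^CONFIG_'"$opt"'=\(.*\)$/\1/p' "$1")
        echo "CONFIG_${opt}=${val:-n}"
    done
}

# thp_runtime_mode: print the active THP mode of the running kernel
# ("always", "madvise" or "never"), "unknown" if the file has an unexpected
# format, or "unavailable" when the kernel has no THP support.
thp_runtime_mode() {
    f=/sys/kernel/mm/transparent_hugepage/enabled
    if [ -r "$f" ]; then
        # The active mode is the bracketed word, e.g. "always madvise [never]".
        mode=$(sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$f")
        echo "${mode:-unknown}"
    else
        echo unavailable
    fi
}

# Runtime workaround (root only, untested against this bug):
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
```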

Edward, do you have any guesses about what could make reiser4 choke on
hugepages/compaction/migration? I'm barely familiar with the kernel
internals.

Thanks,
Ivan.

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-11 15:08               ` Kernel config option which causes reiser4 to be instable Ivan Shapovalov
@ 2012-12-11 18:33                 ` Edward Shishkin
  2012-12-11 18:49                   ` Ivan Shapovalov
  2012-12-11 20:54                   ` Dušan Čolić
  0 siblings, 2 replies; 28+ messages in thread
From: Edward Shishkin @ 2012-12-11 18:33 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel, Dušan Čolić

On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
> Hello!

Hello.

>
> With the help of Dušan Čolić <dusanc@gmail.com>, who provided his kernel config
> diff, I've found a kernel option which, when disabled, greatly reduces
> (hopefully to zero, but I need time to verify it) the corruption rate in reiser4.
>
> It's CONFIG_TRANSPARENT_HUGEPAGE (or something it uses, like
> CONFIG_COMPACTION or CONFIG_MIGRATION).
> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled

How long?

> on kernel 3.6.10, and everything seems to be OK so far (so the workaround is
> version-agnostic).
>
> Edward, do you have any guesses about what could make reiser4 choke on
> hugepages/compaction/migration?

TBH, no ideas. They (hugepages) are _transparent_.
It means we shouldn't suffer in theory ;)

> I'm barely familiar with the kernel internals.
>
> Thanks,
> Ivan.

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-11 18:33                 ` Edward Shishkin
@ 2012-12-11 18:49                   ` Ivan Shapovalov
  2012-12-12  3:23                     ` Ivan Shapovalov
  2012-12-11 20:54                   ` Dušan Čolić
  1 sibling, 1 reply; 28+ messages in thread
From: Ivan Shapovalov @ 2012-12-11 18:49 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel, Dušan Čolić

On 11 December 2012 19:33:39 Edward Shishkin wrote:
> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
> > Hello!
> 
> Hello.
> 
> > With the help of Dušan Čolić <dusanc@gmail.com>, who provided his kernel
> > config diff, I've found a kernel option which, when disabled, greatly
> > reduces (hopefully to zero, but I need time to verify it) the corruption
> > rate in reiser4.
> > 
> > It's CONFIG_TRANSPARENT_HUGEPAGE (or something it uses, like
> > CONFIG_COMPACTION or CONFIG_MIGRATION).
> > For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
> 
> How long?

12 hours of indexing, scanning, compiling, repeated execution of
"find <mountpoint> -type f -exec grep wtf {} \;" and so on.
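One way to judge an unattended run like this afterwards is to tally, in the system log, the reiser4 warning tags quoted earlier in the thread (edward-1608, edward-156, edward-1460, nikita-494, ...). The following is a minimal shell sketch, assuming one syslog message per line as in /var/log/messages. The function name is hypothetical.

```shell
#!/bin/sh
# r4_warning_counts LOG: print "TAG COUNT" for each reiser4 warning tag such as
# [edward-156] or [nikita-494] found in a syslog file, sorted by tag.
# Only lines carrying the "reiser4[comm(pid)]:" prefix are considered.
r4_warning_counts() {
    grep 'reiser4\[' "$1" |
        grep -o -E '\[(edward|nikita|vs)-[0-9]+\]' |
        sed 's/^\[\(.*\)\]$/\1/' |
        LC_ALL=C sort | uniq -c |
        awk '{ print $2, $1 }'
}
```

Empty output after an overnight run means none of these tags were logged. Comparing counts from runs with different configs gives a rough measure of the "greatly reduces" effect.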

> 
> > on kernel 3.6.10, and everything seems to be OK so far (so the workaround
> > is version-agnostic).
> > 
> > Edward, do you have any guesses about what could make reiser4 choke on
> > hugepages/compaction/migration?
> 
> TBH, no ideas. They (hugepages) are _transparent_.
> It means we shouldn't suffer in theory ;)

Maybe it's actually migration that does the damage? If we don't lock the pages
properly and they get "stolen" by the migration code... If this is the case, I
should eventually get corruption with the current setup (since
migration/compaction is not disabled).
If I do, I'll rebuild without migration at all and see whether the corruption
disappears completely (it should, if the prediction is true).
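The following is a minimal shell sketch for the rebuild described here. It switches options off in a .config, assuming the standard `# CONFIG_FOO is not set` notation. `disable_opt` is a hypothetical name; the kernel tree's `scripts/config --disable` does the same job where available. Afterwards, `make oldconfig` is still needed so Kconfig re-resolves dependencies. An option gets switched back on if something still enabled selects it. In kernels of this era, THP appears to select COMPACTION, which in turn selects MIGRATION. That would explain why all three flipped together in Dušan's diff. So disabling all three at once is the safer approach.

```shell
#!/bin/sh
# disable_opt CONFIG NAME: rewrite "CONFIG_NAME=<any value>" as
# "# CONFIG_NAME is not set". Hypothetical helper; run "make oldconfig"
# afterwards so Kconfig re-resolves dependent and selected options.
disable_opt() {
    tmp=$(mktemp) || return 1
    # The "=" right after the name leaves e.g. CONFIG_NAME_ALWAYS untouched.
    sed 's/^CONFIG_'"$2"'=.*$/# CONFIG_'"$2"' is not set/' "$1" > "$tmp" &&
        mv "$tmp" "$1" || { rm -f "$tmp"; return 1; }
}

# Example for the planned test build: no THP, no compaction, no migration.
# for opt in TRANSPARENT_HUGEPAGE COMPACTION MIGRATION; do
#     disable_opt .config "$opt"
# done
# make oldconfig
```

Leftover sub-options such as CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS are left for `make oldconfig` to drop once their parent is off.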

> 
> > I'm barely familiar with the kernel internals.
> > 
> > Thanks,
> > Ivan.

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-11 18:33                 ` Edward Shishkin
  2012-12-11 18:49                   ` Ivan Shapovalov
@ 2012-12-11 20:54                   ` Dušan Čolić
  2012-12-13 22:47                     ` Edward Shishkin
  1 sibling, 1 reply; 28+ messages in thread
From: Dušan Čolić @ 2012-12-11 20:54 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: Ivan Shapovalov, reiserfs-devel

On Tue, Dec 11, 2012 at 7:33 PM, Edward Shishkin
<edward.shishkin@gmail.com> wrote:
> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
>>
>> Hello!
>
>
> Hello.
>
>
>>
>> With the help of Dušan Čolić <dusanc@gmail.com>, who provided his kernel config
>> diff, I've found a kernel option which, when disabled, greatly reduces
>> (hopefully to zero, but I need time to verify it) the corruption rate in
>> reiser4.
>>
>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something it uses, like
>> CONFIG_COMPACTION or CONFIG_MIGRATION).
>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
>
>
> How long?
>
For me, on 2.6.39.4, the difference in uptime is months without it vs hours
with it :D

>
>> on kernel 3.6.10, and everything seems to be OK so far (so the workaround
>> is version-agnostic).
>>
>> Edward, do you have any guesses about what could make reiser4 choke on
>> hugepages/compaction/migration?
>
>
> TBH, no ideas. They (hugepages) are _transparent_.
> It means we shouldn't suffer in theory ;)
>
>
>> I'm barely familiar with the kernel internals.
>>
>> Thanks,
>> Ivan.

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-11 18:49                   ` Ivan Shapovalov
@ 2012-12-12  3:23                     ` Ivan Shapovalov
       [not found]                       ` <21180603.IycRkMTJZZ@intelfx-laptop>
  0 siblings, 1 reply; 28+ messages in thread
From: Ivan Shapovalov @ 2012-12-12  3:23 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: reiserfs-devel, Dušan Čolić

On 11 December 2012 22:49:47 Ivan Shapovalov wrote:
> On 11 December 2012 19:33:39 Edward Shishkin wrote:
> > On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
> > > Hello!
> > 
> > Hello.
> > 
> > > With help of Dušan Čolić <dusanc@gmail.com> who provided his kernel
> > > config
> > > diff I've found a kernel option which, when disabled, greatly reduces
> > > (hopefully to zero, but need time to verify it) corruption rate in
> > > reiser4.
> > > 
> > > It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it like
> > > CONFIG_COMPACTION or CONFIG_MIGRATION).
> > > For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
> > 
> > How long?
> 
> 12 hours of indexing, scanning, compiling, repeated execution of
> "find <mountpoint> -type f -exec grep wtf {} \;" and so on.
> 
> > >   on kernel
> > > 
> > > 3.6.10, and everything seems to be OK so far (so the workaround is
> > > version-
> > > agnostic).
> > > 
> > > Edward, are there any guesses on what can make reiser4 choke on
> > > hugepages/compaction/migration?
> > 
> > TBH, no ideas. They (hugepages) are _transparent_.
> > It means we shouldn't suffer in theory ;)
> 
> Maybe it's actually migration who does the damage? If we don't lock the
> pages properly and they are "stolen" by the migration code... If this is
> the case, I shall eventually get corruptions with current setup (since
> migration/compaction is not disabled).
> If I get them, I'll rebuild without migration at all and will see if
> corruptions disappear completely. (Then they should disappear, if the
> prediction is true.)

...So, the kernel failed the overnight testing with the usual "cluster
corrupted" errors etc. (which is just as predicted).

I'm now rebuilding without CONFIG_COMPACTION and CONFIG_MIGRATION.

> 
> > >   I'm not even barely familiar with the kernel
> > > 
> > > internals.
> > > 
> > > Thanks,
> > > Ivan.
-- 
Best regards,
Ivan Shapovalov.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel config option which causes reiser4 to be instable
       [not found]                       ` <21180603.IycRkMTJZZ@intelfx-laptop>
@ 2012-12-13 20:51                         ` Edward Shishkin
  0 siblings, 0 replies; 28+ messages in thread
From: Edward Shishkin @ 2012-12-13 20:51 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: reiserfs-devel, Dušan Čolić

On 12/13/2012 07:56 PM, Ivan Shapovalov wrote:
> On 12 December 2012 07:23:53 Ivan Shapovalov wrote:
>> On 11 December 2012 22:49:47 Ivan Shapovalov wrote:
>>> On 11 December 2012 19:33:39 Edward Shishkin wrote:
>>>> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
>>>>> Hello!
>>>> Hello.
>>>>
>>>>> With help of Dušan Čolić <dusanc@gmail.com> who provided his kernel
>>>>> config
>>>>> diff I've found a kernel option which, when disabled, greatly reduces
>>>>> (hopefully to zero, but need time to verify it) corruption rate in
>>>>> reiser4.
>>>>>
>>>>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it
>>>>> like
>>>>> CONFIG_COMPACTION or CONFIG_MIGRATION).
>>>>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
>>>> How long?
>>> 12 hours of indexing, scanning, compiling, repeated execution of
>>> "find <mountpoint> -type f -exec grep wtf {} \;" and so on.
>>>
>>>>>    on kernel
>>>>>
>>>>> 3.6.10, and everything seems to be OK so far (so the workaround is
>>>>> version-
>>>>> agnostic).
>>>>>
>>>>> Edward, are there any guesses on what can make reiser4 choke on
>>>>> hugepages/compaction/migration?
>>>> TBH, no ideas. They (hugepages) are _transparent_.
>>>> It means we shouldn't suffer in theory ;)
>>> Maybe it's actually migration who does the damage? If we don't lock the
>>> pages properly and they are "stolen" by the migration code... If this is
>>> the case, I shall eventually get corruptions with current setup (since
>>> migration/compaction is not disabled).
>>> If I get them, I'll rebuild without migration at all and will see if
>>> corruptions disappear completely. (Then they should disappear, if the
>>> prediction is true.)
>> ...So, the kernel did not pass the overnight testing with usual errors of
>> "cluster corrupted" and etc (which is just as planned).
>>
>> I'm now rebuilding without CONFIG_COMPACTION and CONFIG_MIGRATION.
> So far the kernel built without CONFIG_MIGRATION has worked flawlessly. I gave
> it double the testing time of the previous attempt, that is, 2 days.
>
> Regarding the actual solution (as plainly disabling kernel features doesn't
> count as one):
>
> I have a guess that the problem is related to the default ->migratepage() of
> struct address_space_operations (which is not a no-op, but a "generic"
> implementation by default).

Hmm, I didn't know about this new aop :(

Right now I cannot say for sure that it is the default ->migratepage()
that caused the corruptions; however, a quick look showed that it works
incorrectly: reiser4_writepage() doesn't necessarily make the page clean.
So, yes, it would be better to disable migration for our mappings for
now.
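To make the two paths concrete, here is a small user-space model in C, simplified from the fallback_migrate_page() logic that Ivan's patch later in this thread cites ("Only writeback pages in full synchronous migration"). All type and function names are hypothetical stand-ins, not kernel API, and the real kernel path also handles locking, buffers and reference counts, none of which is modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical migration modes: only the synchronous one may write pages out. */
enum model_mode { MODEL_ASYNC, MODEL_SYNC };

/* Hypothetical stand-in for struct page: only the state the fallback checks. */
struct mock_page {
	bool dirty;
	bool has_private;	/* e.g. a jnode is attached */
	bool releasable;	/* would ->releasepage() succeed? */
};

/* Hypothetical stand-in for the two relevant address_space_operations. */
struct mock_aops {
	int (*writepage)(struct mock_page *page);
	/* NULL means "use the generic fallback" */
	int (*migratepage)(struct mock_page *newpage, struct mock_page *page,
			   enum model_mode mode);
};

/* Models Edward's remark: ->writepage() need not leave the page clean. */
static int model_writepage_may_stay_dirty(struct mock_page *page)
{
	(void)page;	/* page->dirty is deliberately left untouched */
	return 0;
}

/* Stand-in for migrate_page(): copy the state to the new page. */
static int model_migrate_page(struct mock_page *newpage, struct mock_page *page)
{
	*newpage = *page;
	return 0;
}

/* Simplified model of the generic fallback used when ->migratepage is NULL. */
static int model_fallback_migrate(const struct mock_aops *aops,
				  struct mock_page *newpage,
				  struct mock_page *page, enum model_mode mode)
{
	if (page->dirty) {
		/* Only write pages out in fully synchronous migration. */
		if (mode != MODEL_SYNC)
			return -EBUSY;
		/* Start writeout and tell the caller to retry later. */
		return aops->writepage(page) < 0 ? -EIO : -EAGAIN;
	}
	/* Only pages carrying private data are offered to the fs for release. */
	if (page->has_private && !page->releasable)
		return -EAGAIN;
	return model_migrate_page(newpage, page);
}

/* The "quickfix": refuse every migration request for this mapping. */
static int model_fail_migrate(struct mock_page *newpage, struct mock_page *page,
			      enum model_mode mode)
{
	(void)newpage;
	(void)page;
	(void)mode;
	return -EIO;
}

/* Dispatch: prefer the fs-specific hook, else fall back to the generic path. */
static int model_move_to_new_page(const struct mock_aops *aops,
				  struct mock_page *newpage,
				  struct mock_page *page, enum model_mode mode)
{
	assert(aops != NULL && newpage != NULL && page != NULL);
	if (aops->migratepage)
		return aops->migratepage(newpage, page, mode);
	return model_fallback_migrate(aops, newpage, page, mode);
}
```

In this model a dirty page never migrates on the first attempt: the generic path returns -EBUSY or -EAGAIN, and a ->writepage() that leaves the page dirty keeps it in that state. The quickfix short-circuits every request with -EIO.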

Thank you for the finding!

Edward.

>
> So I've just attempted to "quickfix" the problem by explicitly setting the
> said pointer to fail_migrate_page and building 3.7.0 with all three
> migration-related options enabled. I'll let the new kernel run overnight
> to see whether it indeed fixes The Problem.
>
> Attaching the reiser4 patch for 3.7 (just the one for 3.6 rebased against the
> new kernel version; I spotted no apparent API changes) and that quickfix
> one-liner (completely untested as of now).
>
> Thanks,
> Ivan.
>
>>>>>    I'm not even barely familiar with the kernel
>>>>>
>>>>> internals.
>>>>>
>>>>> Thanks,
>>>>> Ivan.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-11 20:54                   ` Dušan Čolić
@ 2012-12-13 22:47                     ` Edward Shishkin
  2012-12-14  3:14                       ` Ivan Shapovalov
  0 siblings, 1 reply; 28+ messages in thread
From: Edward Shishkin @ 2012-12-13 22:47 UTC (permalink / raw)
  To: Dušan Čolić; +Cc: Ivan Shapovalov, reiserfs-devel

On 12/11/2012 09:54 PM, Dušan Čolić wrote:
> On Tue, Dec 11, 2012 at 7:33 PM, Edward Shishkin
> <edward.shishkin@gmail.com>  wrote:
>> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
>>>
>>> Hello!
>>
>>
>> Hello.
>>
>>
>>>
>>> With help of Dušan Čolić<dusanc@gmail.com>  who provided his kernel config
>>> diff I've found a kernel option which, when disabled, greatly reduces
>>> (hopefully to zero, but need time to verify it) corruption rate in
>>> reiser4.
>>>
>>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it like
>>> CONFIG_COMPACTION or CONFIG_MIGRATION).
>>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
>>
>>
>> How long?
>>
> For me the difference in uptime is months without vs hours with it :D
> on 2.6.39.4


Hm, indeed: my setup with migration enabled cannot survive even one
kernel compilation, while with migration disabled everything looks OK.


>
>>
>>>    on kernel
>>> 3.6.10, and everything seems to be OK so far (so the workaround is
>>> version-
>>> agnostic).
>>>
>>> Edward, are there any guesses on what can make reiser4 choke on
>>> hugepages/compaction/migration?
>>
>>
>> TBH, no ideas. They (hugepages) are _transparent_.
>> It means we shouldn't suffer in theory ;)
>>
>>
>>>    I'm not even barely familiar with the kernel
>>> internals.
>>>
>>> Thanks,
>>> Ivan.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-13 22:47                     ` Edward Shishkin
@ 2012-12-14  3:14                       ` Ivan Shapovalov
  2012-12-14 11:07                         ` Edward Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Ivan Shapovalov @ 2012-12-14  3:14 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: Dušan Čolić, reiserfs-devel

On 13 December 2012 23:47:10 Edward Shishkin wrote:
> On 12/11/2012 09:54 PM, Dušan Čolić wrote:
> > On Tue, Dec 11, 2012 at 7:33 PM, Edward Shishkin
> > 
> > <edward.shishkin@gmail.com>  wrote:
> >> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
> >>> Hello!
> >> 
> >> Hello.
> >> 
> >>> With help of Dušan Čolić<dusanc@gmail.com>  who provided his kernel
> >>> config
> >>> diff I've found a kernel option which, when disabled, greatly reduces
> >>> (hopefully to zero, but need time to verify it) corruption rate in
> >>> reiser4.
> >>> 
> >>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it like
> >>> CONFIG_COMPACTION or CONFIG_MIGRATION).
> >>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
> >> 
> >> How long?
> > 
> > For me the difference in uptime is months without vs hours with it :D
> > on 2.6.39.4
> 
> Hm, indeed: my setup with enabled migration can not survive even one
> kernel compilation, while with disabled migration everything looks ok..

The overnight testing also showed no errors...
So shall we release reiser4-for-3.7 and announce FIXED(?) once again?
:)

Regards,
Ivan.

> 
> >>>    on kernel
> >>> 
> >>> 3.6.10, and everything seems to be OK so far (so the workaround is
> >>> version-
> >>> agnostic).
> >>> 
> >>> Edward, are there any guesses on what can make reiser4 choke on
> >>> hugepages/compaction/migration?
> >> 
> >> TBH, no ideas. They (hugepages) are _transparent_.
> >> It means we shouldn't suffer in theory ;)
> >> 
> >>>    I'm not even barely familiar with the kernel
> >>> 
> >>> internals.
> >>> 
> >>> Thanks,
> >>> Ivan.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-14  3:14                       ` Ivan Shapovalov
@ 2012-12-14 11:07                         ` Edward Shishkin
  2012-12-14 18:20                           ` Ivan Shapovalov
  0 siblings, 1 reply; 28+ messages in thread
From: Edward Shishkin @ 2012-12-14 11:07 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: Dušan Čolić, reiserfs-devel

On 12/14/2012 04:14 AM, Ivan Shapovalov wrote:
> On 13 December 2012 23:47:10 Edward Shishkin wrote:
>> On 12/11/2012 09:54 PM, Dušan Čolić wrote:
>>> On Tue, Dec 11, 2012 at 7:33 PM, Edward Shishkin
>>>
>>> <edward.shishkin@gmail.com>   wrote:
>>>> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
>>>>> Hello!
>>>>
>>>> Hello.
>>>>
>>>>> With help of Dušan Čolić<dusanc@gmail.com>   who provided his kernel
>>>>> config
>>>>> diff I've found a kernel option which, when disabled, greatly reduces
>>>>> (hopefully to zero, but need time to verify it) corruption rate in
>>>>> reiser4.
>>>>>
>>>>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it like
>>>>> CONFIG_COMPACTION or CONFIG_MIGRATION).
>>>>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
>>>>
>>>> How long?
>>>
>>> For me the difference in uptime is months without vs hours with it :D
>>> on 2.6.39.4
>>
>> Hm, indeed: my setup with enabled migration can not survive even one
>> kernel compilation, while with disabled migration everything looks ok..
>
> The overnight testing also showed no errors...
> So shall we release reiser4-for-3.7 and announce FIXED(?) once again?
> :)


I worry that migration is a mandatory option for hugepages.
Does fail_migrate_page() work with hugepages?

Also before the release I'll try to take a look at this:
http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2

This failure path might indicate that we adapted to fs-writeback
incorrectly.

Edward.

>
> Regards,
> Ivan.
>
>>
>>>>>     on kernel
>>>>>
>>>>> 3.6.10, and everything seems to be OK so far (so the workaround is
>>>>> version-
>>>>> agnostic).
>>>>>
>>>>> Edward, are there any guesses on what can make reiser4 choke on
>>>>> hugepages/compaction/migration?
>>>>
>>>> TBH, no ideas. They (hugepages) are _transparent_.
>>>> It means we shouldn't suffer in theory ;)
>>>>
>>>>>     I'm not even barely familiar with the kernel
>>>>>
>>>>> internals.
>>>>>
>>>>> Thanks,
>>>>> Ivan.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-14 11:07                         ` Edward Shishkin
@ 2012-12-14 18:20                           ` Ivan Shapovalov
  2012-12-16 15:36                             ` Edward Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Ivan Shapovalov @ 2012-12-14 18:20 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: Dušan Čolić, reiserfs-devel

On 14 December 2012 12:07:56 Edward Shishkin wrote:
> On 12/14/2012 04:14 AM, Ivan Shapovalov wrote:
> > On 13 December 2012 23:47:10 Edward Shishkin wrote:
> >> On 12/11/2012 09:54 PM, Dušan Čolić wrote:
> >>> On Tue, Dec 11, 2012 at 7:33 PM, Edward Shishkin
> >>> 
> >>> <edward.shishkin@gmail.com>   wrote:
> >>>> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
> >>>>> Hello!
> >>>> 
> >>>> Hello.
> >>>> 
> >>>>> With help of Dušan Čolić<dusanc@gmail.com>   who provided his kernel
> >>>>> config
> >>>>> diff I've found a kernel option which, when disabled, greatly reduces
> >>>>> (hopefully to zero, but need time to verify it) corruption rate in
> >>>>> reiser4.
> >>>>> 
> >>>>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it
> >>>>> like
> >>>>> CONFIG_COMPACTION or CONFIG_MIGRATION).
> >>>>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
> >>>> 
> >>>> How long?
> >>> 
> >>> For me the difference in uptime is months without vs hours with it :D
> >>> on 2.6.39.4
> >> 
> >> Hm, indeed: my setup with enabled migration can not survive even one
> >> kernel compilation, while with disabled migration everything looks ok..
> > 
> > The overnight testing also showed no errors...
> > So shall we release reiser4-for-3.7 and announce FIXED(?) once again?
> > 
> > :)
> 
> I worry that migration is mandatory option for hugepages.
> Does fail_migrate_page() work with hugepages?

_Apparently_ yes. We have a counter named "compact_pagemigrate_failed" in
/proc/vmstat (documented in vm/transhuge.txt), which means that a failed page
migration is not a critical event. So hugepages and compaction will work,
albeit somewhat less effectively...
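Ivan's argument relies on that counter being visible from user space. A small C helper along these lines could read it; it assumes the plain "name value" per-line format of /proc/vmstat, and since counter names have varied between kernel versions, it treats a missing name as a normal result rather than an error. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up a counter in text formatted like /proc/vmstat ("name value\n" per
 * line). Returns 0 and stores the value on success, -1 if the name is absent.
 */
static int vmstat_lookup(const char *text, const char *name, unsigned long *out)
{
	const char *line = text;
	size_t len;

	assert(text != NULL && name != NULL && out != NULL);
	len = strlen(name);
	while (*line) {
		/* Require an exact name match followed by a space. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*out = strtoul(line + len + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}

/* Read the live file into buf; returns the number of bytes read or -1. */
static long read_vmstat(char *buf, size_t size)
{
	FILE *f;
	size_t n;

	assert(buf != NULL && size > 0);
	f = fopen("/proc/vmstat", "r");
	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return (long)n;
}
```

A caller would fill a buffer with read_vmstat() and then query "compact_pagemigrate_failed"; a growing value while hugepages keep working would support the claim that failed migrations are tolerated.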

...And I've immediately got a bunch of (presumably silly) questions while 
trying to implement ->migratepage().

1) Why is it necessary to write back dirty pages before migrating them?

2) Looking at the default implementation (fallback_migrate_page()), what is
the meaning of migrating a released page? In other words, doesn't "releasing"
a page mean "completely freeing" it anyway, requiring the fs to read the
corresponding data again?

3) As far as I understand, migrating a page (from the fs's point of view) is
just replacing all internal pointers to the "old" page with pointers to the
new one, together with calling the predefined functions
migrate_page_move_mapping() and migrate_page_copy(). So here's a question:
which reiser4 structures (beyond jnode->pg) keep pointers to pages, and how can
they be accessed given a single page?
I remember that cryptcompress's struct cluster_handle stores an array of
pages...

Thanks,
Ivan.

> 
> Also before the release I'll try to take a look at this:
> http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2
> 
> This failed path might indicate that we adjusted to fs-writeback
> incorrectly.
> 
> Edward.
> 
> > Regards,
> > Ivan.
> > 
> >>>>>     on kernel
> >>>>> 
> >>>>> 3.6.10, and everything seems to be OK so far (so the workaround is
> >>>>> version-
> >>>>> agnostic).
> >>>>> 
> >>>>> Edward, are there any guesses on what can make reiser4 choke on
> >>>>> hugepages/compaction/migration?
> >>>> 
> >>>> TBH, no ideas. They (hugepages) are _transparent_.
> >>>> It means we shouldn't suffer in theory ;)
> >>>> 
> >>>>>     I'm not even barely familiar with the kernel
> >>>>> 
> >>>>> internals.
> >>>>> 
> >>>>> Thanks,
> >>>>> Ivan.
-- 
Best regards,
Ivan Shapovalov.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-14 18:20                           ` Ivan Shapovalov
@ 2012-12-16 15:36                             ` Edward Shishkin
  2012-12-26 16:22                               ` Ivan Shapovalov
  0 siblings, 1 reply; 28+ messages in thread
From: Edward Shishkin @ 2012-12-16 15:36 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: Dušan Čolić, reiserfs-devel

On 12/14/2012 07:20 PM, Ivan Shapovalov wrote:
> On 14 December 2012 12:07:56 Edward Shishkin wrote:
>> On 12/14/2012 04:14 AM, Ivan Shapovalov wrote:
>>> On 13 December 2012 23:47:10 Edward Shishkin wrote:
>>>> On 12/11/2012 09:54 PM, Dušan Čolić wrote:
>>>>> On Tue, Dec 11, 2012 at 7:33 PM, Edward Shishkin
>>>>>
>>>>> <edward.shishkin@gmail.com>    wrote:
>>>>>> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
>>>>>>> Hello!
>>>>>>
>>>>>> Hello.
>>>>>>
>>>>>>> With help of Dušan Čolić<dusanc@gmail.com>    who provided his kernel
>>>>>>> config
>>>>>>> diff I've found a kernel option which, when disabled, greatly reduces
>>>>>>> (hopefully to zero, but need time to verify it) corruption rate in
>>>>>>> reiser4.
>>>>>>>
>>>>>>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it
>>>>>>> like
>>>>>>> CONFIG_COMPACTION or CONFIG_MIGRATION).
>>>>>>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
>>>>>>
>>>>>> How long?
>>>>>
>>>>> For me the difference in uptime is months without vs hours with it :D
>>>>> on 2.6.39.4
>>>>
>>>> Hm, indeed: my setup with enabled migration can not survive even one
>>>> kernel compilation, while with disabled migration everything looks ok..
>>>
>>> The overnight testing also showed no errors...
>>> So shall we release reiser4-for-3.7 and announce FIXED(?) once again?
>>>
>>> :)
>>
>> I worry that migration is mandatory option for hugepages.
>> Does fail_migrate_page() work with hugepages?
>
> _Apparently_ yes. We have a counter named "compact_pagemigrate_failed" in
> /proc/vmstat (documented in vm/transhuge.txt), which means that failing a page
> migration is not a critical event. So hugepages and compaction will work,
> albeit quite less effectively...
>
> ...And I've immediately got a bunch of (presumably silly) questions


Nope. Good questions.


> while
> trying to implement ->migratepage().
>
> 1) Why it is needed to writeback dirty pages before migrating them?
>
> 2) Looking at the default implementation (fallback_migrate_page()), what is
> the meaning of migrating a released page?


To make sure that nobody uses the page.

Just imagine: we allocate a page, take a reference, and make the page
uptodate. At this point the migration routine steals the page. Then we do
kmap(), but the virtual address is wrong. Welcome to corruption...

So, first of all, the migration routine wants to make sure that the file
system doesn't use the page: try_to_release_page() checks a reference
counter (see e.g. reiser4_releasepage).
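The reference-count side of that check can be sketched as follows. The baseline figures (1, 2 or 3 references) come from the mm/migrate.c comment quoted in Ivan's patch later in this thread; the helper names are hypothetical, and reiser4_releasepage() also examines jnode state, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Baseline reference counts per the migrate_page_move_mapping() comment:
 * 1 for an anonymous page without a mapping, 2 for a page with a mapping,
 * 3 for a page with a mapping and PagePrivate set.
 */
static int expected_page_refs(bool has_mapping, bool has_private)
{
	if (!has_mapping)
		return 1;
	return has_private ? 3 : 2;
}

/* Any reference beyond the baseline means someone (e.g. the fs) still uses the page. */
static bool page_in_use(int page_count, bool has_mapping, bool has_private)
{
	assert(page_count >= 1);
	return page_count > expected_page_refs(has_mapping, has_private);
}
```

In Edward's scenario the file system holds one extra reference on a mapped page, so the count exceeds the baseline and the page must not be migrated.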


> In other words, doesn't "releasing"
> page anyway mean "completely freeing" it, requiring the fs to read
> corresponding data again?


The file system cannot use a pointer to a page that has been released.
We should obtain a new pointer (via find_get_page(), etc.). IMHO a dirty
page is a special case (this is regarding your question #1).
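Why a saved page pointer must be re-obtained can be shown with a toy page cache. This is a hypothetical user-space model (model_cache, model_find_page and model_migrate are illustrative names, not kernel API): migration copies the data, repoints the cache slot and lets the old frame be reused, so only a fresh lookup sees valid data. In the kernel, the reference-count check Edward describes is what should guarantee that nobody still holds such a stale pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CACHE_SLOTS 8

/* Hypothetical page: just a payload buffer. */
struct model_page {
	char data[16];
};

/* Hypothetical page cache: one slot per page index, like a tiny radix tree. */
struct model_cache {
	struct model_page *slot[MODEL_CACHE_SLOTS];
};

/* Stand-in for find_get_page(): always consult the cache, never a saved pointer. */
static struct model_page *model_find_page(struct model_cache *c, size_t index)
{
	assert(index < MODEL_CACHE_SLOTS);
	return c->slot[index];
}

/* Migration: copy contents, repoint the cache slot, then reuse the old frame. */
static void model_migrate(struct model_cache *c, size_t index,
			  struct model_page *newpage)
{
	struct model_page *old = model_find_page(c, index);

	assert(old != NULL && newpage != NULL);
	memcpy(newpage->data, old->data, sizeof(old->data));
	c->slot[index] = newpage;
	memset(old->data, 0, sizeof(old->data));	/* old frame gets reused */
}
```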


>
> 3) As far as I could understand, migrating page (from fs's point of view) is
> just replacing all internal pointers to the "old" page with pointers to the
> new one together with calling predefined functions migrate_page_move_mapping()
> and migrate_page_copy(). So here's a question - which structures of reiser4
> (beyond jnode->pg) keep pointers to pages and how to access them, given a
> single page?


Those pointers shouldn't be a concern, as we use them with reference
counters held. I don't see where we reuse pointers to a released page.

When a page is successfully released, we detach it from the jnode (see
page_clear_jnode() in reiser4_releasepage()).
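The two-way link between a jnode and its page, and what a migration has to do with it, can be modeled roughly like this. The structures are hypothetical simplifications (jmodel_page.private stands for page->private, jmodel_node.pg for jnode->pg); the helpers only mirror the intent of jnode_attach_page() and page_clear_jnode(), not their real signatures or locking.

```c
#include <assert.h>
#include <stddef.h>

struct jmodel_node;

/* Hypothetical page: 'private' plays the role of page->private. */
struct jmodel_page {
	struct jmodel_node *private;
};

/* Hypothetical jnode: 'pg' plays the role of jnode->pg. */
struct jmodel_node {
	struct jmodel_page *pg;
};

/* Link both directions at once (intent of jnode_attach_page()). */
static void jmodel_attach(struct jmodel_node *node, struct jmodel_page *page)
{
	assert(node->pg == NULL && page->private == NULL);
	node->pg = page;
	page->private = node;
}

/* Unlink both directions at once (intent of page_clear_jnode()). */
static void jmodel_clear(struct jmodel_page *page, struct jmodel_node *node)
{
	assert(node->pg == page && page->private == node);
	node->pg = NULL;
	page->private = NULL;
}

/* Move the jnode from the old page to the migration target. */
static void jmodel_rebind(struct jmodel_node *node, struct jmodel_page *oldpage,
			  struct jmodel_page *newpage)
{
	jmodel_clear(oldpage, node);
	/* Like the set_page_private(newpage, 0) in Ivan's patch: drop stale data. */
	newpage->private = NULL;
	jmodel_attach(node, newpage);
}
```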


> I can remember cryptcompress's struct cluster_handle which stores an array of
> pages...


All cluster handles have the status of local variables. After
checkin_page_cluster() we forget about the pointers while the reference
counters are still held. After checkout_page_cluster() we drop the
reference counters and also forget about the pointers.

I see that the default migration routine tries to release only pages
with non-zero private info. That won't work for reiser4, as not all
our pages have non-zero private info. For files managed by the
cryptcompress plugin we allocate one jnode per page cluster (by
default 16 pages for a 4K page size), and only the first page of the
cluster gets non-zero private info. So reiser4_migratepage() should
try to release _all_ pages, not only the ones with non-zero private info.
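The cluster arithmetic above can be checked with a few lines of C. Only the figures (4K pages, 16 pages per cluster, private info on the first page of each cluster) come from the message; the helper names are hypothetical, and the "offered for release" count models the private-info-only policy Edward attributes to the default routine.

```c
#include <assert.h>
#include <stdbool.h>

/* Figures from the discussion: 4K pages, 16 pages per cluster by default. */
#define MODEL_PAGE_SIZE		4096UL
#define MODEL_CLUSTER_PAGES	16UL

/* Index of the page cluster that page 'index' belongs to. */
static unsigned long cluster_of(unsigned long index)
{
	return index / MODEL_CLUSTER_PAGES;
}

/* Only the first page of each cluster carries the jnode (non-zero private). */
static bool has_private_info(unsigned long index)
{
	return index % MODEL_CLUSTER_PAGES == 0;
}

/*
 * How many of the pages [first, first + nr) a "release only pages with
 * private info" policy would offer to the file system.
 */
static unsigned long pages_offered_for_release(unsigned long first,
					       unsigned long nr)
{
	unsigned long i, offered = 0;

	assert(first + nr >= first);	/* no wrap-around */
	for (i = first; i < first + nr; i++)
		if (has_private_info(i))
			offered++;
	return offered;
}
```

With these figures a cluster spans 64K, and a private-info-only policy offers just 1 page in 16 to the file system; the other 15 would go straight to the generic copy.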

I still have no idea why we get corruption in the case of files
managed by the (default) unix-file plugin (where we allocate one jnode
per page)...

Edward.


>
> Thanks,
> Ivan.
>
>>
>> Also before the release I'll try to take a look at this:
>> http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2
>>
>> This failed path might indicate that we adjusted to fs-writeback
>> incorrectly.
>>
>> Edward.
>>
>>> Regards,
>>> Ivan.
>>>
>>>>>>>      on kernel
>>>>>>>
>>>>>>> 3.6.10, and everything seems to be OK so far (so the workaround is
>>>>>>> version-
>>>>>>> agnostic).
>>>>>>>
>>>>>>> Edward, are there any guesses on what can make reiser4 choke on
>>>>>>> hugepages/compaction/migration?
>>>>>>
>>>>>> TBH, no ideas. They (hugepages) are _transparent_.
>>>>>> It means we shouldn't suffer in theory ;)
>>>>>>
>>>>>>>      I'm not even barely familiar with the kernel
>>>>>>>
>>>>>>> internals.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ivan.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: Kernel config option which causes reiser4 to be instable
  2012-12-16 15:36                             ` Edward Shishkin
@ 2012-12-26 16:22                               ` Ivan Shapovalov
  2012-12-29  0:24                                 ` Edward Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Ivan Shapovalov @ 2012-12-26 16:22 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: Dušan Čolić, reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 6238 bytes --]

On 16 December 2012 16:36:38 Edward Shishkin wrote:
> On 12/14/2012 07:20 PM, Ivan Shapovalov wrote:
> > On 14 December 2012 12:07:56 Edward Shishkin wrote:
> >> On 12/14/2012 04:14 AM, Ivan Shapovalov wrote:
> >>> On 13 December 2012 23:47:10 Edward Shishkin wrote:
> >>>> On 12/11/2012 09:54 PM, Dušan Čolić wrote:
> >>>>> On Tue, Dec 11, 2012 at 7:33 PM, Edward Shishkin
> >>>>> 
> >>>>> <edward.shishkin@gmail.com>    wrote:
> >>>>>> On 12/11/2012 04:08 PM, Ivan Shapovalov wrote:
> >>>>>>> Hello!
> >>>>>> 
> >>>>>> Hello.
> >>>>>> 
> >>>>>>> With help of Dušan Čolić<dusanc@gmail.com>    who provided his
> >>>>>>> kernel
> >>>>>>> config
> >>>>>>> diff I've found a kernel option which, when disabled, greatly
> >>>>>>> reduces
> >>>>>>> (hopefully to zero, but need time to verify it) corruption rate in
> >>>>>>> reiser4.
> >>>>>>> 
> >>>>>>> It's CONFIG_TRANSPARENT_HUGEPAGE (or something which is used by it
> >>>>>>> like
> >>>>>>> CONFIG_COMPACTION or CONFIG_MIGRATION).
> >>>>>>> For now I'm testing it with CONFIG_TRANSPARENT_HUGEPAGE disabled
> >>>>>> 
> >>>>>> How long?
> >>>>> 
> >>>>> For me the difference in uptime is months without vs hours with it :D
> >>>>> on 2.6.39.4
> >>>> 
> >>>> Hm, indeed: my setup with enabled migration can not survive even one
> >>>> kernel compilation, while with disabled migration everything looks ok..
> >>> 
> >>> The overnight testing also showed no errors...
> >>> So shall we release reiser4-for-3.7 and announce FIXED(?) once again?
> >>> 
> >>> :)
> >> 
> >> I worry that migration is mandatory option for hugepages.
> >> Does fail_migrate_page() work with hugepages?
> > 
> > _Apparently_ yes. We have a counter named "compact_pagemigrate_failed" in
> > /proc/vmstat (documented in vm/transhuge.txt), which means that failing a
> > page migration is not a critical event. So hugepages and compaction will
> > work, albeit quite less effectively...
> > 
> > ...And I've immediately got a bunch of (presumably silly) questions
> 
> Nop. Good questions.
> 
> 
>   while
> 
> > trying to implement ->migratepage().
> > 
> > 1) Why it is needed to writeback dirty pages before migrating them?
> > 
> > 2) Looking at the default implementation (fallback_migrate_page()), what
> > is
> > the meaning of migrating a released page?
> 
> To make sure that nobody uses the page.
> 
> Just imagine: we allocate a page, take a reference, make page uptodate.
> At this point migration routine steals the page. Then we do kmap(), but
> virtual address is wrong. Welcome to corruption..
> 
> So, at first, migration routine wants to make sure that file system
> doesn't use the page: try_to_release_page() checks a reference
> counter (see e.g reiser4_releasepage).
> 
> 
>   In other words, doesn't "releasing"
> 
> > page anyway mean "completely freeing" it, requiring the fs to read
> > corresponding data again?
> 
> File system can not use a pointer to page which has been released.
> We should obtain a new pointer (via find_get_page(), etc). IMHO dirty
> page is a special case (this is regarding your question #1)
> 
> > 3) As far as I could understand, migrating page (from fs's point of view)
> > is just replacing all internal pointers to the "old" page with pointers
> > to the new one together with calling predefined functions
> > migrate_page_move_mapping() and migrate_page_copy(). So here's a question
> > - which structures of reiser4 (beyond jnode->pg) keep pointers to pages
> > and how to access them, given a single page?
> 
> Those pointers shouldn't be a concern, as we use them with reference
> counters hold. I don't see where we reuse pointers to released page.
> 
> When a page is successfully released, we detach it from jnode (see
> page_clear_jnode() in reiser4_releasepage()).
> 
> > I can remember cryptcompress's struct cluster_handle which stores an array
> > of pages...
> 
> All cluster handles do have a status of local variables. After
> checkin_page_cluster() we forget about the pointers while reference
> counters are still hold. After checkout_page_cluster() we drop
> reference counters and also forget about the pointers.
> 
> I see that default migration routine tries to release only pages
> with non-zero private info. It won't work for reiser4, as not all
> our pages has non-zero private info. For files managed by
> cryptcompress plugin we allocate one jnode per page cluster (by
> default 16 pages for page size 4K). And only first page of the
> cluster gets non-zero private info. So reiser4_migratepage() should
> try to release _all_ pages, not only ones with non-zero private info.
> 
> Still don't have ideas why we get corruption in the case of files
> managed by (default) unix-file plugin (where we allocate one jnode
> per page)..

Hello Edward,

I've apparently managed to get a working implementation of ->migratepage().

I'm attaching a patch; it seems stable, but I'm somewhat worried because the
destination pages (parameter newpage) sometimes (pretty often) have a non-NULL
private field.

Currently I've put a warning and a set_page_private(newpage, 0) there; it
seems to work, but well...
I'm continuing to test it and will report if that actually has bad
consequences.

- Ivan

> 
> Edward.
> 
> > Thanks,
> > Ivan.
> > 
> >> Also before the release I'll try to take a look at this:
> >> http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2
> >> 
> >> This failed path might indicate that we adjusted to fs-writeback
> >> incorrectly.
> >> 
> >> Edward.
> >> 
> >>> Regards,
> >>> Ivan.
> >>> 
> >>>>>>>      on kernel
> >>>>>>> 
> >>>>>>> 3.6.10, and everything seems to be OK so far (so the workaround is
> >>>>>>> version-
> >>>>>>> agnostic).
> >>>>>>> 
> >>>>>>> Edward, are there any guesses on what can make reiser4 choke on
> >>>>>>> hugepages/compaction/migration?
> >>>>>> 
> >>>>>> TBH, no ideas. They (hugepages) are _transparent_.
> >>>>>> It means we shouldn't suffer in theory ;)
> >>>>>> 
> >>>>>>>      I'm not even barely familiar with the kernel
> >>>>>>> 
> >>>>>>> internals.
> >>>>>>> 
> >>>>>>> Thanks,
> >>>>>>> Ivan.

[-- Attachment #2: reiser4-migratepage.patch --]
[-- Type: text/x-patch, Size: 5135 bytes --]

diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
index 8d8a37d..20006fc 100644
--- a/fs/reiser4/as_ops.c
+++ b/fs/reiser4/as_ops.c
@@ -43,6 +43,7 @@
 #include <linux/backing-dev.h>
 #include <linux/quotaops.h>
 #include <linux/security.h>
+#include <linux/migrate.h>
 
 /* address space operations */
 
@@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t gfp UNUSED_ARG)
 	}
 }
 
+int reiser4_migratepage(struct address_space *mapping, struct page *newpage,
+			struct page *page, enum migrate_mode mode)
+{
+	jnode *node;
+	int result;
+
+	assert("???-1", PageLocked(page));
+	assert("???-2", !PageWriteback(page));
+	assert("???-3", reiser4_schedulable());
+
+	if (PageDirty(page)) {
+		/*
+		 * Logic from migrate.c:fallback_migrate_page()
+		 * Only writeback pages in full synchronous migration
+		 */
+		if (mode != MIGRATE_SYNC)
+			return -EBUSY;
+		return write_one_page(page, true) < 0 ? -EIO : -EAGAIN;
+	}
+
+	assert("???-4", !PageDirty(page));
+
+	assert("???-5", page->mapping != NULL);
+	assert("???-6", page->mapping->host != NULL);
+
+	/*
+	 * Iteration 1: release page before default migration by migrate_page().
+	 * Iteration 2: check if the page is releasable; if true, directly replace
+	 *              jnode's page pointer instead of releasing, then migrate using
+	 *              default migrate_page().
+	 */
+
+	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
+	 * The number of remaining references must be:
+	 * 1 for anonymous pages without a mapping
+	 * 2 for pages with a mapping
+	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
+	 */
+
+	if (page_count(page) > (PagePrivate(page) ? 3 : 2))
+		return -EIO;
+
+	/*
+	 * Non-referenced non-PagePrivate pages are e. g. anonymous pages.
+	 * If any, just migrate them using default routine.
+	 */
+
+	if (!PagePrivate(page))
+		return migrate_page(mapping, newpage, page, mode);
+
+	node = jnode_by_page(page);
+	assert("???-7", node != NULL);
+
+	/* releasable() needs jnode lock, because it looks at the jnode fields
+	 * and we need jload_lock here to avoid races with jload(). */
+	spin_lock_jnode(node);
+	spin_lock(&(node->load));
+	if (jnode_is_releasable(node)) {
+		jref(node);
+
+		/* there is no need to synchronize against
+		 * jnode_extent_write() here, because pages seen by
+		 * jnode_extent_write() are !releasable(). */
+		page_clear_jnode(page, node);
+
+		if(jprivate(newpage)) {
+			// FIXME: warning or what? happens on a regular basis, behavior is unaffected.
+			warning("???-10", "Migration destination page has a non-NULL private field (%x) - resetting it", page_private(newpage));
+			set_page_private(newpage, 0ul);
+		}
+
+		result = migrate_page(mapping, newpage, page, mode);
+		if (unlikely(result)) {
+			jnode_attach_page(node, page); /* migration failed - reattach the old page */
+		} else {
+			jnode_attach_page(node, newpage);
+		}
+
+		jput(node);
+
+		spin_unlock(&(node->load));
+		spin_unlock_jnode(node);
+		assert("???-9", reiser4_schedulable());
+		return result;
+	} else {
+		spin_unlock(&(node->load));
+		spin_unlock_jnode(node);
+		assert("???-8", reiser4_schedulable());
+		return -EIO;
+	}
+}
+
 int reiser4_readpage(struct file *file, struct page *page)
 {
 	assert("edward-1533", PageLocked(page));
diff --git a/fs/reiser4/page_cache.c b/fs/reiser4/page_cache.c
index bce07ea..1bbb9dc 100644
--- a/fs/reiser4/page_cache.c
+++ b/fs/reiser4/page_cache.c
@@ -548,7 +548,8 @@ static struct address_space_operations formatted_fake_as_ops = {
 	   and, may be made page itself free-able.
 	 */
 	.releasepage = reiser4_releasepage,
-	.direct_IO = NULL
+	.direct_IO = NULL,
+	.migratepage = reiser4_migratepage
 };
 
 /* called just before page is released (no longer used by reiser4). Callers:
diff --git a/fs/reiser4/plugin/object.c b/fs/reiser4/plugin/object.c
index e8db03b..ef37a2d 100644
--- a/fs/reiser4/plugin/object.c
+++ b/fs/reiser4/plugin/object.c
@@ -121,7 +121,8 @@ static struct address_space_operations regular_file_a_ops = {
 	.write_end = reiser4_write_end_careful,
 	.bmap = reiser4_bmap_careful,
 	.invalidatepage = reiser4_invalidatepage,
-	.releasepage = reiser4_releasepage
+	.releasepage = reiser4_releasepage,
+	.migratepage = reiser4_migratepage
 };
 
 /* VFS methods for symlink files */
@@ -172,7 +173,8 @@ static struct address_space_operations directory_a_ops = {
 	.write_end = bugop,
 	.bmap = bugop,
 	.invalidatepage = bugop,
-	.releasepage = bugop
+	.releasepage = bugop,
+	.migratepage = bugop
 };
 
 /*
diff --git a/fs/reiser4/vfs_ops.h b/fs/reiser4/vfs_ops.h
index 6ca85f8..3c6c0f3 100644
--- a/fs/reiser4/vfs_ops.h
+++ b/fs/reiser4/vfs_ops.h
@@ -24,6 +24,8 @@ int reiser4_writepage(struct page *, struct writeback_control *);
 int reiser4_set_page_dirty(struct page *);
 void reiser4_invalidatepage(struct page *, unsigned long offset);
 int reiser4_releasepage(struct page *, gfp_t);
+int reiser4_migratepage(struct address_space *, struct page *,
+			struct page *, enum migrate_mode);
 
 extern int reiser4_update_sd(struct inode *);
 extern int reiser4_add_nlink(struct inode *, struct inode *, int);


* Re: Kernel config option which causes reiser4 to be instable
  2012-12-26 16:22                               ` Ivan Shapovalov
@ 2012-12-29  0:24                                 ` Edward Shishkin
  2012-12-29 18:47                                   ` Ivan Shapovalov
  0 siblings, 1 reply; 28+ messages in thread
From: Edward Shishkin @ 2012-12-29  0:24 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: Dušan Čolić, reiserfs-devel

On 12/26/2012 05:22 PM, Ivan Shapovalov wrote:
> Hello Edward,


Hi Ivan,


>
> I've apparently managed to get a working implementation of ->migratepage().
>
> I'm attaching a patch; it seems stable, but I'm worried somewhat because the
> destination pages (parameter newpage) sometimes (pretty often) have a non-NULL
> private field.
>
> Currently I've put there a warning and a set_page_private(newpage, 0) - it
> seems to work, but well...
> I'm continuing to test it and will report if that actually has bad
> consequences.
>
> - Ivan
>
>> >
>> >  Edward.
>> >
>>> >  >  Thanks,
>>> >  >  Ivan.
>>> >  >
>>>> >  >>  Also before the release I'll try to take a look at this:
>>>> >  >>  http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2
>>>> >  >>
>>>> >  >>  This failed path might indicate that we adjusted to fs-writeback
>>>> >  >>  incorrectly.
>>>> >  >>
>>>> >  >>  Edward.
>>>> >  >>
>>>>> >  >>>  Regards,
>>>>> >  >>>  Ivan.
>>>>> >  >>>
>>>>>>>>> >  >>>>>>>        on kernel
>>>>>>>>> >  >>>>>>>
>>>>>>>>> >  >>>>>>>  3.6.10, and everything seems to be OK so far (so the workaround is
>>>>>>>>> >  >>>>>>>  version-
>>>>>>>>> >  >>>>>>>  agnostic).
>>>>>>>>> >  >>>>>>>
>>>>>>>>> >  >>>>>>>  Edward, are there any guesses on what can make reiser4 choke on
>>>>>>>>> >  >>>>>>>  hugepages/compaction/migration?
>>>>>>>> >  >>>>>>
>>>>>>>> >  >>>>>>  TBH, no ideas. They (hugepages) are_transparent_.
>>>>>>>> >  >>>>>>  It means we shouldn't suffer in theory;)
>>>>>>>> >  >>>>>>
>>>>>>>>> >  >>>>>>>        I'm not even barely familiar with the kernel
>>>>>>>>> >  >>>>>>>
>>>>>>>>> >  >>>>>>>  internals.
>>>>>>>>> >  >>>>>>>
>>>>>>>>> >  >>>>>>>  Thanks,
>>>>>>>>> >  >>>>>>>  Ivan.
>
>
> reiser4-migratepage.patch
>
>
> diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
> index 8d8a37d..20006fc 100644
> --- a/fs/reiser4/as_ops.c
> +++ b/fs/reiser4/as_ops.c
> @@ -43,6 +43,7 @@
>  #include <linux/backing-dev.h>
>  #include <linux/quotaops.h>
>  #include <linux/security.h>
> +#include <linux/migrate.h>
>
>   /* address space operations */
>
> @@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t gfp UNUSED_ARG)
>   	}
>   }
>
> +int reiser4_migratepage(struct address_space *mapping, struct page *newpage,
> +			struct page *page, enum migrate_mode mode)
> +{
> +	jnode *node;
> +	int result;
> +
> +	assert("???-1", PageLocked(page));
> +	assert("???-2", !PageWriteback(page));
> +	assert("???-3", reiser4_schedulable());
> +
> +	if (PageDirty(page)) {
> +		/*
> +		 * Logic from migrate.c:fallback_migrate_page()
> +		 * Only writeback pages in full synchronous migration
> +		 */
> +		if (mode != MIGRATE_SYNC)
> +			return -EBUSY;
> +		return write_one_page(page, true) < 0 ? -EIO : -EAGAIN;


Why? Doesn't migrate.c's writeout() work?
I see it performs some cleanups...


> +	}
> +
> +	assert("???-4", !PageDirty(page));
> +
> +	assert("???-5", page->mapping != NULL);
> +	assert("???-6", page->mapping->host != NULL);
> +
> +	/*
> +	 * Iteration 1: release page before default migration by migrate_page().
> +	 * Iteration 2: check if the page is releasable; if true, directly replace
> +	 *              jnode's page pointer instead of releasing, then migrate using
> +	 *              default migrate_page().
> +	 */
> +
> +	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
> +	 * The number of remaining references must be:
> +	 * 1 for anonymous pages without a mapping
> +	 * 2 for pages with a mapping
> +	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
> +	 */
> +
> +	if (page_count(page) > (PagePrivate(page) ? 3 : 2))


OK


> +		return -EIO;


-EAGAIN would be better (that is what fallback_migrate_page() returns
in the "non-releasable" case)
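A userspace model of the reference-count check under discussion may help. It encodes the expected counts from the comment the patch copies out of mm/migrate.c:migrate_page_move_mapping() and returns -EAGAIN, as suggested above, when extra references are held. All model_* names are hypothetical stand-ins, not kernel APIs, and the sketch ignores locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Expected reference counts, per the comment the patch copies from
 * mm/migrate.c:migrate_page_move_mapping(): 1 for an anonymous page
 * without a mapping, 2 for a page with a mapping, 3 when the page also
 * has PagePrivate/PagePrivate2 set. */
static int model_expected_refs(bool has_mapping, bool has_private)
{
	if (!has_mapping)
		return 1;
	return has_private ? 3 : 2;
}

/* Early-exit check of reiser4_migratepage() with -EAGAIN instead of
 * -EIO: extra references mean "busy, try again later", not an I/O
 * error. Returns 0 when migration may proceed. */
static int model_check_refs(int page_count, bool has_private)
{
	/* the patch asserts page->mapping != NULL at this point */
	if (page_count > model_expected_refs(true, has_private))
		return -EAGAIN;
	return 0;
}
```

Returning -EAGAIN lets the migration core retry the page later rather than treating the busy page as a hard failure.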


> +
> +	/*
> +	 * Non-referenced non-PagePrivate pages are e. g. anonymous pages.


"vfs-anonymous" are pages with page->mapping == NULL;

"reiser4-anonymous" are pages which are dirtied via ->mmap(),
but not yet captured by reiser4 transaction manager.

Note that neither of those cases applies at this point.
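The two notions of "anonymous" page described above can be made concrete with a small userspace model. The struct fields are hypothetical stand-ins (assumptions, not reiser4 or VFS structures); in particular, `captured` only models "captured by the reiser4 transaction manager".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the page state relevant to the definitions. */
struct model_page {
	const void *mapping;	/* NULL for a vfs-anonymous page */
	bool dirtied_via_mmap;	/* page was dirtied through ->mmap() */
	bool captured;		/* already captured by the txn manager */
};

enum model_anon_kind { MODEL_NOT_ANON, MODEL_VFS_ANON, MODEL_REISER4_ANON };

static enum model_anon_kind model_classify(const struct model_page *page)
{
	/* "vfs-anonymous": no address_space at all */
	if (page->mapping == NULL)
		return MODEL_VFS_ANON;
	/* "reiser4-anonymous": dirtied via mmap, not yet captured */
	if (page->dirtied_via_mmap && !page->captured)
		return MODEL_REISER4_ANON;
	return MODEL_NOT_ANON;
}
```

In reiser4_migratepage() the page has already passed the `page->mapping != NULL` assertion and the dirty-page early return, which is presumably why neither case can occur there.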


> +	 * If any, just migrate them using default routine.
> +	 */
> +
> +	if (!PagePrivate(page))
> +		return migrate_page(mapping, newpage, page, mode);


OK


> +
> +	node = jnode_by_page(page);
> +	assert("???-7", node != NULL);
> +
> +	/* releasable() needs jnode lock, because it looks at the jnode fields
> +	 * and we need jload_lock here to avoid races with jload(). */
> +	spin_lock_jnode(node);
> +	spin_lock(&(node->load));
> +	if (jnode_is_releasable(node)) {
> +		jref(node);


OK


> +
> +		/* there is no need to synchronize against
> +		 * jnode_extent_write() here, because pages seen by
> +		 * jnode_extent_write() are !releasable(). */
> +		page_clear_jnode(page, node);
> +
> +		if(jprivate(newpage)) {


Hmm, strange: newpage is freshly allocated and locked...
Is somebody not cleaning up after himself? I don't have other
ideas...



> +			// FIXME: warning or what? happens on a regular basis, behavior is unaffected.
> +			warning("???-10", "Migration destination page has a non-NULL private field (%x) - resetting it", page_private(newpage));


Does it look like a pointer that can be dereferenced?
(Better to print it in %p format.) If so, try to detect a
jnode at that address by its magic (JMAGIC 0x52654973,
present only when debugging is enabled), to make sure we
are not the culprit...
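The magic-number check suggested above might look roughly like this userspace sketch. It assumes a hypothetical `model_jnode` whose only modeled field is the magic; where the magic actually sits inside the real jnode is an assumption, and in the kernel the dereference itself can fault if the private value is not a valid pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value quoted above; reiser4 only stores it when debugging is on. */
#define MODEL_JMAGIC 0x52654973u

/* Hypothetical layout: only the magic field is modeled. */
struct model_jnode {
	uint32_t magic;
};

/* Heuristic: treat the page's private value (page_private() returns an
 * unsigned long in the kernel) as a candidate jnode pointer and check
 * for the magic. A match suggests, but does not prove, a jnode. */
static bool model_looks_like_jnode(uintptr_t private_val)
{
	const struct model_jnode *node = (const struct model_jnode *)private_val;

	if (node == NULL)
		return false;
	return node->magic == MODEL_JMAGIC;
}
```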


> +			set_page_private(newpage, 0ul);
> +		}
> +
> +		result = migrate_page(mapping, newpage, page, mode);
> +		if (unlikely(result)) {
> +			jnode_attach_page(node, page); /* migration failed - reattach the old page */
> +		} else {
> +			jnode_attach_page(node, newpage);
> +		}
> +
> +		jput(node);
> +
> +		spin_unlock(&(node->load));
> +		spin_unlock_jnode(node);
> +		assert("???-9", reiser4_schedulable());
> +		return result;
> +	} else {
> +		spin_unlock(&(node->load));
> +		spin_unlock_jnode(node);
> +		assert("???-8", reiser4_schedulable());
> +		return -EIO;


-EAGAIN


> +	}
> +}
> +


So simply releasing the page (via the default migration routine)
doesn't work, while transferring the (page, jnode) relationship to
a new pair (newpage, jnode) does work? I cannot understand this...

E.g. the vmscanner (vmscan.c), which also releases releasable pages
but, unlike migration, doesn't provide a newpage in their place,
works perfectly...
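For reference, the relationship transfer in question reduces to the following shape, shown as a userspace model with hypothetical types. It deliberately omits the jnode spinlock, the jload lock and the jref()/jput() pair that the patch holds around this sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the real struct page, jnode,
 * page_clear_jnode() and jnode_attach_page() live in the kernel. */
struct model_jnode;

struct model_page {
	struct model_jnode *priv;	/* plays the role of page->private */
};

struct model_jnode {
	struct model_page *pg;		/* plays the role of jnode->pg */
};

/* Break the page <-> jnode link in both directions. */
static void model_clear_jnode(struct model_page *page, struct model_jnode *node)
{
	page->priv = NULL;
	node->pg = NULL;
}

/* Establish the page <-> jnode link in both directions. */
static void model_attach_page(struct model_jnode *node, struct model_page *page)
{
	node->pg = page;
	page->priv = node;
}

/* Shape of the patch's releasable branch: detach the jnode from the old
 * page, clear stale private data on the destination, run the generic
 * migration (migrate_rc stands in for migrate_page()'s result), then
 * attach the jnode to whichever page now holds the data. */
static int model_transfer(struct model_jnode *node, struct model_page *page,
			  struct model_page *newpage, int migrate_rc)
{
	model_clear_jnode(page, node);
	newpage->priv = NULL;
	if (migrate_rc != 0)
		model_attach_page(node, page);	/* failed: reattach old page */
	else
		model_attach_page(node, newpage);
	return migrate_rc;
}
```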

Did you stress it well enough?

I think we'll release reiser4-for-3.7 with fail_migrate_page (as a
stability fix) and continue to play with migration...

Thanks for important results!

Edward.


* Re: Kernel config option which causes reiser4 to be instable
  2012-12-29  0:24                                 ` Edward Shishkin
@ 2012-12-29 18:47                                   ` Ivan Shapovalov
  2013-01-07  0:06                                     ` Edward Shishkin
  0 siblings, 1 reply; 28+ messages in thread
From: Ivan Shapovalov @ 2012-12-29 18:47 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: Dušan Čolić, reiserfs-devel

[-- Attachment #1: Type: text/plain, Size: 8295 bytes --]

On 29 December 2012 01:24:48 Edward Shishkin wrote:
> On 12/26/2012 05:22 PM, Ivan Shapovalov wrote:
> > Hello Edward,
> 
> Hi Ivan,
> 
> > I've apparently managed to get a working implementation of
> > ->migratepage().
> > 
> > I'm attaching a patch; it seems stable, but I'm worried somewhat because
> > the destination pages (parameter newpage) sometimes (pretty often) have a
> > non-NULL private field.
> > 
> > Currently I've put there a warning and a set_page_private(newpage, 0) - it
> > seems to work, but well...
> > I'm continuing to test it and will report if that actually has bad
> > consequences.
> > 
> > - Ivan
> > 
> > 
> > reiser4-migratepage.patch
> > 
> > 
> > diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
> > index 8d8a37d..20006fc 100644
> > --- a/fs/reiser4/as_ops.c
> > +++ b/fs/reiser4/as_ops.c
> > @@ -43,6 +43,7 @@
> >  #include <linux/backing-dev.h>
> >  #include <linux/quotaops.h>
> >  #include <linux/security.h>
> > +#include <linux/migrate.h>
> >
> >  /* address space operations */
> >
> > @@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t gfp UNUSED_ARG)
> >  	}
> >  }
> >
> > +int reiser4_migratepage(struct address_space *mapping, struct page *newpage,
> > +			struct page *page, enum migrate_mode mode)
> > +{
> > +	jnode *node;
> > +	int result;
> > +
> > +	assert("???-1", PageLocked(page));
> > +	assert("???-2", !PageWriteback(page));
> > +	assert("???-3", reiser4_schedulable());
> > +
> > +	if (PageDirty(page)) {
> > +		/*
> > +		 * Logic from migrate.c:fallback_migrate_page()
> > +		 * Only writeback pages in full synchronous migration
> > +		 */
> > +		if (mode != MIGRATE_SYNC)
> > +			return -EBUSY;
> > +		return write_one_page(page, true) < 0 ? -EIO : -EAGAIN;
> 
> Why? The migrate.c's writeout() doesn't work?
> I see it performs some cleanups..

Hi,

Because it's static. :)
But OK, I exported writeout() and used it.
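The v2 dirty-page branch, including the return-code mapping of the exported writeout(), can be modeled in userspace as follows. The enum mirrors the three modes 3.x kernels define (MIGRATE_ASYNC, MIGRATE_SYNC_LIGHT, MIGRATE_SYNC); everything prefixed model_ is a hypothetical stand-in, and no real writeback is performed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-in for the kernel's enum migrate_mode. */
enum model_migrate_mode { MODEL_ASYNC, MODEL_SYNC_LIGHT, MODEL_SYNC };

/* Same mapping as the last line of mm/migrate.c:writeout() in the v2
 * patch: a negative ->writepage() result becomes -EIO, anything else
 * -EAGAIN, so the migration core retries the page later. */
static int model_writeout_result(int writepage_rc)
{
	return (writepage_rc < 0) ? -EIO : -EAGAIN;
}

/* Dirty-page branch of reiser4_migratepage() v2. writepage_rc stands in
 * for what the filesystem's ->writepage() would return. */
static int model_migrate_dirty(enum model_migrate_mode mode, int writepage_rc)
{
	/* only full synchronous migration may start writeback */
	if (mode != MODEL_SYNC)
		return -EBUSY;
	return model_writeout_result(writepage_rc);
}
```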

> 
> > +	}
> > +
> > +	assert("???-4", !PageDirty(page));
> > +
> > +	assert("???-5", page->mapping != NULL);
> > +	assert("???-6", page->mapping->host != NULL);
> > +
> > +	/*
> > +	 * Iteration 1: release page before default migration by migrate_page().
> > +	 * Iteration 2: check if the page is releasable; if true, directly replace
> > +	 *              jnode's page pointer instead of releasing, then migrate using
> > +	 *              default migrate_page().
> > +	 */
> > +
> > +	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
> > +	 * The number of remaining references must be:
> > +	 * 1 for anonymous pages without a mapping
> > +	 * 2 for pages with a mapping
> > +	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
> > +	 */
> > +
> > +	if (page_count(page) > (PagePrivate(page) ? 3 : 2))
> 
> OK
> 
> > +		return -EIO;
> 
> -EAGAIN would be better (like fallback_migrate_page() returns in
> the case of "non-releasable")

Fixed.

> 
> > +
> > +	/*
> > +	 * Non-referenced non-PagePrivate pages are e. g. anonymous pages.
> 
> "vfs-anonymous" are pages with page->mapping == NULL;
> 
> "reiser4-anonymous" are pages which are dirtied via ->mmap(),
> but not yet captured by reiser4 transaction manager.
> 
> Note, that none of those cases takes place at this point.

Hm, indeed that's an insane comment. Fixed.

> 
> > +	 * If any, just migrate them using default routine.
> > +	 */
> > +
> > +	if (!PagePrivate(page))
> > +		return migrate_page(mapping, newpage, page, mode);
> 
> OK
> 
> > +
> > +	node = jnode_by_page(page);
> > +	assert("???-7", node != NULL);
> > +
> > +	/* releasable() needs jnode lock, because it looks at the jnode fields
> > +	 * and we need jload_lock here to avoid races with jload(). */
> > +	spin_lock_jnode(node);
> > +	spin_lock(&(node->load));
> > +	if (jnode_is_releasable(node)) {
> > +		jref(node);
> 
> OK
> 
> > +
> > +		/* there is no need to synchronize against
> > +		 * jnode_extent_write() here, because pages seen by
> > +		 * jnode_extent_write() are !releasable(). */
> > +		page_clear_jnode(page, node);
> > +
> > +		if(jprivate(newpage)) {
> 
> Hmm, strange: the newpage is freshly allocated and locked..
> Somebody doesn't clean up after himself ???  Don't have other
> ideas..

Yes, apparently. There are no calls to set_page_private() in compaction.c or
migrate.c (actually, the latter has a few, but they are not called on the
observed paths).

> 
> > +			// FIXME: warning or what? happens on a regular basis, behavior is unaffected.
> > +			warning("???-10", "Migration destination page has a non-NULL private field (%x) - resetting it", page_private(newpage));
> Does it look like a pointer, which can be dereferenced?
> (print it better in %p format). If so, try to detect a
> jnode at this address by jnode's magic (JMAGIC 0x52654973,
> present only when debug is on). To make sure we are not
> the culprit..

It cannot be dereferenced; a handful of oopses proved that...
The addresses (as printed by %p) look like "ffffea0002419800", and then
I get page fault oopses for addresses like "0000000000001010".

> 
> > +			set_page_private(newpage, 0ul);
> > +		}
> > +
> > +		result = migrate_page(mapping, newpage, page, mode);
> > +		if (unlikely(result)) {
> > +			jnode_attach_page(node, page); /* migration failed - reattach the old page */
> > +		} else {
> > +			jnode_attach_page(node, newpage);
> > +		}
> > +
> > +		jput(node);
> > +
> > +		spin_unlock(&(node->load));
> > +		spin_unlock_jnode(node);
> > +		assert("???-9", reiser4_schedulable());
> > +		return result;
> > +	} else {
> > +		spin_unlock(&(node->load));
> > +		spin_unlock_jnode(node);
> > +		assert("???-8", reiser4_schedulable());
> > +		return -EIO;
> 
> -EAGAIN

Fixed.

> 
> > +	}
> > +}
> > +
> 
> So simply releasing the page (by default migration routine) doesn't
> work, while such transfer of the relationship (page, jnode) to
> another pair (newpage, jnode) does work? Can not understand this..
> 
> E.g. vmscanner (vmscan.c), which also releases releasable page, but
> doesn't provide a newpage instead (in contrast with migration) works
> perfectly..

/me just looked at vmscan.c, which also skips ->releasepage() when there is
no private data on the page, and wonders how that ever worked with reiser4.

So I'd say there is nothing strange about fallback_migrate_page() failing
while our custom one works (it's all about not forgetting non-PagePrivate pages);
rather, it is strange that the vmscanner works while fallback_migrate_page() does not.

Maybe the vmscanner finds out that the page is in use by some other means?..
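The vmscan.c behavior described above can be sketched as a userspace model in which ->releasepage() is consulted only when the page carries private data. The types and the call counter are hypothetical, and the later reference-count checks that vmscan performs before actually freeing a page are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page state that matters here. */
struct model_page {
	bool has_private;		/* PagePrivate() in the kernel */
	bool fs_says_releasable;	/* what ->releasepage() would answer */
};

/* Counts how often the filesystem hook is consulted. */
static int model_releasepage_calls;

static bool model_releasepage(const struct model_page *page)
{
	model_releasepage_calls++;
	return page->fs_says_releasable;
}

/* Reclaim-side check as described: without private data the
 * filesystem is never asked, so reiser4 gets no say in the decision. */
static bool model_try_release(const struct model_page *page)
{
	if (!page->has_private)
		return true;
	return model_releasepage(page);
}
```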

> 
> Did you stress it well enough?

Yes, I think so. Parallel building + various git operations + databases + virtual machines.

> 
> I think we'll release reiser4-for-3.7 with fail_migrate_page (to fix
> a stability point), and continue to play with migration..

Maybe. What are the next things to play with?
I'm attaching a patch with all these issues corrected.

Regards,
Ivan.

> 
> Thanks for important results!
> 
> Edward.

[-- Attachment #2: reiser4-migratepage-v2.patch --]
[-- Type: text/x-patch, Size: 6680 bytes --]

diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
index 8d8a37d..5949e5d 100644
--- a/fs/reiser4/as_ops.c
+++ b/fs/reiser4/as_ops.c
@@ -43,6 +43,7 @@
 #include <linux/backing-dev.h>
 #include <linux/quotaops.h>
 #include <linux/security.h>
+#include <linux/migrate.h>
 
 /* address space operations */
 
@@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t gfp UNUSED_ARG)
 	}
 }
 
+int reiser4_migratepage(struct address_space *mapping, struct page *newpage,
+			struct page *page, enum migrate_mode mode)
+{
+	jnode *node;
+	int result;
+
+	assert("???-1", PageLocked(page));
+	assert("???-2", !PageWriteback(page));
+	assert("???-3", reiser4_schedulable());
+
+	if (PageDirty(page)) {
+		/*
+		 * Logic from migrate.c:fallback_migrate_page()
+		 * Only writeback pages in full synchronous migration
+		 */
+		if (mode != MIGRATE_SYNC)
+			return -EBUSY;
+		return writeout(mapping, page);
+	}
+
+	assert("???-4", !PageDirty(page));
+
+	assert("???-5", page->mapping != NULL);
+	assert("???-6", page->mapping->host != NULL);
+
+	/*
+	 * Iteration 1: release page before default migration by migrate_page().
+	 * Iteration 2: check if the page is releasable; if true, directly replace
+	 *              jnode's page pointer instead of releasing, then migrate using
+	 *              default migrate_page().
+	 */
+
+	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
+	 * The number of remaining references must be:
+	 * 1 for anonymous pages without a mapping
+	 * 2 for pages with a mapping
+	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
+	 */
+
+	if (page_count(page) > (PagePrivate(page) ? 3 : 2))
+		return -EAGAIN;
+
+	/*
+	 * If there are non-referenced non-PagePrivate pages,
+	 * just migrate them using default routine.
+	 */
+
+	if (!PagePrivate(page))
+		return migrate_page(mapping, newpage, page, mode);
+
+	node = jnode_by_page(page);
+	assert("???-7", node != NULL);
+
+	/* releasable() needs jnode lock, because it looks at the jnode fields
+	 * and we need jload_lock here to avoid races with jload(). */
+	spin_lock_jnode(node);
+	spin_lock(&(node->load));
+	if (jnode_is_releasable(node)) {
+		jref(node);
+
+		/* there is no need to synchronize against
+		 * jnode_extent_write() here, because pages seen by
+		 * jnode_extent_write() are !releasable(). */
+		page_clear_jnode(page, node);
+
+		if(jprivate(newpage)) {
+			// FIXME: warning or what? happens on a regular basis, behavior is unaffected.
+			warning("???-10", "Migration destination page has a non-NULL private field (%p) - resetting it", (void *)page_private(newpage));
+			set_page_private(newpage, 0ul);
+		}
+
+		result = migrate_page(mapping, newpage, page, mode);
+		if (unlikely(result)) {
+			jnode_attach_page(node, page); /* migration failed - reattach the old page */
+		} else {
+			jnode_attach_page(node, newpage);
+		}
+
+		jput(node);
+
+		spin_unlock(&(node->load));
+		spin_unlock_jnode(node);
+		assert("???-9", reiser4_schedulable());
+		return result;
+	} else {
+		spin_unlock(&(node->load));
+		spin_unlock_jnode(node);
+		assert("???-8", reiser4_schedulable());
+		return -EAGAIN;
+	}
+}
+
 int reiser4_readpage(struct file *file, struct page *page)
 {
 	assert("edward-1533", PageLocked(page));
diff --git a/fs/reiser4/page_cache.c b/fs/reiser4/page_cache.c
index bce07ea..1bbb9dc 100644
--- a/fs/reiser4/page_cache.c
+++ b/fs/reiser4/page_cache.c
@@ -548,7 +548,8 @@ static struct address_space_operations formatted_fake_as_ops = {
 	   and, may be made page itself free-able.
 	 */
 	.releasepage = reiser4_releasepage,
-	.direct_IO = NULL
+	.direct_IO = NULL,
+	.migratepage = reiser4_migratepage
 };
 
 /* called just before page is released (no longer used by reiser4). Callers:
diff --git a/fs/reiser4/plugin/object.c b/fs/reiser4/plugin/object.c
index e8db03b..ef37a2d 100644
--- a/fs/reiser4/plugin/object.c
+++ b/fs/reiser4/plugin/object.c
@@ -121,7 +121,8 @@ static struct address_space_operations regular_file_a_ops = {
 	.write_end = reiser4_write_end_careful,
 	.bmap = reiser4_bmap_careful,
 	.invalidatepage = reiser4_invalidatepage,
-	.releasepage = reiser4_releasepage
+	.releasepage = reiser4_releasepage,
+	.migratepage = reiser4_migratepage
 };
 
 /* VFS methods for symlink files */
@@ -172,7 +173,8 @@ static struct address_space_operations directory_a_ops = {
 	.write_end = bugop,
 	.bmap = bugop,
 	.invalidatepage = bugop,
-	.releasepage = bugop
+	.releasepage = bugop,
+	.migratepage = bugop
 };
 
 /*
diff --git a/fs/reiser4/vfs_ops.h b/fs/reiser4/vfs_ops.h
index 6ca85f8..3c6c0f3 100644
--- a/fs/reiser4/vfs_ops.h
+++ b/fs/reiser4/vfs_ops.h
@@ -24,6 +24,8 @@ int reiser4_writepage(struct page *, struct writeback_control *);
 int reiser4_set_page_dirty(struct page *);
 void reiser4_invalidatepage(struct page *, unsigned long offset);
 int reiser4_releasepage(struct page *, gfp_t);
+int reiser4_migratepage(struct address_space *, struct page *,
+			struct page *, enum migrate_mode);
 
 extern int reiser4_update_sd(struct inode *);
 extern int reiser4_add_nlink(struct inode *, struct inode *, int);
diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index ce7e667..2d0340f 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -30,6 +30,9 @@ extern int migrate_vmas(struct mm_struct *mm,
 extern void migrate_page_copy(struct page *newpage, struct page *page);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 				  struct page *newpage, struct page *page);
+
+extern int writeout(struct address_space *mapping, struct page *page);
+
 #else
 
 static inline void putback_lru_pages(struct list_head *l) {}
@@ -59,6 +62,11 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
 	return -ENOSYS;
 }
 
+static inline int writeout(struct address_space *mapping, struct page *page)
+{
+	return -ENOSYS;
+}
+
 /* Possible settings for the migrate_page() method in address_operations */
 #define migrate_page NULL
 #define fail_migrate_page NULL
diff --git a/mm/migrate.c b/mm/migrate.c
index 77ed2d7..3a854cb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -557,7 +557,7 @@ EXPORT_SYMBOL(buffer_migrate_page);
 /*
  * Writeback a page to clean the dirty state
  */
-static int writeout(struct address_space *mapping, struct page *page)
+int writeout(struct address_space *mapping, struct page *page)
 {
 	struct writeback_control wbc = {
 		.sync_mode = WB_SYNC_NONE,
@@ -594,6 +594,7 @@ static int writeout(struct address_space *mapping, struct page *page)
 
 	return (rc < 0) ? -EIO : -EAGAIN;
 }
+EXPORT_SYMBOL(writeout);
 
 /*
  * Default handling if a filesystem does not provide a migration function.


* Re: Kernel config option which causes reiser4 to be instable
  2012-12-29 18:47                                   ` Ivan Shapovalov
@ 2013-01-07  0:06                                     ` Edward Shishkin
  2013-01-07  1:33                                       ` Ivan Shapovalov
  0 siblings, 1 reply; 28+ messages in thread
From: Edward Shishkin @ 2013-01-07  0:06 UTC (permalink / raw)
  To: Ivan Shapovalov; +Cc: Dušan Čolić, reiserfs-devel

On 12/29/2012 07:47 PM, Ivan Shapovalov wrote:
> On 29 December 2012 01:24:48 Edward Shishkin wrote:
>> On 12/26/2012 05:22 PM, Ivan Shapovalov wrote:
>>> Hello Edward,
>>
>> Hi Ivan,
>>
>>> I've apparently managed to get a working implementation of
>>> ->migratepage().
>>>
>>> I'm attaching a patch; it seems stable, but I'm worried somewhat because
>>> the destination pages (parameter newpage) sometimes (pretty often) have a
>>> non-NULL private field.
>>>
>>> Currently I've put there a warning and a set_page_private(newpage, 0) - it
>>> seems to work, but well...
>>> I'm continuing to test it and will report if that actually has bad
>>> consequences.
>>>
>>> - Ivan
>>>
>>>
>>> reiser4-migratepage.patch
>>>
>>>
>>> diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
>>> index 8d8a37d..20006fc 100644
>>> --- a/fs/reiser4/as_ops.c
>>> +++ b/fs/reiser4/as_ops.c
>>> @@ -43,6 +43,7 @@
>>>  #include <linux/backing-dev.h>
>>>  #include <linux/quotaops.h>
>>>  #include <linux/security.h>
>>> +#include <linux/migrate.h>
>>>
>>>  /* address space operations */
>>>
>>> @@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t gfp UNUSED_ARG)
>>>  	}
>>>  }
>>>
>>> +int reiser4_migratepage(struct address_space *mapping, struct page *newpage,
>>> +			struct page *page, enum migrate_mode mode)
>>> +{
>>> +	jnode *node;
>>> +	int result;
>>> +
>>> +	assert("???-1", PageLocked(page));
>>> +	assert("???-2", !PageWriteback(page));
>>> +	assert("???-3", reiser4_schedulable());
>>> +
>>> +	if (PageDirty(page)) {
>>> +		/*
>>> +		 * Logic from migrate.c:fallback_migrate_page()
>>> +		 * Only writeback pages in full synchronous migration
>>> +		 */
>>> +		if (mode != MIGRATE_SYNC)
>>> +			return -EBUSY;
>>> +		return write_one_page(page, true) < 0 ? -EIO : -EAGAIN;
>>
>> Why? The migrate.c's writeout() doesn't work?
>> I see it performs some cleanups..
>
> Hi,
>
> Because it's static. :)
> But OK, I exported writeout() and used it.
>
>>
>>> +	}
>>> +
>>> +	assert("???-4", !PageDirty(page));
>>> +
>>> +	assert("???-5", page->mapping != NULL);
>>> +	assert("???-6", page->mapping->host != NULL);
>>> +
>>> +	/*
>>> +	 * Iteration 1: release page before default migration by migrate_page().
>>> +	 * Iteration 2: check if the page is releasable; if true, directly replace
>>> +	 *              jnode's page pointer instead of releasing, then migrate using
>>> +	 *              default migrate_page().
>>> +	 */
>>> +
>>> +	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
>>> +	 * The number of remaining references must be:
>>> +	 * 1 for anonymous pages without a mapping
>>> +	 * 2 for pages with a mapping
>>> +	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
>>> +	 */
>>> +
>>> +	if (page_count(page) > (PagePrivate(page) ? 3 : 2))
>>
>> OK
>>
>>> +		return -EIO;
>>
>> -EAGAIN would be better (like fallback_migrate_page() returns in
>> the case of "non-releasable")
>
> Fixed.
>
>>
>>> +
>>> +	/*
>>> +	 * Non-referenced non-PagePrivate pages are e. g. anonymous pages.
>>
>> "vfs-anonymous" are pages with page->mapping == NULL;
>>
>> "reiser4-anonymous" are pages which are dirtied via ->mmap(),
>> but not yet captured by reiser4 transaction manager.
>>
>> Note, that none of those cases takes place at this point.
>
> Hm, indeed that's an insane comment. Fixed.
>
>>
>>> +	 * If any, just migrate them using default routine.
>>> +	 */
>>> +
>>> +	if (!PagePrivate(page))
>>> +		return migrate_page(mapping, newpage, page, mode);
>>
>> OK
>>
>>> +
>>> +	node = jnode_by_page(page);
>>> +	assert("???-7", node != NULL);
>>> +
>>> +	/* releasable() needs jnode lock, because it looks at the jnode fields
>>> +	 * and we need jload_lock here to avoid races with jload(). */
>>> +	spin_lock_jnode(node);
>>> +	spin_lock(&(node->load));
>>> +	if (jnode_is_releasable(node)) {
>>> +		jref(node);
>>
>> OK
>>
>>> +
>>> +		/* there is no need to synchronize against
>>> +		 * jnode_extent_write() here, because pages seen by
>>> +		 * jnode_extent_write() are !releasable(). */
>>> +		page_clear_jnode(page, node);
>>> +
>>> +		if(jprivate(newpage)) {
>>
>> Hmm, strange: the newpage is freshly allocated and locked..
>> Somebody doesn't clean up after himself ???  Don't have other
>> ideas..
>
> Yes, apparently. There are no calls to set_page_private() in compaction.c or
> migrate.c (actually, in the latter file there are some, but they are not called
> in the observed paths).
>
>>
>>> +			// FIXME: warning or what? happens on a regular basis, behavior is unaffected.
>>> +			warning("???-10", "Migration destination page has a non-NULL private field (%x) - resetting it", page_private(newpage));
>> Does it look like a pointer that can be dereferenced?
>> (Better print it in %p format.) If so, try to detect a
>> jnode at this address by jnode's magic (JMAGIC 0x52654973,
>> present only when debug is on). To make sure we are not
>> the culprit..
>
> It cannot be dereferenced; a handful of oopses proved that...
> The addresses (as printed by %p) look like "ffffea0002419800", and then
> I get page fault oopses for addresses like "0000000000001010".
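Edward's suggestion (check whether the stale value points at a jnode by looking for the debug-only JMAGIC) can be sketched in userspace C as follows. This is an illustration under stated assumptions: struct fake_jnode and private_looks_like_jnode() are hypothetical, the real jnode layout differs, and the check is only safe on addresses already known to be readable, which the oopses above show the stale values are not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Magic value quoted in the thread; per the thread, reiser4 keeps it in
 * jnodes only when debugging is enabled. */
#define JMAGIC 0x52654973u

/* Hypothetical, simplified jnode: only the debug magic field is modelled. */
struct fake_jnode {
	uint32_t magic;
};

/* Returns 1 if 'priv' (a page->private value) points at a structure that
 * carries the jnode magic, 0 otherwise. The caller must guarantee that
 * 'priv' is either 0 or the address of readable memory: this sketch cannot
 * probe arbitrary addresses safely. */
static int private_looks_like_jnode(uintptr_t priv)
{
	const struct fake_jnode *j;

	if (priv == 0)
		return 0;
	/* A genuine pointer to this struct is at least 4-byte aligned. */
	if (priv % sizeof(uint32_t) != 0)
		return 0;
	j = (const struct fake_jnode *)priv;
	return j->magic == JMAGIC;
}
```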


It is the compaction code that provides pages whose private info has not
been cleared. Lustre had the same problem. I've discussed it with VS,
and here is his original explanation:

Vladimir Saveliev wrote:
 > migrate_pages() used to allocate new pages with a function passed as a
 > parameter.
 >
 > int migrate_pages(struct list_head *from,
 > new_page_t get_new_page, unsigned long private, bool offlining,
 > bool sync)
 >
 > In most cases allocating function passed to migrate_pages() ends up
 > with __alloc_pages() which calls get_page_from_freelist()
 > ->..-> prep_new_page() which sets page->private to 0.
 >
 > There is one exception, however. In the case of compact_zone() (introduced
 > in rhel6 kernels), the allocating function is compaction_alloc().
 >
 > This function seems to avoid the traditional page allocation path; it
 > takes free pages from isolated free lists, and page->private does not
 > get set to 0.
 >
 > Then migrate_page_move_mapping() puts that new page into the mapping's
 > page tree, where lustre's ll_read_ahead_page() finds a non-private page
 > with page->private != 0 and oopses.
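A minimal userspace model of the difference described above, assuming a struct page reduced to its private field. alloc_like_buddy(), alloc_like_compaction() and reset_stale_private() are hypothetical names; the last one only mirrors the set_page_private(newpage, 0ul) workaround in the patch and is not a kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct page: only the field discussed here. */
struct fake_page {
	unsigned long private;
};

/* Models the regular allocation path, where prep_new_page() resets
 * page->private to 0 before the page is handed out. */
static struct fake_page *alloc_like_buddy(struct fake_page *p)
{
	assert(p != NULL);
	p->private = 0;
	return p;
}

/* Models compaction_alloc() as described above: the page comes from an
 * isolated free list and page->private is left untouched. */
static struct fake_page *alloc_like_compaction(struct fake_page *p)
{
	assert(p != NULL);
	return p;
}

/* Defensive step the patch applies to the migration target before
 * attaching a jnode to it. Returns 1 if a stale value was cleared. */
static int reset_stale_private(struct fake_page *newpage)
{
	if (newpage->private == 0)
		return 0;
	newpage->private = 0;
	return 1;
}
```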


>
>>
>>> +			set_page_private(newpage, 0ul);
>>> +		}
>>> +
>>> +		result = migrate_page(mapping, newpage, page, mode);
>>> +		if (unlikely(result)) {
>>> +			jnode_attach_page(node, page); /* migration failed - reattach the old page */
>>> +		} else {
>>> +			jnode_attach_page(node, newpage);
>>> +		}
>>> +
>>> +		jput(node);
>>> +
>>> +		spin_unlock(&(node->load));
>>> +		spin_unlock_jnode(node);
>>> +		assert("???-9", reiser4_schedulable());
>>> +		return result;
>>> +	} else {
>>> +		spin_unlock(&(node->load));
>>> +		spin_unlock_jnode(node);
>>> +		assert("???-8", reiser4_schedulable());
>>> +		return -EIO;
>>
>> -EAGAIN
>
> Fixed.
>
>>
>>> +	}
>>> +}
>>> +
>>
>> So simply releasing the page (by the default migration routine) doesn't
>> work, while such a transfer of the relationship (page, jnode) to
>> another pair (newpage, jnode) does work? I cannot understand this..
>>
>> E.g. the vmscanner (vmscan.c), which also releases a releasable page but
>> doesn't provide a newpage instead (in contrast with migration), works
>> perfectly..
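For reference, the (page, jnode) to (newpage, jnode) hand-over performed by the patch can be modelled as below. This is a hypothetical userspace sketch: fake_page, fake_jnode, attach(), detach() and move_jnode() stand in for page_clear_jnode(), jnode_attach_page(), jref()/jput() and migrate_page(); the jnode and load spinlocks are deliberately left out, and the copy step is reduced to a flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fake_jnode;

struct fake_page {
	struct fake_jnode *jnode;	/* stands in for page->private */
	int contents;
};

struct fake_jnode {
	struct fake_page *pg;
	int refs;
};

/* Like jnode_attach_page(): link both directions. */
static void attach(struct fake_jnode *node, struct fake_page *page)
{
	node->pg = page;
	page->jnode = node;
}

/* Like page_clear_jnode(): unlink both directions. */
static void detach(struct fake_jnode *node, struct fake_page *page)
{
	assert(node->pg == page && page->jnode == node);
	node->pg = NULL;
	page->jnode = NULL;
}

/* copy_ok simulates whether the generic migrate_page() step succeeds. */
static int move_jnode(struct fake_jnode *node, struct fake_page *page,
		      struct fake_page *newpage, int copy_ok)
{
	int result;

	node->refs++;			/* like jref(): pin the jnode */
	detach(node, page);
	newpage->jnode = NULL;		/* drop any stale private value */

	if (copy_ok) {
		newpage->contents = page->contents;
		result = 0;
	} else {
		result = -EAGAIN;
	}

	/* On failure the old page gets its jnode back; on success the new one. */
	attach(node, result ? page : newpage);
	node->refs--;			/* like jput() */
	return result;
}
```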
>
> /me just looked at vmscan.c, which also skips ->releasepage() when there is
> no private data on the page, and wonders how that ever worked with reiser4.


It does the right thing: ->releasepage() exists only to free the resources
related to the private info. Moreover, if called with a zeroed private field,
->releasepage() will oops.
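The guard described here (only call ->releasepage() when the page has private data, which the thread says vmscan.c does) can be sketched as follows. fake_page, fake_releasepage() and try_release() are hypothetical; the assert() stands in for the oops a real ->releasepage() would hit on a zeroed private field.

```c
#include <assert.h>
#include <stddef.h>

struct fake_page {
	void *private;		/* per-fs data, e.g. a jnode */
	int private_released;
};

/* Stand-in for a ->releasepage() that assumes private data is present
 * and uses it; a real filesystem would oops instead of failing an assert. */
static int fake_releasepage(struct fake_page *page)
{
	assert(page->private != NULL);
	page->private = NULL;
	page->private_released = 1;
	return 1;
}

/* The guard: only ask the filesystem to release private data when there
 * is some. Returns 1 if the page can be dropped. */
static int try_release(struct fake_page *page)
{
	if (page->private == NULL)
		return 1;	/* nothing to release */
	return fake_releasepage(page);
}
```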

Sorry for the confusion: I was unhappy that the default migratepage doesn't
check the reference counter. Actually, it does: migrate_page() calls
migrate_page_move_mapping(), which checks whether the counter is
2 + page_has_private(page).
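The reference-count rule quoted in the patch (1, 2 or 3 expected references) can be written down as a small check. A hedged sketch: expected_refs() and check_refs() are hypothetical helpers that mirror the patch's ">" test and the -EAGAIN agreed on in review; per the discussion, the kernel's own check lives in migrate_page_move_mapping().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Expected references, following the comment quoted from
 * migrate_page_move_mapping(): 1 without a mapping, 2 with a mapping,
 * 3 with a mapping and private data. */
static int expected_refs(bool has_mapping, bool has_private)
{
	if (!has_mapping)
		return 1;
	return 2 + (has_private ? 1 : 0);
}

/* Returns 0 if migration may proceed, -EAGAIN if somebody else still
 * holds an extra reference on the page. */
static int check_refs(int page_count, bool has_mapping, bool has_private)
{
	assert(page_count >= 1);
	return page_count > expected_refs(has_mapping, has_private) ? -EAGAIN : 0;
}
```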

I think that the default migratepage (fallback_migrate_page) must work for
all file systems. For some of them it can be suboptimal, so the migration
developers provided the possibility to supply an optimal one via the
->migratepage asop. I am sure we'll eventually understand why the default
migratepage doesn't work for reiser4..
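How a filesystem-supplied ->migratepage relates to the default routine can be illustrated with a function-pointer table. This is a simplified, hypothetical model (fake_aops, fake_mapping, dispatch()); the order of checks reflects our reading of mm/migrate.c from that period and should be verified against the actual kernel source.

```c
#include <assert.h>
#include <stddef.h>

struct fake_mapping;

typedef int (*migratepage_fn)(struct fake_mapping *mapping);

/* Hypothetical, trimmed-down address_space_operations. */
struct fake_aops {
	migratepage_fn migratepage;	/* optional; NULL means "use default" */
};

struct fake_mapping {
	const struct fake_aops *a_ops;
};

/* Tags reporting which routine ran. */
enum { USED_GENERIC = 1, USED_FS = 2, USED_FALLBACK = 3 };

static int generic_migrate(struct fake_mapping *m) { (void)m; return USED_GENERIC; }
static int fallback_migrate(struct fake_mapping *m) { (void)m; return USED_FALLBACK; }
static int fs_migrate(struct fake_mapping *m) { (void)m; return USED_FS; }

/* No mapping: generic copy. Filesystem hook present: use it.
 * Otherwise: fall back to the default routine. */
static int dispatch(struct fake_mapping *mapping)
{
	if (mapping == NULL)
		return generic_migrate(mapping);
	assert(mapping->a_ops != NULL);
	if (mapping->a_ops->migratepage)
		return mapping->a_ops->migratepage(mapping);
	return fallback_migrate(mapping);
}
```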


>
> So I'd say there is nothing strange in the fact that fallback_migrate_page() does not work
> while our custom one works (it's all about taking non-PagePrivate pages into account);
> instead, it is strange that the vmscanner works while fallback_migrate_page() does not.
>
> Maybe the vmscanner finds out that the page is in use by some other means?..
>
>>
>> Did you stress it well enough?
>
> Yes, I think so. Parallel building + various git operations + databases + virtual machines.
>
>>
>> I think we'll release reiser4-for-3.7 with fail_migrate_page (to fix
>> a stability point), and continue to play with migration..
>
> Maybe... What are the next targets to play with?


It depends on your preferences. What do you want?
Resolve performance issues? Add new features? Let me know and
I'll shed more light on the specified item...

Thanks,
Edward.


* Re: Kernel config option which causes reiser4 to be instable
  2013-01-07  0:06                                     ` Edward Shishkin
@ 2013-01-07  1:33                                       ` Ivan Shapovalov
  0 siblings, 0 replies; 28+ messages in thread
From: Ivan Shapovalov @ 2013-01-07  1:33 UTC (permalink / raw)
  To: Edward Shishkin; +Cc: Dušan Čolić, reiserfs-devel

On 07 January 2013 01:06:36 Edward Shishkin wrote:
> On 12/29/2012 07:47 PM, Ivan Shapovalov wrote:
> > On 29 December 2012 01:24:48 Edward Shishkin wrote:
> >> On 12/26/2012 05:22 PM, Ivan Shapovalov wrote:
> >>> Hello Edward,
> >> 
> >> Hi Ivan,
> >> 
> >>> I've apparently managed to get a working implementation of
> >>> ->migratepage().
> >>> 
> >>> I'm attaching a patch; it seems stable, but I'm somewhat worried because
> >>> the destination pages (parameter newpage) sometimes (pretty often) have
> >>> a non-NULL private field.
> >>> 
> >>> Currently I've put a warning and a set_page_private(newpage, 0) there;
> >>> it seems to work, but well...
> >>> I'm continuing to test it and will report if that actually has bad
> >>> consequences.
> >>> 
> >>> - Ivan
> >>> 
> >>>>>   Edward.
> >>>>>   
> >>>>>>   >   Thanks,
> >>>>>>   >   Ivan.
> >>>>>>   >   
> >>>>>>>   >>   Also before the release I'll try to take a look at this:
> >>>>>>>   >>   http://marc.info/?l=reiserfs-devel&m=135402207623711&w=2
> >>>>>>>   >>   
> >>>>>>>   >>   This failed path might indicate that we adjusted to
> >>>>>>>   >>   fs-writeback incorrectly.
> >>>>>>>   >>   
> >>>>>>>   >>   Edward.
> >>>>>>>   >>   
> >>>>>>>>   >>>   Regards,
> >>>>>>>>   >>>   Ivan.
> >>>>>>>>   >>>   
> >>>>>>>>>>>>   >>>>>>>   on kernel 3.6.10, and everything seems to be OK so far
> >>>>>>>>>>>>   >>>>>>>   (so the workaround is version-agnostic).
> >>>>>>>>>>>>   >>>>>>>   
> >>>>>>>>>>>>   >>>>>>>   Edward, are there any guesses on what can make
> >>>>>>>>>>>>   >>>>>>>   reiser4 choke on hugepages/compaction/migration?
> >>>>>>>>>>>   >>>>>>   
> >>>>>>>>>>>   >>>>>>   TBH, no ideas. They (hugepages) are _transparent_.
> >>>>>>>>>>>   >>>>>>   It means we shouldn't suffer in theory;)
> >>>>>>>>>>>   >>>>>>   
> >>>>>>>>>>>>   >>>>>>>   I'm barely familiar with the kernel internals.
> >>>>>>>>>>>>   >>>>>>>   
> >>>>>>>>>>>>   >>>>>>>   Thanks,
> >>>>>>>>>>>>   >>>>>>>   Ivan.
> >>> 
> >>> reiser4-migratepage.patch
> >>> 
> >>> 
> >>> diff --git a/fs/reiser4/as_ops.c b/fs/reiser4/as_ops.c
> >>> index 8d8a37d..20006fc 100644
> >>> --- a/fs/reiser4/as_ops.c
> >>> +++ b/fs/reiser4/as_ops.c
> >>> @@ -43,6 +43,7 @@
> >>> 
> >>>    #include <linux/backing-dev.h>
> >>>    #include <linux/quotaops.h>
> >>>    #include <linux/security.h>
> >>> 
> >>> +#include <linux/migrate.h>
> >>> 
> >>>    /* address space operations */
> >>> 
> >>> @@ -304,6 +305,98 @@ int reiser4_releasepage(struct page *page, gfp_t gfp UNUSED_ARG)
> >>> 
> >>>    	}
> >>>    
> >>>    }
> >>> 
> >>> +int reiser4_migratepage(struct address_space *mapping, struct page *newpage,
> >>> +			struct page *page, enum migrate_mode mode)
> >>> +{
> >>> +	jnode *node;
> >>> +	int result;
> >>> +
> >>> +	assert("???-1", PageLocked(page));
> >>> +	assert("???-2", !PageWriteback(page));
> >>> +	assert("???-3", reiser4_schedulable());
> >>> +
> >>> +	if (PageDirty(page)) {
> >>> +		/*
> >>> +		 * Logic from migrate.c:fallback_migrate_page()
> >>> +		 * Only writeback pages in full synchronous migration
> >>> +		 */
> >>> +		if (mode != MIGRATE_SYNC)
> >>> +			return -EBUSY;
> >>> +		return write_one_page(page, true) < 0 ? -EIO : -EAGAIN;
> >> 
> >> Why? Doesn't migrate.c's writeout() work?
> >> I see it performs some cleanups...
> > 
> > Hi,
> > 
> > Because it's static. :)
> > But OK, I exported writeout() and used it.
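The dirty-page branch discussed here, which the patch takes from fallback_migrate_page(), boils down to the decision below. A userspace sketch under assumptions: the FAKE_MIGRATE_* values stand in for the kernel's migrate modes, and fake_write_page() replaces writeout()/write_one_page() with a stub that only counts calls and returns a caller-chosen result.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum fake_migrate_mode {
	FAKE_MIGRATE_ASYNC,
	FAKE_MIGRATE_SYNC_LIGHT,
	FAKE_MIGRATE_SYNC,
};

static int writes_issued;	/* counts simulated writebacks */

/* Stub for the writeback call; returns the outcome it is given. */
static int fake_write_page(int outcome)
{
	writes_issued++;
	return outcome;
}

/* Clean pages proceed (0). Dirty pages are written out only in full
 * synchronous mode, after which the caller is asked to retry (-EAGAIN);
 * a failed write maps to -EIO, other modes give up with -EBUSY. */
static int handle_dirty(bool dirty, enum fake_migrate_mode mode,
			int write_outcome)
{
	if (!dirty)
		return 0;
	if (mode != FAKE_MIGRATE_SYNC)
		return -EBUSY;
	return fake_write_page(write_outcome) < 0 ? -EIO : -EAGAIN;
}
```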
> > 
> >>> +	}
> >>> +
> >>> +	assert("???-4", !PageDirty(page));
> >>> +
> >>> +	assert("???-5", page->mapping != NULL);
> >>> +	assert("???-6", page->mapping->host != NULL);
> >>> +
> >>> +	/*
> >>> +	 * Iteration 1: release page before default migration by migrate_page().
> >>> +	 * Iteration 2: check if the page is releasable; if true, directly replace
> >>> +	 *              jnode's page pointer instead of releasing, then
> >>> +	 *              migrate using default migrate_page().
> >>> +	 */
> >>> +
> >>> +	/* -- comment taken from mm/migrate.c:migrate_page_move_mapping() --
> >>> +	 * The number of remaining references must be:
> >>> +	 * 1 for anonymous pages without a mapping
> >>> +	 * 2 for pages with a mapping
> >>> +	 * 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
> >>> +	 */
> >>> +
> >>> +	if (page_count(page) > (PagePrivate(page) ? 3 : 2))
> >> 
> >> OK
> >> 
> >>> +		return -EIO;
> >> 
> >> -EAGAIN would be better (that is what fallback_migrate_page() returns in
> >> the "non-releasable" case)
> > 
> > Fixed.
> > 
> >>> +
> >>> +	/*
> >>> +	 * Non-referenced non-PagePrivate pages are e. g. anonymous pages.
> >> 
> >> "vfs-anonymous" are pages with page->mapping == NULL;
> >> 
> >> "reiser4-anonymous" are pages which are dirtied via ->mmap(),
> >> but not yet captured by reiser4 transaction manager.
> >> 
> >> Note that neither of those cases applies at this point.
> > 
> > Hm, indeed that's an insane comment. Fixed.
> > 
> >>> +	 * If any, just migrate them using default routine.
> >>> +	 */
> >>> +
> >>> +	if (!PagePrivate(page))
> >>> +		return migrate_page(mapping, newpage, page, mode);
> >> 
> >> OK
> >> 
> >>> +
> >>> +	node = jnode_by_page(page);
> >>> +	assert("???-7", node != NULL);
> >>> +
> >>> +	/* releasable() needs jnode lock, because it looks at the jnode fields
> >>> +	 * and we need jload_lock here to avoid races with jload(). */
> >>> +	spin_lock_jnode(node);
> >>> +	spin_lock(&(node->load));
> >>> +	if (jnode_is_releasable(node)) {
> >>> +		jref(node);
> >> 
> >> OK
> >> 
> >>> +
> >>> +		/* there is no need to synchronize against
> >>> +		 * jnode_extent_write() here, because pages seen by
> >>> +		 * jnode_extent_write() are !releasable(). */
> >>> +		page_clear_jnode(page, node);
> >>> +
> >>> +		if(jprivate(newpage)) {
> >> 
> >> Hmm, strange: the newpage is freshly allocated and locked...
> >> Somebody doesn't clean up after himself? I don't have other
> >> ideas...
> > 
> > Yes, apparently. There are no calls to set_page_private() in compaction.c
> > or migrate.c (actually, in the latter file there are some, but they are
> > not called in the observed paths).
> > 
> >>> +			// FIXME: warning or what? happens on a regular basis, behavior is unaffected.
> >>> +			warning("???-10", "Migration destination page has a non-NULL private field (%x) - resetting it", page_private(newpage));
> >> 
> >> Does it look like a pointer that can be dereferenced?
> >> (Better print it in %p format.) If so, try to detect a
> >> jnode at this address by jnode's magic (JMAGIC 0x52654973,
> >> present only when debug is on). To make sure we are not
> >> the culprit..
> > 
> > It cannot be dereferenced; a handful of oopses proved that...
> > The addresses (as printed by %p) look like "ffffea0002419800", and then
> > I get page fault oopses for addresses like "0000000000001010".
> 
> It is the compaction code that provides pages whose private info has not
> been cleared. Lustre had the same problem. I've discussed it with VS,
> and here is his original explanation:
> 
> Vladimir Saveliev wrote:
>  > migrate_pages() used to allocate new pages with a function passed as a
>  > parameter.
>  > 
>  > int migrate_pages(struct list_head *from,
>  > new_page_t get_new_page, unsigned long private, bool offlining,
>  > bool sync)
>  > 
>  > In most cases the allocating function passed to migrate_pages() ends up
>  > with __alloc_pages() which calls get_page_from_freelist()
>  > ->..-> prep_new_page() which sets page->private to 0.
>  > 
>  > There is one exception, however. In the case of compact_zone() (introduced
>  > in rhel6 kernels), the allocating function is compaction_alloc().
>  > 
>  > This function seems to avoid the traditional page allocation path; it
>  > takes free pages from isolated free lists, and page->private does not
>  > get set to 0.
>  > 
>  > Then migrate_page_move_mapping() puts that new page into the mapping's
>  > page tree, where lustre's ll_read_ahead_page() finds a non-private page
>  > with page->private != 0 and oopses.
>  > 
> >>> +			set_page_private(newpage, 0ul);
> >>> +		}
> >>> +
> >>> +		result = migrate_page(mapping, newpage, page, mode);
> >>> +		if (unlikely(result)) {
> >>> +			jnode_attach_page(node, page); /* migration failed - reattach the old page */
> >>> +		} else {
> >>> +			jnode_attach_page(node, newpage);
> >>> +		}
> >>> +
> >>> +		jput(node);
> >>> +
> >>> +		spin_unlock(&(node->load));
> >>> +		spin_unlock_jnode(node);
> >>> +		assert("???-9", reiser4_schedulable());
> >>> +		return result;
> >>> +	} else {
> >>> +		spin_unlock(&(node->load));
> >>> +		spin_unlock_jnode(node);
> >>> +		assert("???-8", reiser4_schedulable());
> >>> +		return -EIO;
> >> 
> >> -EAGAIN
> > 
> > Fixed.
> > 
> >>> +	}
> >>> +}
> >>> +
> >> 
> >> So simply releasing the page (by the default migration routine) doesn't
> >> work, while such a transfer of the relationship (page, jnode) to
> >> another pair (newpage, jnode) does work? I cannot understand this..
> >> 
> >> E.g. the vmscanner (vmscan.c), which also releases a releasable page but
> >> doesn't provide a newpage instead (in contrast with migration), works
> >> perfectly..
> > 
> > /me just looked at vmscan.c, which also skips ->releasepage() when there
> > is no private data on the page, and wonders how that ever worked with
> > reiser4.
> 
> It does the right thing: ->releasepage() exists only to free the resources
> related to the private info. Moreover, if called with a zeroed private field,
> ->releasepage() will oops.
> 
> Sorry for the confusion: I was unhappy that the default migratepage doesn't
> check the reference counter. Actually, it does: migrate_page() calls
> migrate_page_move_mapping(), which checks whether the counter is
> 2 + page_has_private(page).
> 
> I think that the default migratepage (fallback_migrate_page) must work for
> all file systems. For some of them it can be suboptimal, so the migration
> developers provided the possibility to supply an optimal one via the
> ->migratepage asop. I am sure we'll eventually understand why the default
> migratepage doesn't work for reiser4..


Hm... I looked at that function (migrate_page_move_mapping()) before and
somehow became convinced that it does not check the refcount.
And now I have no ideas at all. :)

> 
> > So I'd say there is nothing strange in the fact that fallback_migrate_page()
> > does not work while our custom one works (it's all about taking
> > non-PagePrivate pages into account); instead, it is strange that the
> > vmscanner works while fallback_migrate_page() does not.
> > 
> > Maybe the vmscanner finds out that the page is in use by some other means?..
> > 
> >> Did you stress it well enough?
> > 
> > Yes, I think so. Parallel building + various git operations + databases +
> > virtual machines.
> > 
> >> I think we'll release reiser4-for-3.7 with fail_migrate_page (to fix
> >> a stability point), and continue to play with migration..
> > 
> > Maybe... What are the next targets to play with?
> 
> It depends on your preferences. What do you want?
> Resolve performance issues? Add new features? Let me know and
> I'll shed more light on the specified item...

Yes, I'd like to work on a couple of things. I'll start a new thread then.

Thanks,
Ivan.

> 
> Thanks,
> Edward.



Thread overview: 28+ messages
2012-12-07 17:56 R4 problem started with 2.6.39 and still there with 3.6.6 Dušan Čolić
2012-12-07 18:34 ` Dušan Čolić
2012-12-09 15:17   ` Ivan Shapovalov
2012-12-09 16:19     ` Dušan Čolić
2012-12-09 16:29       ` Dušan Čolić
2012-12-09 16:38         ` Ivan Shapovalov
2012-12-09 17:12           ` Dušan Čolić
2012-12-09 17:54             ` Dušan Čolić
2012-12-10 20:08               ` Dušan Čolić
2012-12-11 15:08               ` Kernel config option which causes reiser4 to be instable Ivan Shapovalov
2012-12-11 18:33                 ` Edward Shishkin
2012-12-11 18:49                   ` Ivan Shapovalov
2012-12-12  3:23                     ` Ivan Shapovalov
     [not found]                       ` <21180603.IycRkMTJZZ@intelfx-laptop>
2012-12-13 20:51                         ` Edward Shishkin
2012-12-11 20:54                   ` Dušan Čolić
2012-12-13 22:47                     ` Edward Shishkin
2012-12-14  3:14                       ` Ivan Shapovalov
2012-12-14 11:07                         ` Edward Shishkin
2012-12-14 18:20                           ` Ivan Shapovalov
2012-12-16 15:36                             ` Edward Shishkin
2012-12-26 16:22                               ` Ivan Shapovalov
2012-12-29  0:24                                 ` Edward Shishkin
2012-12-29 18:47                                   ` Ivan Shapovalov
2013-01-07  0:06                                     ` Edward Shishkin
2013-01-07  1:33                                       ` Ivan Shapovalov
2012-12-09 12:36 ` R4 problem started with 2.6.39 and still there with 3.6.6 Ivan Shapovalov
2012-12-09 14:47   ` Dušan Čolić
2012-12-09 14:52     ` Dušan Čolić
