All of lore.kernel.org
 help / color / mirror / Atom feed
* [SPAM]  Write hang on ARM based target
@ 2009-06-08 15:46 Dunphy, Bill
       [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F8D-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-08 15:46 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg


[-- Attachment #1.1: Type: text/plain, Size: 979 bytes --]

Greetings,
 
I'm attempting to get NILFS up and running on a Marvell DB-88F6281-BP-A
(ARM) and have experienced a consistent hang during data transfer.  This
target board is running a 2.6.22.18 kernel and I've tested it against
NILFS versions 2.0.12, 2.0.14 and the latest git pull (6/4/09).  I've
eliminated the garbage collector and and have enabled full debugging for
log capture.  The failure is not immediate or at a fixed point -
typically in the 1-2GB neighborhood of writing variably sized files.
This same exercise (a simple recursive local copy) works flawlessly on
my host Ubuntu i86 machine as well as on the target with an ext3 file
system.
 
Is there a specific kernel feature that I might be missing and need to
enable?  A kernel version sensitivity?  Is this log information
something you would be interested in seeing?  If so, how verbose would
you like it or better still, how would you like me to collect it?
 
Thanks,
 
William Dunphy
 

[-- Attachment #1.2: Type: text/html, Size: 2831 bytes --]

[-- Attachment #2: Type: text/plain, Size: 158 bytes --]

_______________________________________________
users mailing list
users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
https://www.nilfs.org/mailman/listinfo/users

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [SPAM] Write hang on ARM based target
       [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F8D-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
@ 2009-06-08 16:49   ` Ryusuke Konishi
       [not found]     ` <20090609.014919.79903506.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Ryusuke Konishi @ 2009-06-08 16:49 UTC (permalink / raw)
  To: users-JrjvKiOkagjYtjvyW6yDsg, WDunphy-/3AnZwnt+WyVkR+q5Y5grQ

Hi!
On Mon, 8 Jun 2009 09:46:52 -0600, "Dunphy, Bill" wrote:
> Greetings,
>  
> I'm attempting to get NILFS up and running on a Marvell DB-88F6281-BP-A
> (ARM) and have experienced a consistent hang during data transfer.  This
> target board is running a 2.6.22.18 kernel and I've tested it against
> NILFS versions 2.0.12, 2.0.14 and the latest git pull (6/4/09).  I've
> eliminated the garbage collector and and have enabled full debugging for
> log capture.  The failure is not immediate or at a fixed point -
> typically in the 1-2GB neighborhood of writing variably sized files.
> This same exercise (a simple recursive local copy) works flawlessly on
> my host Ubuntu i86 machine as well as on the target with an ext3 file
> system.
>  
> Is there a specific kernel feature that I might be missing and need to
> enable?  A kernel version sensitivity?  Is this log information
> something you would be interested in seeing?  If so, how verbose would
> you like it or better still, how would you like me to collect it?

Thanks for reporting.

What will happen if you trigger SysRq when you meet the hang?

 # echo t > /proc/sysrq-trigger

I don't know whether the arm kernel codes is supporting sysrq, but we
may get the stack dump of the nilfs task hung at some lock.  If it
works, it would help us to narrow down the problem.

Also, you can know the progress of segment writer by specifying the
following debug options,

 # echo "-vvv segment -vv seginfo" > /proc/fs/nilfs2/debug_option

This feature is only available on the nilfs2 out-of-tree module built
with CONFIG_NILFS_DEBUG=y option.

Could you try either of these?

Thanks in advance,
Ryusuke Konishi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [SPAM] Write hang on ARM based target
       [not found]     ` <20090609.014919.79903506.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-06-08 20:30       ` Dunphy, Bill
       [not found]         ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F96-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-08 20:30 UTC (permalink / raw)
  To: Ryusuke Konishi, users-JrjvKiOkagjYtjvyW6yDsg

Ryusuke,

Thanks for the prompt reply.

>Hi!
>On Mon, 8 Jun 2009 09:46:52 -0600, "Dunphy, Bill" wrote:
>> Greetings,
>>  
>> I'm attempting to get NILFS up and running on a Marvell 
>> DB-88F6281-BP-A
>> (ARM) and have experienced a consistent hang during data transfer.  
>> This target board is running a 2.6.22.18 kernel and I've tested it 
>> against NILFS versions 2.0.12, 2.0.14 and the latest git pull 
>> (6/4/09).  I've eliminated the garbage collector and and have enabled

>> full debugging for log capture.  The failure is not immediate or at a

>> fixed point - typically in the 1-2GB neighborhood of writing variably
sized files.
>> This same exercise (a simple recursive local copy) works flawlessly
on 
>> my host Ubuntu i86 machine as well as on the target with an ext3 file

>> system.
>>  
>> Is there a specific kernel feature that I might be missing and need
to 
>> enable?  A kernel version sensitivity?  Is this log information 
>> something you would be interested in seeing?  If so, how verbose
would 
>> you like it or better still, how would you like me to collect it?
>
>Thanks for reporting.
>
>What will happen if you trigger SysRq when you meet the hang?
>
> # echo t > /proc/sysrq-trigger
>
>I don't know whether the arm kernel codes is supporting sysrq, but we
may get the stack dump of >the nilfs task hung at some lock.  If it
works, it would help us to narrow down the problem.

Here's the stack trace of the copy process which has entered a permanent
disk sleep state:

cp            D C002CAA4     0   339    333 (NOTLB)
[<c002c580>] (schedule+0x0/0x600) from [<c002d54c>]
(schedule_timeout+0x98/0xc4)
[<c002d4b4>] (schedule_timeout+0x0/0xc4) from [<c002cc28>]
(io_schedule_timeout+0x34/0x58)
 r7:00000001 r6:dfe6fc64 r5:dfd7bac0 r4:c03fb3a0
[<c002cbf4>] (io_schedule_timeout+0x0/0x58) from [<c014b834>]
(congestion_wait+0x70/0x94)
 r4:c03df664
[<c014b7c4>] (congestion_wait+0x0/0x94) from [<c0146b2c>]
(balance_dirty_pages_ratelimited_nr+0x12c/0x1c0)
 r8:00000000 r7:cf142660 r6:00000600 r5:c04038c4 r4:ffffffce
[<c0146a00>] (balance_dirty_pages_ratelimited_nr+0x0/0x1c0) from
[<c0142860>] (generic_file_buffered_write+0x570/0x670)
[<c01422f0>] (generic_file_buffered_write+0x0/0x670) from [<c0142db0>]
(__generic_file_aio_write_nolock+0x450/0x4a0)
[<c0142960>] (__generic_file_aio_write_nolock+0x0/0x4a0) from
[<c0142e78>] (generic_file_aio_write+0x78/0xf4)
[<c0142e00>] (generic_file_aio_write+0x0/0xf4) from [<c0160430>]
(do_sync_write+0xb8/0x108)
[<c0160378>] (do_sync_write+0x0/0x108) from [<c0160d18>]
(vfs_write+0xb8/0x148)
[<c0160c60>] (vfs_write+0x0/0x148) from [<c01612dc>]
(sys_write+0x44/0x70)
 r7:00000549 r6:dead0bc0 r5:00000000 r4:00009000
[<c0161298>] (sys_write+0x0/0x70) from [<c0027400>]
(ret_fast_syscall+0x0/0x2c)
 r8:c00275a8 r7:00000004 r6:0000000d r5:bee23f50 r4:00000549

>Also, you can know the progress of segment writer by specifying the
following debug options,
>
> # echo "-vvv segment -vv seginfo" > /proc/fs/nilfs2/debug_option
>
>This feature is only available on the nilfs2 out-of-tree module built
with CONFIG_NILFS_DEBUG=y >option.

Here's the tail of the syslog with segment and seginfo logging enhanced
as requested:

NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=159
NILFS(segment) nilfs_segctor_thread: woke 
NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=158,
state=0
NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
NILFS(segment) nilfs_segctor_notify: complete requests from seq=159 to
seq=159
NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=159
NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment
semaphore
NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=159,
state=0
NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=159 err=0
NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore
NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=160
NILFS(segment) nilfs_segctor_thread: woke 
NILFS(segment) nilfs_segctor_thread: sequence: req=160, done=159,
state=0
NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
NILFS(segment) nilfs_segctor_notify: comion_unlock: task dfd7bac0
unlocked segment semaphore
NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore

// Bunch of repeated lock, lock, unlock entries...

NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore
NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=703
NILFS(segment) nilfs_segctor_thread: woke 
NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=702,
state=0
NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
NILFS(segment) nilfs_segctor_notify: complete requests from seq=703 to
seq=703
NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=703
NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment
semaphore
NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=703,
state=0
NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=703 err=0
NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore

// Bunch of repeated lock, lock, unlock entries...

NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore

// Game over.

As you can see, I've cut out some repeated sections to trim down the
output.  If you need more log info beyond this tail, let me know how
much and how you would like me to pass it to you.

Bill 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]         ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F96-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
@ 2009-06-09 19:04           ` Dunphy, Bill
       [not found]             ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F99-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-09 19:04 UTC (permalink / raw)
  To: Ryusuke Konishi, users-JrjvKiOkagjYtjvyW6yDsg

 
Ryusuke,

I've simplified my test approach by simply copying a single 10 byte file
repeatedly.  When doing so, I get consistent NILFS log output at the
time of hang.  Examination of /proc/sysreq-trigger also shows the copy
process to be in the same state as before (at sys_write).

Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_construct: begin
(mode=0x1)                              
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
LSEG BEGIN                            
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
GC INODE STAGE                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
FILE STAGE                            
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
IFILE STAGE                           
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called
(ino=6, stage-count=3)                 
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
called (ino=6)                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
done (ino=6)                          
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers:
called (ino=6)                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done
(err=0)                                  
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
CP STAGE                              
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called
(ino=4, stage-count=4)                 
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
called (ino=4)                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
done (ino=4)                          
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers:
called (ino=4)                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done
(err=0)                                  
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
SUFILE STAGE                          
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called
(ino=5, stage-count=5)                 
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
called (ino=5)                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
done (ino=5)                          
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers:
called (ino=5)                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done
(err=0)                                  
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
DAT STAGE                             
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called
(ino=3, stage-count=6)                 
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
called (ino=3)                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
done (ino=3)                          
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers:
called (ino=3)                        
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done
(err=0)                                  
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
SR STAGE                              
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: add
a super root block                   
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
LSEG END                              
Jun  9 12:55:14 kernel: ========= NILFS SEGMENT INFORMATION ========

Jun  9 12:55:14 kernel: full segment: segnum=101, start=206848,
end=208895                                    
Jun  9 12:55:14 kernel: partial segment: start=208788, rest=108

Jun  9 12:55:14 kernel: ------------------ SUMMARY -----------------

Jun  9 12:55:14 kernel: nfinfo     = 4 (number of files)

Jun  9 12:55:14 kernel: nblocks    = 22 (number of blocks)

Jun  9 12:55:14 kernel: sumbytes   = 408 (size of summary in bytes)

Jun  9 12:55:14 kernel: nsumblk    = 1 (number of summary blocks)

Jun  9 12:55:14 kernel: flags      = LOGBGN|LOGEND|SR

Jun  9 12:55:14 kernel: ============================================

Jun  9 12:55:14 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: called                           
Jun  9 12:55:14 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: done                             
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
called                                
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
done                                  
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
called                               
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
done                                 
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_segusage:
called                                  
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_segusage:
done                                    
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
called                                
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
done                                  
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitting
summary blocks                          
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
bio (max_vecs=64)                       
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitting
normal blocks (index=1)                 
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting
bio (start_sector=1670304, size=90112)
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitted a
segment (err=0, pseg_start=208788, #re)
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1

Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed

Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_complete_write:
completing segment (flags=0x7)           
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_complete_write:
completed a segment having a super root )
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_do_construct:
submitted all segments                     
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_construct: end
(stage=9)                                 
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_notify: complete
requests from seq=2 to seq=1            
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_transaction_unlock: task
c08c0820 unlocked segment semaphore     
Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_thread: sequence:
req=1, done=1, state=0 
  
Bill

-----Original Message-----
From: Dunphy, Bill 
Sent: Monday, June 08, 2009 2:31 PM
To: 'Ryusuke Konishi'; users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
Subject: RE: [NILFS users] [SPAM] Write hang on ARM based target

Ryusuke,

Thanks for the prompt reply.

>Hi!
>On Mon, 8 Jun 2009 09:46:52 -0600, "Dunphy, Bill" wrote:
>> Greetings,
>>  
>> I'm attempting to get NILFS up and running on a Marvell 
>> DB-88F6281-BP-A
>> (ARM) and have experienced a consistent hang during data transfer.  
>> This target board is running a 2.6.22.18 kernel and I've tested it 
>> against NILFS versions 2.0.12, 2.0.14 and the latest git pull 
>> (6/4/09).  I've eliminated the garbage collector and and have enabled

>> full debugging for log capture.  The failure is not immediate or at a

>> fixed point - typically in the 1-2GB neighborhood of writing variably
sized files.
>> This same exercise (a simple recursive local copy) works flawlessly 
>> on my host Ubuntu i86 machine as well as on the target with an ext3 
>> file system.
>>  
>> Is there a specific kernel feature that I might be missing and need 
>> to enable?  A kernel version sensitivity?  Is this log information 
>> something you would be interested in seeing?  If so, how verbose 
>> would you like it or better still, how would you like me to collect
it?
>
>Thanks for reporting.
>
>What will happen if you trigger SysRq when you meet the hang?
>
> # echo t > /proc/sysrq-trigger
>
>I don't know whether the arm kernel codes is supporting sysrq, but we
may get the stack dump of >the nilfs task hung at some lock.  If it
works, it would help us to narrow down the problem.

Here's the stack trace of the copy process which has entered a permanent
disk sleep state:

cp            D C002CAA4     0   339    333 (NOTLB)
[<c002c580>] (schedule+0x0/0x600) from [<c002d54c>]
(schedule_timeout+0x98/0xc4) [<c002d4b4>] (schedule_timeout+0x0/0xc4)
from [<c002cc28>] (io_schedule_timeout+0x34/0x58)
 r7:00000001 r6:dfe6fc64 r5:dfd7bac0 r4:c03fb3a0 [<c002cbf4>]
(io_schedule_timeout+0x0/0x58) from [<c014b834>]
(congestion_wait+0x70/0x94)
 r4:c03df664
[<c014b7c4>] (congestion_wait+0x0/0x94) from [<c0146b2c>]
(balance_dirty_pages_ratelimited_nr+0x12c/0x1c0)
 r8:00000000 r7:cf142660 r6:00000600 r5:c04038c4 r4:ffffffce
[<c0146a00>] (balance_dirty_pages_ratelimited_nr+0x0/0x1c0) from
[<c0142860>] (generic_file_buffered_write+0x570/0x670)
[<c01422f0>] (generic_file_buffered_write+0x0/0x670) from [<c0142db0>]
(__generic_file_aio_write_nolock+0x450/0x4a0)
[<c0142960>] (__generic_file_aio_write_nolock+0x0/0x4a0) from
[<c0142e78>] (generic_file_aio_write+0x78/0xf4)
[<c0142e00>] (generic_file_aio_write+0x0/0xf4) from [<c0160430>]
(do_sync_write+0xb8/0x108) [<c0160378>] (do_sync_write+0x0/0x108) from
[<c0160d18>] (vfs_write+0xb8/0x148) [<c0160c60>] (vfs_write+0x0/0x148)
from [<c01612dc>] (sys_write+0x44/0x70)
 r7:00000549 r6:dead0bc0 r5:00000000 r4:00009000 [<c0161298>]
(sys_write+0x0/0x70) from [<c0027400>] (ret_fast_syscall+0x0/0x2c)
 r8:c00275a8 r7:00000004 r6:0000000d r5:bee23f50 r4:00000549

>Also, you can know the progress of segment writer by specifying the 
>following debug options,
>
> # echo "-vvv segment -vv seginfo" > /proc/fs/nilfs2/debug_option
>
>This feature is only available on the nilfs2 out-of-tree module built
with CONFIG_NILFS_DEBUG=y >option.

Here's the tail of the syslog with segment and seginfo logging enhanced
as requested:

NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=159
NILFS(segment) nilfs_segctor_thread: woke
NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=158,
state=0
NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
NILFS(segment) nilfs_segctor_notify: complete requests from seq=159 to
seq=159
NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=159
NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment
semaphore
NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=159,
state=0
NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=159 err=0
NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore
NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=160
NILFS(segment) nilfs_segctor_thread: woke
NILFS(segment) nilfs_segctor_thread: sequence: req=160, done=159,
state=0
NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
NILFS(segment) nilfs_segctor_notify: comion_unlock: task dfd7bac0
unlocked segment semaphore
NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore

// Bunch of repeated lock, lock, unlock entries...

NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore
NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=703
NILFS(segment) nilfs_segctor_thread: woke
NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=702,
state=0
NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
NILFS(segment) nilfs_segctor_notify: complete requests from seq=703 to
seq=703
NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=703
NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment
semaphore
NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=703,
state=0
NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=703 err=0
NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore

// Bunch of repeated lock, lock, unlock entries...

NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
semaphore
NILFS(segment) nilfs_transaction_lock: locked
NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
semaphore

// Game over.

As you can see, I've cut out some repeated sections to trim down the
output.  If you need more log info beyond this tail, let me know how
much and how you would like me to pass it to you.

Bill 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]             ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F99-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
@ 2009-06-12  4:23               ` Ryusuke Konishi
       [not found]                 ` <20090612.132344.17157054.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Ryusuke Konishi @ 2009-06-12  4:23 UTC (permalink / raw)
  To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ; +Cc: users-JrjvKiOkagjYtjvyW6yDsg

Hi Dunphy,

Thank you for sending me the trace log.  I will look into the problem
later.

Ryusuke Konishi

On Tue, 9 Jun 2009 13:04:04 -0600, "Dunphy, Bill" wrote:
>  
> Ryusuke,
> 
> I've simplified my test approach by simply copying a single 10 byte file
> repeatedly.  When doing so, I get consistent NILFS log output at the
> time of hang.  Examination of /proc/sysreq-trigger also shows the copy
> process to be in the same state as before (at sys_write).
> 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_construct: begin
> (mode=0x1)                              
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> LSEG BEGIN                            
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> GC INODE STAGE                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> FILE STAGE                            
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> IFILE STAGE                           
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called
> (ino=6, stage-count=3)                 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
> called (ino=6)                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
> done (ino=6)                          
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers:
> called (ino=6)                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done
> (err=0)                                  
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> CP STAGE                              
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called
> (ino=4, stage-count=4)                 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
> called (ino=4)                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
> done (ino=4)                          
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers:
> called (ino=4)                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done
> (err=0)                                  
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> SUFILE STAGE                          
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called
> (ino=5, stage-count=5)                 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
> called (ino=5)                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
> done (ino=5)                          
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers:
> called (ino=5)                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done
> (err=0)                                  
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> DAT STAGE                             
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: called
> (ino=3, stage-count=6)                 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
> called (ino=3)                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_data_buffers:
> done (ino=3)                          
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_lookup_dirty_node_buffers:
> called (ino=3)                        
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_scan_file: done
> (err=0)                                  
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> SR STAGE                              
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: add
> a super root block                   
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_collect_blocks: **
> LSEG END                              
> Jun  9 12:55:14 kernel: ========= NILFS SEGMENT INFORMATION ========
> 
> Jun  9 12:55:14 kernel: full segment: segnum=101, start=206848,
> end=208895                                    
> Jun  9 12:55:14 kernel: partial segment: start=208788, rest=108
> 
> Jun  9 12:55:14 kernel: ------------------ SUMMARY -----------------
> 
> Jun  9 12:55:14 kernel: nfinfo     = 4 (number of files)
> 
> Jun  9 12:55:14 kernel: nblocks    = 22 (number of blocks)
> 
> Jun  9 12:55:14 kernel: sumbytes   = 408 (size of summary in bytes)
> 
> Jun  9 12:55:14 kernel: nsumblk    = 1 (number of summary blocks)
> 
> Jun  9 12:55:14 kernel: flags      = LOGBGN|LOGEND|SR
> 
> Jun  9 12:55:14 kernel: ============================================
> 
> Jun  9 12:55:14 kernel: NILFS(segment)
> nilfs_segctor_update_payload_blocknr: called                           
> Jun  9 12:55:14 kernel: NILFS(segment)
> nilfs_segctor_update_payload_blocknr: done                             
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
> called                                
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
> done                                  
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
> called                               
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
> done                                 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_segusage:
> called                                  
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_update_segusage:
> done                                    
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
> called                                
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
> done                                  
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitting
> summary blocks                          
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
> bio (max_vecs=64)                       
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitting
> normal blocks (index=1)                 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting
> bio (start_sector=1670304, size=90112)
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_write: submitted a
> segment (err=0, pseg_start=208788, #re)
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1
> 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed
> 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_complete_write:
> completing segment (flags=0x7)           
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_complete_write:
> completed a segment having a super root )
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_do_construct:
> submitted all segments                     
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_construct: end
> (stage=9)                                 
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_notify: complete
> requests from seq=2 to seq=1            
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_transaction_unlock: task
> c08c0820 unlocked segment semaphore     
> Jun  9 12:55:14 kernel: NILFS(segment) nilfs_segctor_thread: sequence:
> req=1, done=1, state=0 
>   
> Bill
> 
> -----Original Message-----
> From: Dunphy, Bill 
> Sent: Monday, June 08, 2009 2:31 PM
> To: 'Ryusuke Konishi'; users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
> Subject: RE: [NILFS users] [SPAM] Write hang on ARM based target
> 
> Ryusuke,
> 
> Thanks for the prompt reply.
> 
> >Hi!
> >On Mon, 8 Jun 2009 09:46:52 -0600, "Dunphy, Bill" wrote:
> >> Greetings,
> >>  
> >> I'm attempting to get NILFS up and running on a Marvell 
> >> DB-88F6281-BP-A
> >> (ARM) and have experienced a consistent hang during data transfer.  
> >> This target board is running a 2.6.22.18 kernel and I've tested it 
> >> against NILFS versions 2.0.12, 2.0.14 and the latest git pull 
> >> (6/4/09).  I've eliminated the garbage collector and and have enabled
> 
> >> full debugging for log capture.  The failure is not immediate or at a
> 
> >> fixed point - typically in the 1-2GB neighborhood of writing variably
> sized files.
> >> This same exercise (a simple recursive local copy) works flawlessly 
> >> on my host Ubuntu i86 machine as well as on the target with an ext3 
> >> file system.
> >>  
> >> Is there a specific kernel feature that I might be missing and need 
> >> to enable?  A kernel version sensitivity?  Is this log information 
> >> something you would be interested in seeing?  If so, how verbose 
> >> would you like it or better still, how would you like me to collect
> it?
> >
> >Thanks for reporting.
> >
> >What will happen if you trigger SysRq when you meet the hang?
> >
> > # echo t > /proc/sysrq-trigger
> >
> >I don't know whether the arm kernel codes is supporting sysrq, but we
> may get the stack dump of >the nilfs task hung at some lock.  If it
> works, it would help us to narrow down the problem.
> 
> Here's the stack trace of the copy process which has entered a permanent
> disk sleep state:
> 
> cp            D C002CAA4     0   339    333 (NOTLB)
> [<c002c580>] (schedule+0x0/0x600) from [<c002d54c>]
> (schedule_timeout+0x98/0xc4) [<c002d4b4>] (schedule_timeout+0x0/0xc4)
> from [<c002cc28>] (io_schedule_timeout+0x34/0x58)
>  r7:00000001 r6:dfe6fc64 r5:dfd7bac0 r4:c03fb3a0 [<c002cbf4>]
> (io_schedule_timeout+0x0/0x58) from [<c014b834>]
> (congestion_wait+0x70/0x94)
>  r4:c03df664
> [<c014b7c4>] (congestion_wait+0x0/0x94) from [<c0146b2c>]
> (balance_dirty_pages_ratelimited_nr+0x12c/0x1c0)
>  r8:00000000 r7:cf142660 r6:00000600 r5:c04038c4 r4:ffffffce
> [<c0146a00>] (balance_dirty_pages_ratelimited_nr+0x0/0x1c0) from
> [<c0142860>] (generic_file_buffered_write+0x570/0x670)
> [<c01422f0>] (generic_file_buffered_write+0x0/0x670) from [<c0142db0>]
> (__generic_file_aio_write_nolock+0x450/0x4a0)
> [<c0142960>] (__generic_file_aio_write_nolock+0x0/0x4a0) from
> [<c0142e78>] (generic_file_aio_write+0x78/0xf4)
> [<c0142e00>] (generic_file_aio_write+0x0/0xf4) from [<c0160430>]
> (do_sync_write+0xb8/0x108) [<c0160378>] (do_sync_write+0x0/0x108) from
> [<c0160d18>] (vfs_write+0xb8/0x148) [<c0160c60>] (vfs_write+0x0/0x148)
> from [<c01612dc>] (sys_write+0x44/0x70)
>  r7:00000549 r6:dead0bc0 r5:00000000 r4:00009000 [<c0161298>]
> (sys_write+0x0/0x70) from [<c0027400>] (ret_fast_syscall+0x0/0x2c)
>  r8:c00275a8 r7:00000004 r6:0000000d r5:bee23f50 r4:00000549
> 
> >Also, you can know the progress of segment writer by specifying the 
> >following debug options,
> >
> > # echo "-vvv segment -vv seginfo" > /proc/fs/nilfs2/debug_option
> >
> >This feature is only available on the nilfs2 out-of-tree module built
> with CONFIG_NILFS_DEBUG=y >option.
> 
> Here's the tail of the syslog with segment and seginfo logging enhanced
> as requested:
> 
> NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=159
> NILFS(segment) nilfs_segctor_thread: woke
> NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=158,
> state=0
> NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
> semaphore
> NILFS(segment) nilfs_transaction_lock: locked
> NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
> NILFS(segment) nilfs_segctor_notify: complete requests from seq=159 to
> seq=159
> NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=159
> NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment
> semaphore
> NILFS(segment) nilfs_segctor_thread: sequence: req=159, done=159,
> state=0
> NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=159 err=0
> NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
> semaphore
> NILFS(segment) nilfs_transaction_lock: locked
> NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
> semaphore
> NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=160
> NILFS(segment) nilfs_segctor_thread: woke
> NILFS(segment) nilfs_segctor_thread: sequence: req=160, done=159,
> state=0
> NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
> semaphore
> NILFS(segment) nilfs_transaction_lock: locked
> NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
> NILFS(segment) nilfs_segctor_notify: comion_unlock: task dfd7bac0
> unlocked segment semaphore
> NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
> semaphore
> NILFS(segment) nilfs_transaction_lock: locked
> NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
> semaphore
> 
> // Bunch of repeated lock, lock, unlock entries...
> 
> NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
> semaphore
> NILFS(segment) nilfs_transaction_lock: locked
> NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
> semaphore
> NILFS(segment) nilfs_segctor_sync: start task=dfd7bac0 seq=703
> NILFS(segment) nilfs_segctor_thread: woke
> NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=702,
> state=0
> NILFS(segment) nilfs_transaction_lock: task c0909820 locking segment
> semaphore
> NILFS(segment) nilfs_transaction_lock: locked
> NILFS(segment) nilfs_segctor_confirm: Skipped construction (no changes)
> NILFS(segment) nilfs_segctor_notify: complete requests from seq=703 to
> seq=703
> NILFS(segment) nilfs_segctor_wakeup: wakeup task=dfe6fb44 seq=703
> NILFS(segment) nilfs_transaction_unlock: task c0909820 unlocked segment
> semaphore
> NILFS(segment) nilfs_segctor_thread: sequence: req=703, done=703,
> state=0
> NILFS(segment) nilfs_segctor_sync: done task=dfd7bac0 seq=703 err=0
> NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
> semaphore
> NILFS(segment) nilfs_transaction_lock: locked
> NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
> semaphore
> 
> // Bunch of repeated lock, lock, unlock entries...
> 
> NILFS(segment) nilfs_transaction_lock: task dfd7bac0 locking segment
> semaphore
> NILFS(segment) nilfs_transaction_lock: locked
> NILFS(segment) nilfs_transaction_unlock: task dfd7bac0 unlocked segment
> semaphore
> 
> // Game over.
> 
> As you can see, I've cut out some repeated sections to trim down the
> output.  If you need more log info beyond this tail, let me know how
> much and how you would like me to pass it to you.
> 
> Bill 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                 ` <20090612.132344.17157054.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-06-18 18:36                   ` Ryusuke Konishi
       [not found]                     ` <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Ryusuke Konishi @ 2009-06-18 18:36 UTC (permalink / raw)
  To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

Hi,
On Fri, 12 Jun 2009 13:23:44 +0900 (JST), Ryusuke Konishi wrote:
> On Tue, 9 Jun 2009 13:04:04 -0600, "Dunphy, Bill" wrote:
> >  
> > Ryusuke,
> > 
> > I've simplified my test approach by simply copying a single 10 byte file
> > repeatedly.  When doing so, I get consistent NILFS log output at the
> > time of hang.  Examination of /proc/sysreq-trigger also shows the copy
> > process to be in the same state as before (at sys_write).
>
> Thank you for sending me the trace log.  I will look into the problem
> later.
> 
> Ryusuke Konishi

In another thread, I've sent a patch that may cause the hang problem.
Here, I attach the out-of-tree version of the patch.

Does this make a difference in your problem ?

Thanks,
Ryusuke Konishi
--

diff --git a/fs/dat.c b/fs/dat.c
index 036b05d..b05d9d4 100644
--- a/fs/dat.c
+++ b/fs/dat.c
@@ -149,15 +149,6 @@ void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req,
 	entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
 					     req->pr_entry_bh, kaddr);
 	entry->de_start = cpu_to_le64(nilfs_mdt_cno(dat));
-	if (entry->de_blocknr != cpu_to_le64(0) ||
-	    entry->de_end != cpu_to_le64(NILFS_CNO_MAX)) {
-		printk(KERN_CRIT
-		       "%s: vbn = %llu, start = %llu, end = %llu, pbn = %llu\n",
-		       __func__, (unsigned long long)req->pr_entry_nr,
-		       (unsigned long long)le64_to_cpu(entry->de_start),
-		       (unsigned long long)le64_to_cpu(entry->de_end),
-		       (unsigned long long)le64_to_cpu(entry->de_blocknr));
-	}
 	entry->de_blocknr = cpu_to_le64(blocknr);
 	kunmap_atomic(kaddr, KM_USER0);
 
diff --git a/fs/segment.c b/fs/segment.c
index 3cbb3fe..225d5ae 100644
--- a/fs/segment.c
+++ b/fs/segment.c
@@ -2009,26 +2009,13 @@ static int nilfs_segctor_write(struct nilfs_sc_info *sci,
 		err = nilfs_segbuf_write(segbuf, &wi);
 
 		res = nilfs_segbuf_wait(segbuf, &wi);
-		err = unlikely(err) ? : res;
+		err = unlikely(err) ? err : res;
 		if (unlikely(err))
 			return err;
 	}
 	return 0;
 }
 
-static int nilfs_page_has_uncleared_buffer(struct page *page)
-{
-	struct buffer_head *head, *bh;
-
-	head = bh = page_buffers(page);
-	do {
-		if (buffer_dirty(bh) && !list_empty(&bh->b_assoc_buffers))
-			return 1;
-		bh = bh->b_this_page;
-	} while (bh != head);
-	return 0;
-}
-
 static void __nilfs_end_page_io(struct page *page, int err)
 {
 	if (!err) {
@@ -2056,12 +2043,11 @@ static void nilfs_end_page_io(struct page *page, int err)
 	if (!page)
 		return;
 
-	if (buffer_nilfs_node(page_buffers(page)) &&
-	    nilfs_page_has_uncleared_buffer(page))
-		/* For b-tree node pages, this function may be called twice
-		   or more because they might be split in a segment.
-		   This check assures that cleanup has been done for all
-		   buffers in a split btnode page. */
+	if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page))
+		/*
+		 * For b-tree node pages, this function may be called twice
+		 * or more because they might be split in a segment.
+		 */
 		return;
 
 	__nilfs_end_page_io(page, err);
@@ -2125,7 +2111,7 @@ static void nilfs_segctor_abort_write(struct nilfs_sc_info *sci,
 			}
 			if (bh->b_page != fs_page) {
 				nilfs_end_page_io(fs_page, err);
-				if (unlikely(fs_page == failed_page))
+				if (fs_page && fs_page == failed_page)
 					goto done;
 				fs_page = bh->b_page;
 			}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                     ` <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-06-18 18:42                       ` Ryusuke Konishi
  2009-06-18 19:48                       ` Dunphy, Bill
  1 sibling, 0 replies; 14+ messages in thread
From: Ryusuke Konishi @ 2009-06-18 18:42 UTC (permalink / raw)
  To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ
  Cc: users-JrjvKiOkagjYtjvyW6yDsg, konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg

On Fri, 19 Jun 2009 03:36:42 +0900 (JST), Ryusuke Konishi <ryusuke-sG5X7nlA6pw@public.gmane.org> wrote:
> Hi,
> On Fri, 12 Jun 2009 13:23:44 +0900 (JST), Ryusuke Konishi wrote:
> > On Tue, 9 Jun 2009 13:04:04 -0600, "Dunphy, Bill" wrote:
> > >  
> > > Ryusuke,
> > > 
> > > I've simplified my test approach by simply copying a single 10 byte file
> > > repeatedly.  When doing so, I get consistent NILFS log output at the
> > > time of hang.  Examination of /proc/sysreq-trigger also shows the copy
> > > process to be in the same state as before (at sys_write).
> >
> > Thank you for sending me the trace log.  I will look into the problem
> > later.
> > 
> > Ryusuke Konishi
> 
> In another thread, I've sent a patch that may cause the hang problem.
> Here, I attach the out-of-tree version of the patch.

s/cause/fix/

Ryusuke

> Does this make a difference in your problem ?
> 
> Thanks,
> Ryusuke Konishi
> --
> 
> diff --git a/fs/dat.c b/fs/dat.c
> index 036b05d..b05d9d4 100644
> --- a/fs/dat.c
> +++ b/fs/dat.c
> @@ -149,15 +149,6 @@ void nilfs_dat_commit_start(struct inode *dat, struct nilfs_palloc_req *req,
>  	entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
>  					     req->pr_entry_bh, kaddr);
>  	entry->de_start = cpu_to_le64(nilfs_mdt_cno(dat));
> -	if (entry->de_blocknr != cpu_to_le64(0) ||
> -	    entry->de_end != cpu_to_le64(NILFS_CNO_MAX)) {
> -		printk(KERN_CRIT
> -		       "%s: vbn = %llu, start = %llu, end = %llu, pbn = %llu\n",
> -		       __func__, (unsigned long long)req->pr_entry_nr,
> -		       (unsigned long long)le64_to_cpu(entry->de_start),
> -		       (unsigned long long)le64_to_cpu(entry->de_end),
> -		       (unsigned long long)le64_to_cpu(entry->de_blocknr));
> -	}
>  	entry->de_blocknr = cpu_to_le64(blocknr);
>  	kunmap_atomic(kaddr, KM_USER0);
>  
> diff --git a/fs/segment.c b/fs/segment.c
> index 3cbb3fe..225d5ae 100644
> --- a/fs/segment.c
> +++ b/fs/segment.c
> @@ -2009,26 +2009,13 @@ static int nilfs_segctor_write(struct nilfs_sc_info *sci,
>  		err = nilfs_segbuf_write(segbuf, &wi);
>  
>  		res = nilfs_segbuf_wait(segbuf, &wi);
> -		err = unlikely(err) ? : res;
> +		err = unlikely(err) ? err : res;
>  		if (unlikely(err))
>  			return err;
>  	}
>  	return 0;
>  }
>  
> -static int nilfs_page_has_uncleared_buffer(struct page *page)
> -{
> -	struct buffer_head *head, *bh;
> -
> -	head = bh = page_buffers(page);
> -	do {
> -		if (buffer_dirty(bh) && !list_empty(&bh->b_assoc_buffers))
> -			return 1;
> -		bh = bh->b_this_page;
> -	} while (bh != head);
> -	return 0;
> -}
> -
>  static void __nilfs_end_page_io(struct page *page, int err)
>  {
>  	if (!err) {
> @@ -2056,12 +2043,11 @@ static void nilfs_end_page_io(struct page *page, int err)
>  	if (!page)
>  		return;
>  
> -	if (buffer_nilfs_node(page_buffers(page)) &&
> -	    nilfs_page_has_uncleared_buffer(page))
> -		/* For b-tree node pages, this function may be called twice
> -		   or more because they might be split in a segment.
> -		   This check assures that cleanup has been done for all
> -		   buffers in a split btnode page. */
> +	if (buffer_nilfs_node(page_buffers(page)) && !PageWriteback(page))
> +		/*
> +		 * For b-tree node pages, this function may be called twice
> +		 * or more because they might be split in a segment.
> +		 */
>  		return;
>  
>  	__nilfs_end_page_io(page, err);
> @@ -2125,7 +2111,7 @@ static void nilfs_segctor_abort_write(struct nilfs_sc_info *sci,
>  			}
>  			if (bh->b_page != fs_page) {
>  				nilfs_end_page_io(fs_page, err);
> -				if (unlikely(fs_page == failed_page))
> +				if (fs_page && fs_page == failed_page)
>  					goto done;
>  				fs_page = bh->b_page;
>  			}
> _______________________________________________
> users mailing list
> users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
> https://www.nilfs.org/mailman/listinfo/users

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                     ` <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org>
  2009-06-18 18:42                       ` Ryusuke Konishi
@ 2009-06-18 19:48                       ` Dunphy, Bill
       [not found]                         ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FAC-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
  1 sibling, 1 reply; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-18 19:48 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

Tried applying patch to latest git pull as well as 2.0.12 and 2.0.14 and
experienced the following patch issue:

$ patch -p1 <your.patch 
(Stripping trailing CRs from patch.)
patching file fs/dat.c
(Stripping trailing CRs from patch.)
patching file fs/segment.c
patch: **** malformed patch at line 50: @@ -2056,12 +2043,11 @@ static
void nilfs_end_page_io(struct page *page, int err)

I could attempt to do by hand but decided to let you chime in first.

Thoughts? 

Bill

-----Original Message-----
From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org] 
Sent: Thursday, June 18, 2009 12:37 PM
To: Dunphy, Bill
Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
Subject: Re: [NILFS users] Write hang on ARM based target

Hi,
On Fri, 12 Jun 2009 13:23:44 +0900 (JST), Ryusuke Konishi wrote:
> On Tue, 9 Jun 2009 13:04:04 -0600, "Dunphy, Bill" wrote:
> >  
> > Ryusuke,
> > 
> > I've simplified my test approach by simply copying a single 10 byte 
> > file repeatedly.  When doing so, I get consistent NILFS log output 
> > at the time of hang.  Examination of /proc/sysreq-trigger also shows

> > the copy process to be in the same state as before (at sys_write).
>
> Thank you for sending me the trace log.  I will look into the problem 
> later.
> 
> Ryusuke Konishi

In another thread, I've sent a patch that may cause the hang problem.
Here, I attach the out-of-tree version of the patch.

Does this make a difference in your problem ?

Thanks,
Ryusuke Konishi
--

diff --git a/fs/dat.c b/fs/dat.c
index 036b05d..b05d9d4 100644
--- a/fs/dat.c
+++ b/fs/dat.c
@@ -149,15 +149,6 @@ void nilfs_dat_commit_start(struct inode *dat,
struct nilfs_palloc_req *req,
 	entry = nilfs_palloc_block_get_entry(dat, req->pr_entry_nr,
 					     req->pr_entry_bh, kaddr);
 	entry->de_start = cpu_to_le64(nilfs_mdt_cno(dat));
-	if (entry->de_blocknr != cpu_to_le64(0) ||
-	    entry->de_end != cpu_to_le64(NILFS_CNO_MAX)) {
-		printk(KERN_CRIT
-		       "%s: vbn = %llu, start = %llu, end = %llu, pbn =
%llu\n",
-		       __func__, (unsigned long long)req->pr_entry_nr,
-		       (unsigned long long)le64_to_cpu(entry->de_start),
-		       (unsigned long long)le64_to_cpu(entry->de_end),
-		       (unsigned long
long)le64_to_cpu(entry->de_blocknr));
-	}
 	entry->de_blocknr = cpu_to_le64(blocknr);
 	kunmap_atomic(kaddr, KM_USER0);
 
diff --git a/fs/segment.c b/fs/segment.c index 3cbb3fe..225d5ae 100644
--- a/fs/segment.c
+++ b/fs/segment.c
@@ -2009,26 +2009,13 @@ static int nilfs_segctor_write(struct
nilfs_sc_info *sci,
 		err = nilfs_segbuf_write(segbuf, &wi);
 
 		res = nilfs_segbuf_wait(segbuf, &wi);
-		err = unlikely(err) ? : res;
+		err = unlikely(err) ? err : res;
 		if (unlikely(err))
 			return err;
 	}
 	return 0;
 }
 
-static int nilfs_page_has_uncleared_buffer(struct page *page) -{
-	struct buffer_head *head, *bh;
-
-	head = bh = page_buffers(page);
-	do {
-		if (buffer_dirty(bh) &&
!list_empty(&bh->b_assoc_buffers))
-			return 1;
-		bh = bh->b_this_page;
-	} while (bh != head);
-	return 0;
-}
-
 static void __nilfs_end_page_io(struct page *page, int err)  {
 	if (!err) {
@@ -2056,12 +2043,11 @@ static void nilfs_end_page_io(struct page *page,
int err)
 	if (!page)
 		return;
 
-	if (buffer_nilfs_node(page_buffers(page)) &&
-	    nilfs_page_has_uncleared_buffer(page))
-		/* For b-tree node pages, this function may be called
twice
-		   or more because they might be split in a segment.
-		   This check assures that cleanup has been done for all
-		   buffers in a split btnode page. */
+	if (buffer_nilfs_node(page_buffers(page)) &&
!PageWriteback(page))
+		/*
+		 * For b-tree node pages, this function may be called
twice
+		 * or more because they might be split in a segment.
+		 */
 		return;
 
 	__nilfs_end_page_io(page, err);
@@ -2125,7 +2111,7 @@ static void nilfs_segctor_abort_write(struct
nilfs_sc_info *sci,
 			}
 			if (bh->b_page != fs_page) {
 				nilfs_end_page_io(fs_page, err);
-				if (unlikely(fs_page == failed_page))
+				if (fs_page && fs_page == failed_page)
 					goto done;
 				fs_page = bh->b_page;
 			}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                         ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FAC-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
@ 2009-06-19  1:47                           ` Ryusuke Konishi
       [not found]                             ` <20090619.104731.99018736.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Ryusuke Konishi @ 2009-06-19  1:47 UTC (permalink / raw)
  To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

[-- Attachment #1: Type: Text/Plain, Size: 823 bytes --]

Hi,
On Thu, 18 Jun 2009 13:48:56 -0600, "Dunphy, Bill" wrote:
> Tried applying patch to latest git pull as well as 2.0.12 and 2.0.14 and
> experienced the following patch issue:
> 
> $ patch -p1 <your.patch 
> (Stripping trailing CRs from patch.)
> patching file fs/dat.c
> (Stripping trailing CRs from patch.)
> patching file fs/segment.c
> patch: **** malformed patch at line 50: @@ -2056,12 +2043,11 @@ static
> void nilfs_end_page_io(struct page *page, int err)
> 
> I could attempt to do by hand but decided to let you chime in first.
> 
> Thoughts? 
> 
> Bill

Hmm, strange.

I will attach the compressed one.  It is against the latest git, but
will be also applicable to v2.0.14.

 $ cd nilfs2-module
 $ bunzip -c xxx/nilfs2-fix-hang-after-bio_alloc-failed.patch.bz2 | patch -p1

How does this go ?

Ryusuke Konishi

[-- Attachment #2: nilfs2-fix-hang-after-bio_alloc-failed.patch.bz2 --]
[-- Type: Application/Octet-Stream, Size: 1159 bytes --]

[-- Attachment #3: Type: text/plain, Size: 158 bytes --]

_______________________________________________
users mailing list
users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org
https://www.nilfs.org/mailman/listinfo/users

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                             ` <20090619.104731.99018736.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-06-19 15:01                               ` Dunphy, Bill
       [not found]                                 ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB3-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-19 15:01 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

Thanks.  That patch integrated successfully.

Ran it again with the following result:

Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ========

Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, end=81919

Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758

Jun 19 08:38:26 kernel: ------------------ SUMMARY -----------------

Jun 19 08:38:26 kernel: nfinfo     = 4 (number of files)

Jun 19 08:38:26 kernel: nblocks    = 14 (number of blocks)

Jun 19 08:38:26 kernel: sumbytes   = 312 (size of summary in bytes)

Jun 19 08:38:26 kernel: nsumblk    = 1 (number of summary blocks)

Jun 19 08:38:26 kernel: flags      = LOGBGN|LOGEND|SR

Jun 19 08:38:26 kernel: ============================================

Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: called                           
Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: done                             
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
called                                
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
done                                  
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
called                               
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
done                                 
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
called                                  
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
done                                    
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
called                                
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
done                                  
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
summary blocks                          
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
bio (max_vecs=16)                       
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
normal blocks (index=1)                 
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting
bio (start_sector=649296, size=57344,)
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitted a
segment (err=0, pseg_start=81162, #req)
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1

Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed

Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
completing segment (flags=0x7)           
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
completed a segment having a super root )
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_do_construct:
submitted all segments                     
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end
(stage=9)                                 
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_notify: complete
requests from seq=2 to seq=1            
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_thread: sequence:
req=1, done=1, state=0                  

Bill

-----Original Message-----
From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org] 
Sent: Thursday, June 18, 2009 7:48 PM
To: Dunphy, Bill
Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
Subject: Re: [NILFS users] Write hang on ARM based target

Hi,
On Thu, 18 Jun 2009 13:48:56 -0600, "Dunphy, Bill" wrote:
> Tried applying patch to latest git pull as well as 2.0.12 and 2.0.14 
> and experienced the following patch issue:
> 
> $ patch -p1 <your.patch
> (Stripping trailing CRs from patch.)
> patching file fs/dat.c
> (Stripping trailing CRs from patch.)
> patching file fs/segment.c
> patch: **** malformed patch at line 50: @@ -2056,12 +2043,11 @@ static

> void nilfs_end_page_io(struct page *page, int err)
> 
> I could attempt to do by hand but decided to let you chime in first.
> 
> Thoughts? 
> 
> Bill

Hmm, strange.

I will attach the compressed one.  It is against the latest git, but
will be also applicable to v2.0.14.

 $ cd nilfs2-module
 $ bunzip -c xxx/nilfs2-fix-hang-after-bio_alloc-failed.patch.bz2 |
patch -p1

How does this go ?

Ryusuke Konishi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                                 ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB3-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
@ 2009-06-22 18:58                                   ` Dunphy, Bill
       [not found]                                     ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB9-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-22 18:58 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

An update.

I've just begun some testing on a different reference design board
utilizing the 88F6281.  This particular board has native support in the
2.6.30 kernel which allowed me to give the in-tree version of NILFS a
go.  This board/kernel version combination ran though the testing
mentioned below without a hitch this weekend (1 million loops).  I've
since performed a number of massive simultaneous data transfers without
any errors.  Performance appeared to be much better from a high level as
well.  So at this point, it appears to me that there is a NILFS
sensitivity to the 2.6.22.18 kernel and/or a board oddity (even though
other file systems worked flawlessly).  My near term plan is to move
forward with this new board/kernel combination.  However, I will keep
the original board and it's 2.6.22.18 kernel up and available if you
would like me to try some other changes - you decide.  In the meanwhile,
I'll start banging away on this platform and report in if I see any
strange behavior.

Bill

-----Original Message-----
From: Dunphy, Bill 
Sent: Friday, June 19, 2009 9:01 AM
To: 'Ryusuke Konishi'
Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
Subject: RE: [NILFS users] Write hang on ARM based target

Thanks.  That patch integrated successfully.

Ran it again with the following result:

Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ========

Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, end=81919

Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758

Jun 19 08:38:26 kernel: ------------------ SUMMARY -----------------

Jun 19 08:38:26 kernel: nfinfo     = 4 (number of files)

Jun 19 08:38:26 kernel: nblocks    = 14 (number of blocks)

Jun 19 08:38:26 kernel: sumbytes   = 312 (size of summary in bytes)

Jun 19 08:38:26 kernel: nsumblk    = 1 (number of summary blocks)

Jun 19 08:38:26 kernel: flags      = LOGBGN|LOGEND|SR

Jun 19 08:38:26 kernel: ============================================

Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: called                           
Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: done                             
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
called                                
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
done                                  
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
called                               
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
done                                 
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
called                                  
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
done                                    
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
called                                
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
done                                  
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
summary blocks                          
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
bio (max_vecs=16)                       
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
normal blocks (index=1)                 
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting
bio (start_sector=649296, size=57344,) Jun 19 08:38:26 kernel:
NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0,
pseg_start=81162, #req)
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1

Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed

Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
completing segment (flags=0x7)           
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
completed a segment having a super root )
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_do_construct:
submitted all segments                     
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end
(stage=9)                                 
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_notify: complete
requests from seq=2 to seq=1            
Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_thread: sequence:
req=1, done=1, state=0                  

Bill

-----Original Message-----
From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org]
Sent: Thursday, June 18, 2009 7:48 PM
To: Dunphy, Bill
Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
Subject: Re: [NILFS users] Write hang on ARM based target

Hi,
On Thu, 18 Jun 2009 13:48:56 -0600, "Dunphy, Bill" wrote:
> Tried applying patch to latest git pull as well as 2.0.12 and 2.0.14 
> and experienced the following patch issue:
> 
> $ patch -p1 <your.patch
> (Stripping trailing CRs from patch.)
> patching file fs/dat.c
> (Stripping trailing CRs from patch.)
> patching file fs/segment.c
> patch: **** malformed patch at line 50: @@ -2056,12 +2043,11 @@ static

> void nilfs_end_page_io(struct page *page, int err)
> 
> I could attempt to do by hand but decided to let you chime in first.
> 
> Thoughts? 
> 
> Bill

Hmm, strange.

I will attach the compressed one.  It is against the latest git, but
will be also applicable to v2.0.14.

 $ cd nilfs2-module
 $ bunzip -c xxx/nilfs2-fix-hang-after-bio_alloc-failed.patch.bz2 |
patch -p1

How does this go ?

Ryusuke Konishi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                                     ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB9-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
@ 2009-06-23  3:35                                       ` Ryusuke Konishi
       [not found]                                         ` <20090623.123505.37056161.ryusuke-sG5X7nlA6pw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Ryusuke Konishi @ 2009-06-23  3:35 UTC (permalink / raw)
  To: WDunphy-/3AnZwnt+WyVkR+q5Y5grQ
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

Hi Bill,
On Mon, 22 Jun 2009 12:58:22 -0600, "Dunphy, Bill" wrote:
> An update.
> 
> I've just begun some testing on a different reference design board
> utilizing the 88F6281.  This particular board has native support in the
> 2.6.30 kernel which allowed me to give the in-tree version of NILFS a
> go.  This board/kernel version combination ran though the testing
> mentioned below without a hitch this weekend (1 million loops).  I've
> since performed a number of massive simultaneous data transfers without
> any errors.  Performance appeared to be much better from a high level as
> well.  So at this point, it appears to me that there is a NILFS
> sensitivity to the 2.6.22.18 kernel and/or a board oddity (even though
> other file systems worked flawlessly).  My near term plan is to move
> forward with this new board/kernel combination.  However, I will keep
> the original board and it's 2.6.22.18 kernel up and available if you
> would like me to try some other changes - you decide.  In the meanwhile,
> I'll start banging away on this platform and report in if I see any
> strange behavior.
> 
> Bill

Sorry for my late reply.

I found an inconsistent state in the value of sequence counter shown
in your log.  I think some sort of synchronization problem is present.
If so, I think we should resolve the problem because it may occur in
any RISC architectures.

Could you test if the attached patch makes a difference?

The patch adds volatile specifiers to sequence counters which may be
shared among different tasks.

Thanks,
Ryusuke Konishi
 
> -----Original Message-----
> From: Dunphy, Bill 
> Sent: Friday, June 19, 2009 9:01 AM
> To: 'Ryusuke Konishi'
> Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
> Subject: RE: [NILFS users] Write hang on ARM based target
> 
> Thanks.  That patch integrated successfully.
> 
> Ran it again with the following result:
> 
> Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ========
> 
> Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, end=81919
> 
> Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758
> 
> Jun 19 08:38:26 kernel: ------------------ SUMMARY -----------------
> 
> Jun 19 08:38:26 kernel: nfinfo     = 4 (number of files)
> 
> Jun 19 08:38:26 kernel: nblocks    = 14 (number of blocks)
> 
> Jun 19 08:38:26 kernel: sumbytes   = 312 (size of summary in bytes)
> 
> Jun 19 08:38:26 kernel: nsumblk    = 1 (number of summary blocks)
> 
> Jun 19 08:38:26 kernel: flags      = LOGBGN|LOGEND|SR
> 
> Jun 19 08:38:26 kernel: ============================================
> 
> Jun 19 08:38:26 kernel: NILFS(segment)
> nilfs_segctor_update_payload_blocknr: called                           
> Jun 19 08:38:26 kernel: NILFS(segment)
> nilfs_segctor_update_payload_blocknr: done                             
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
> called                                
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
> done                                  
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
> called                               
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
> done                                 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
> called                                  
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
> done                                    
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
> called                                
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
> done                                  
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
> summary blocks                          
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
> bio (max_vecs=16)                       
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
> normal blocks (index=1)                 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting
> bio (start_sector=649296, size=57344,) Jun 19 08:38:26 kernel:
> NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0,
> pseg_start=81162, #req)
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1
> 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed
> 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
> completing segment (flags=0x7)           
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
> completed a segment having a super root )
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_do_construct:
> submitted all segments                     
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end
> (stage=9)                                 


diff --git a/fs/segment.c b/fs/segment.c
index 84201ce..c9c28c2 100644
--- a/fs/segment.c
+++ b/fs/segment.c
@@ -2530,7 +2530,7 @@ void nilfs_segctor_clear_segments_to_be_freed(struct nilfs_sc_info *sci)
 
 struct nilfs_segctor_wait_request {
 	wait_queue_t	wq;
-	__u32		seq;
+	volatile __u32	seq;
 	int		err;
 	atomic_t	done;
 };
@@ -2699,7 +2699,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
 
 struct nilfs_segctor_req {
 	int mode;
-	__u32 seq_accepted;
+	volatile __u32 seq_accepted;
 	int sc_err;  /* construction failure */
 	int sb_err;  /* super block writeback failure */
 };
diff --git a/fs/segment.h b/fs/segment.h
index 44dca64..8533783 100644
--- a/fs/segment.h
+++ b/fs/segment.h
@@ -163,8 +163,8 @@ struct nilfs_sc_info {
 	wait_queue_head_t	sc_wait_daemon;
 	wait_queue_head_t	sc_wait_task;
 
-	__u32			sc_seq_request;
-	__u32			sc_seq_done;
+	volatile __u32		sc_seq_request;
+	volatile __u32		sc_seq_done;
 
 	int			sc_sync;
 	unsigned long		sc_interval;
-- 
1.6.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                                         ` <20090623.123505.37056161.ryusuke-sG5X7nlA6pw@public.gmane.org>
@ 2009-06-23 15:34                                           ` Dunphy, Bill
       [not found]                                             ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FBB-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
  0 siblings, 1 reply; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-23 15:34 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

Ryusuke,

This patch failed to integrate with the latest git pull again.  Looking
at the lines it was trying to substitute I can see why.  Being a
relative rookie when it comes to git, am I missing a step here?  My
assumption is that anything we do in unison will be against the latest
and greatest - is this a bad assumption?  My sequence is a bit overkill
but seems to ensure a clean start:

1) git clone http://git.nilfs.org/nilfs-module-git
2) cd nilfs2-module
3) patch -p1 < "your patch"

If you have any suggestions as to how I should be doing this
differently, I'm all ears.  In any event, the good news is that was able
to hand patch the necessary files (segment.c/segment.h) - pretty simple,
straight forward change.  The bad news is that it still ends in a hang
state:

Jun 23 09:12:52 kernel: ========= NILFS SEGMENT INFORMATION ========

Jun 23 09:12:52 kernel: full segment: segnum=22, start=45056, end=47103

Jun 23 09:12:52 kernel: partial segment: start=45413, rest=1691

Jun 23 09:12:52 kernel: ------------------ SUMMARY -----------------

Jun 23 09:12:52 kernel: nfinfo     = 4 (number of files)

Jun 23 09:12:52 kernel: nblocks    = 14 (number of blocks)

Jun 23 09:12:52 kernel: sumbytes   = 312 (size of summary in bytes)

Jun 23 09:12:52 kernel: nsumblk    = 1 (number of summary blocks)

Jun 23 09:12:52 kernel: flags      = LOGBGN|LOGEND|SR

Jun 23 09:12:52 kernel: ============================================

Jun 23 09:12:52 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: called                           
Jun 23 09:12:52 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: done                             
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
called                                
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
done                                  
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
called                               
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
done                                 
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_segusage:
called                                  
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_segusage:
done                                    
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
called                                
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
done                                  
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitting
summary blocks                          
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
bio (max_vecs=16)                       
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitting
normal blocks (index=1)                 
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting
bio (start_sector=363304, size=57344,)
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitted a
segment (err=0, pseg_start=45413, #req)
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1

Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed

Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_complete_write:
completing segment (flags=0x7)           
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_complete_write:
completed a segment having a super root )
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_do_construct:
submitted all segments                     
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_construct: end
(stage=9)                                 
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_notify: complete
requests from seq=1 to seq=1            
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_thread: sequence:
req=1, done=1, state=0        

Bill

-----Original Message-----
From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org] 
Sent: Monday, June 22, 2009 9:35 PM
To: Dunphy, Bill
Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
Subject: Re: [NILFS users] Write hang on ARM based target

Hi Bill,
On Mon, 22 Jun 2009 12:58:22 -0600, "Dunphy, Bill" wrote:
> An update.
> 
> I've just begun some testing on a different reference design board 
> utilizing the 88F6281.  This particular board has native support in 
> the 2.6.30 kernel which allowed me to give the in-tree version of 
> NILFS a go.  This board/kernel version combination ran though the 
> testing mentioned below without a hitch this weekend (1 million 
> loops).  I've since performed a number of massive simultaneous data 
> transfers without any errors.  Performance appeared to be much better 
> from a high level as well.  So at this point, it appears to me that 
> there is a NILFS sensitivity to the 2.6.22.18 kernel and/or a board 
> oddity (even though other file systems worked flawlessly).  My near 
> term plan is to move forward with this new board/kernel combination.  
> However, I will keep the original board and it's 2.6.22.18 kernel up 
> and available if you would like me to try some other changes - you 
> decide.  In the meanwhile, I'll start banging away on this platform 
> and report in if I see any strange behavior.
> 
> Bill

Sorry for my late reply.

I found an inconsistent state in the value of sequence counter shown in
your log.  I think some sort of synchronization problem is present.
If so, I think we should resolve the problem because it may occur in any
RISC architectures.

Could you test if the attached patch makes a difference?

The patch adds volatile specifiers to sequence counters which may be
shared among different tasks.

Thanks,
Ryusuke Konishi
 
> -----Original Message-----
> From: Dunphy, Bill
> Sent: Friday, June 19, 2009 9:01 AM
> To: 'Ryusuke Konishi'
> Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
> Subject: RE: [NILFS users] Write hang on ARM based target
> 
> Thanks.  That patch integrated successfully.
> 
> Ran it again with the following result:
> 
> Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ========
> 
> Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872, 
> end=81919
> 
> Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758
> 
> Jun 19 08:38:26 kernel: ------------------ SUMMARY -----------------
> 
> Jun 19 08:38:26 kernel: nfinfo     = 4 (number of files)
> 
> Jun 19 08:38:26 kernel: nblocks    = 14 (number of blocks)
> 
> Jun 19 08:38:26 kernel: sumbytes   = 312 (size of summary in bytes)
> 
> Jun 19 08:38:26 kernel: nsumblk    = 1 (number of summary blocks)
> 
> Jun 19 08:38:26 kernel: flags      = LOGBGN|LOGEND|SR
> 
> Jun 19 08:38:26 kernel: ============================================
> 
> Jun 19 08:38:26 kernel: NILFS(segment)
> nilfs_segctor_update_payload_blocknr: called

> Jun 19 08:38:26 kernel: NILFS(segment)
> nilfs_segctor_update_payload_blocknr: done

> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_file_bmap:
> called                                
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_file_bmap:
> done                                  
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_checkpoint:
> called                               
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_checkpoint:
> done                                 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
> called                                  
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
> done                                    
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_checksums:
> called                                
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_checksums:
> done                                  
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
> summary blocks                          
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
> bio (max_vecs=16)                       
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
> normal blocks (index=1)                 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: 
> submitting bio (start_sector=649296, size=57344,) Jun 19 08:38:26
kernel:
> NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, 
> pseg_start=81162, #req) Jun 19 08:38:26 kernel: NILFS(segment) 
> nilfs_segbuf_wait: called nbio=1
> 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait 
> completed
> 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
> completing segment (flags=0x7)           
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
> completed a segment having a super root ) Jun 19 08:38:26 kernel: 
> NILFS(segment) nilfs_segctor_do_construct:
> submitted all segments                     
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end
> (stage=9)                                 


diff --git a/fs/segment.c b/fs/segment.c index 84201ce..c9c28c2 100644
--- a/fs/segment.c
+++ b/fs/segment.c
@@ -2530,7 +2530,7 @@ void
nilfs_segctor_clear_segments_to_be_freed(struct nilfs_sc_info *sci)
 
 struct nilfs_segctor_wait_request {
 	wait_queue_t	wq;
-	__u32		seq;
+	volatile __u32	seq;
 	int		err;
 	atomic_t	done;
 };
@@ -2699,7 +2699,7 @@ int nilfs_construct_dsync_segment(struct
super_block *sb, struct inode *inode,
 
 struct nilfs_segctor_req {
 	int mode;
-	__u32 seq_accepted;
+	volatile __u32 seq_accepted;
 	int sc_err;  /* construction failure */
 	int sb_err;  /* super block writeback failure */  }; diff --git
a/fs/segment.h b/fs/segment.h index 44dca64..8533783 100644
--- a/fs/segment.h
+++ b/fs/segment.h
@@ -163,8 +163,8 @@ struct nilfs_sc_info {
 	wait_queue_head_t	sc_wait_daemon;
 	wait_queue_head_t	sc_wait_task;
 
-	__u32			sc_seq_request;
-	__u32			sc_seq_done;
+	volatile __u32		sc_seq_request;
+	volatile __u32		sc_seq_done;
 
 	int			sc_sync;
 	unsigned long		sc_interval;
--
1.6.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Write hang on ARM based target
       [not found]                                             ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FBB-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
@ 2009-06-24 21:16                                               ` Dunphy, Bill
  0 siblings, 0 replies; 14+ messages in thread
From: Dunphy, Bill @ 2009-06-24 21:16 UTC (permalink / raw)
  To: Ryusuke Konishi
  Cc: konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg, users-JrjvKiOkagjYtjvyW6yDsg

Ryusuke,

Here's another data point.  I've run the same test on the new ARM
reference design hardware, but this time running the 2.6.22.18 kernel.
It also hangs after a similar period of time and when examining the
logging output, it appears very similar to the original error.

 Jun 24 15:06:25 kernel: ========= NILFS SEGMENT INFORMATION ========

Jun 24 15:06:25 kernel: full segment: segnum=124, start=253952,
end=255999                                    
Jun 24 15:06:25 kernel: partial segment: start=254180, rest=1820

Jun 24 15:06:25 kernel: ------------------ SUMMARY -----------------

Jun 24 15:06:25 kernel: nfinfo     = 4 (number of files)

Jun 24 15:06:25 kernel: nblocks    = 17 (number of blocks)

Jun 24 15:06:25 kernel: sumbytes   = 344 (size of summary in bytes)

Jun 24 15:06:25 kernel: nsumblk    = 1 (number of summary blocks)

Jun 24 15:06:25 kernel: flags      = LOGBGN|LOGEND|SR

Jun 24 15:06:25 kernel: ============================================

Jun 24 15:06:25 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: called                           
Jun 24 15:06:25 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: done                             
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
called                                
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
done                                  
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
called                               
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
done                                 
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_update_segusage:
called                                  
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_update_segusage:
done                                    
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
called                                
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
done                                  
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_write: submitting
summary blocks                          
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
bio (max_vecs=64)                       
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_write: submitting
normal blocks (index=1)                 
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting
bio (start_sector=2033440, size=69632)
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_write: submitted a
segment (err=0, pseg_start=254180, #re)
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1

Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed

Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_complete_write:
completing segment (flags=0x7)           
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_complete_write:
completed a segment having a super root )
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_do_construct:
submitted all segments                     
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_construct: end
(stage=9)                                 
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_notify: complete
requests from seq=1 to seq=1            
Jun 24 15:06:25 kernel: NILFS(segment) nilfs_segctor_thread: sequence:
req=1, done=1, state=0                 

Not too surprising - just eliminates the possibility of bad hardware.

Bill

-----Original Message-----
From: Dunphy, Bill 
Sent: Tuesday, June 23, 2009 9:35 AM
To: 'Ryusuke Konishi'
Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
Subject: RE: [NILFS users] Write hang on ARM based target

Ryusuke,

This patch failed to integrate with the latest git pull again.  Looking
at the lines it was trying to substitute I can see why.  Being a
relative rookie when it comes to git, am I missing a step here?  My
assumption is that anything we do in unison will be against the latest
and greatest - is this a bad assumption?  My sequence is a bit overkill
but seems to ensure a clean start:

1) git clone http://git.nilfs.org/nilfs-module-git
2) cd nilfs2-module
3) patch -p1 < "your patch"

If you have any suggestions as to how I should be doing this
differently, I'm all ears.  In any event, the good news is that was able
to hand patch the necessary files (segment.c/segment.h) - pretty simple,
straight forward change.  The bad news is that it still ends in a hang
state:

Jun 23 09:12:52 kernel: ========= NILFS SEGMENT INFORMATION ========

Jun 23 09:12:52 kernel: full segment: segnum=22, start=45056, end=47103

Jun 23 09:12:52 kernel: partial segment: start=45413, rest=1691

Jun 23 09:12:52 kernel: ------------------ SUMMARY -----------------

Jun 23 09:12:52 kernel: nfinfo     = 4 (number of files)

Jun 23 09:12:52 kernel: nblocks    = 14 (number of blocks)

Jun 23 09:12:52 kernel: sumbytes   = 312 (size of summary in bytes)

Jun 23 09:12:52 kernel: nsumblk    = 1 (number of summary blocks)

Jun 23 09:12:52 kernel: flags      = LOGBGN|LOGEND|SR

Jun 23 09:12:52 kernel: ============================================

Jun 23 09:12:52 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: called                           
Jun 23 09:12:52 kernel: NILFS(segment)
nilfs_segctor_update_payload_blocknr: done                             
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
called                                
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_file_bmap:
done                                  
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
called                               
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checkpoint:
done                                 
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_segusage:
called                                  
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_update_segusage:
done                                    
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
called                                
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_fill_in_checksums:
done                                  
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitting
summary blocks                          
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
bio (max_vecs=16)                       
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_write: submitting
normal blocks (index=1)                 
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_submit_seg_bio: submitting
bio (start_sector=363304, size=57344,) Jun 23 09:12:52 kernel:
NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0,
pseg_start=45413, #req)
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_wait: called nbio=1

Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segbuf_wait: wait completed

Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_complete_write:
completing segment (flags=0x7)           
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_complete_write:
completed a segment having a super root )
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_do_construct:
submitted all segments                     
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_construct: end
(stage=9)                                 
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_notify: complete
requests from seq=1 to seq=1            
Jun 23 09:12:52 kernel: NILFS(segment) nilfs_segctor_thread: sequence:
req=1, done=1, state=0        

Bill

-----Original Message-----
From: Ryusuke Konishi [mailto:ryusuke-sG5X7nlA6pw@public.gmane.org]
Sent: Monday, June 22, 2009 9:35 PM
To: Dunphy, Bill
Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
Subject: Re: [NILFS users] Write hang on ARM based target

Hi Bill,
On Mon, 22 Jun 2009 12:58:22 -0600, "Dunphy, Bill" wrote:
> An update.
> 
> I've just begun some testing on a different reference design board 
> utilizing the 88F6281.  This particular board has native support in 
> the 2.6.30 kernel which allowed me to give the in-tree version of 
> NILFS a go.  This board/kernel version combination ran though the 
> testing mentioned below without a hitch this weekend (1 million 
> loops).  I've since performed a number of massive simultaneous data 
> transfers without any errors.  Performance appeared to be much better 
> from a high level as well.  So at this point, it appears to me that 
> there is a NILFS sensitivity to the 2.6.22.18 kernel and/or a board 
> oddity (even though other file systems worked flawlessly).  My near 
> term plan is to move forward with this new board/kernel combination.
> However, I will keep the original board and it's 2.6.22.18 kernel up 
> and available if you would like me to try some other changes - you 
> decide.  In the meanwhile, I'll start banging away on this platform 
> and report in if I see any strange behavior.
> 
> Bill

Sorry for my late reply.

I found an inconsistent state in the value of sequence counter shown in
your log.  I think some sort of synchronization problem is present.
If so, I think we should resolve the problem because it may occur in any
RISC architectures.

Could you test if the attached patch makes a difference?

The patch adds volatile specifiers to sequence counters which may be
shared among different tasks.

Thanks,
Ryusuke Konishi
 
> -----Original Message-----
> From: Dunphy, Bill
> Sent: Friday, June 19, 2009 9:01 AM
> To: 'Ryusuke Konishi'
> Cc: users-JrjvKiOkagjYtjvyW6yDsg@public.gmane.org; konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org
> Subject: RE: [NILFS users] Write hang on ARM based target
> 
> Thanks.  That patch integrated successfully.
> 
> Ran it again with the following result:
> 
> Jun 19 08:38:26 kernel: ========= NILFS SEGMENT INFORMATION ========
> 
> Jun 19 08:38:26 kernel: full segment: segnum=39, start=79872,
> end=81919
> 
> Jun 19 08:38:26 kernel: partial segment: start=81162, rest=758
> 
> Jun 19 08:38:26 kernel: ------------------ SUMMARY -----------------
> 
> Jun 19 08:38:26 kernel: nfinfo     = 4 (number of files)
> 
> Jun 19 08:38:26 kernel: nblocks    = 14 (number of blocks)
> 
> Jun 19 08:38:26 kernel: sumbytes   = 312 (size of summary in bytes)
> 
> Jun 19 08:38:26 kernel: nsumblk    = 1 (number of summary blocks)
> 
> Jun 19 08:38:26 kernel: flags      = LOGBGN|LOGEND|SR
> 
> Jun 19 08:38:26 kernel: ============================================
> 
> Jun 19 08:38:26 kernel: NILFS(segment)
> nilfs_segctor_update_payload_blocknr: called

> Jun 19 08:38:26 kernel: NILFS(segment)
> nilfs_segctor_update_payload_blocknr: done

> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_file_bmap:
> called                                
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_file_bmap:
> done                                  
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_checkpoint:
> called                               
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_checkpoint:
> done                                 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
> called                                  
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_update_segusage:
> done                                    
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_checksums:
> called                                
> Jun 19 08:38:26 kernel: NILFS(segment)
nilfs_segctor_fill_in_checksums:
> done                                  
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
> summary blocks                          
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_alloc_seg_bio: allocated
> bio (max_vecs=16)                       
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_write: submitting
> normal blocks (index=1)                 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_submit_seg_bio: 
> submitting bio (start_sector=649296, size=57344,) Jun 19 08:38:26
kernel:
> NILFS(segment) nilfs_segbuf_write: submitted a segment (err=0, 
> pseg_start=81162, #req) Jun 19 08:38:26 kernel: NILFS(segment)
> nilfs_segbuf_wait: called nbio=1
> 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segbuf_wait: wait 
> completed
> 
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
> completing segment (flags=0x7)           
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_complete_write:
> completed a segment having a super root ) Jun 19 08:38:26 kernel: 
> NILFS(segment) nilfs_segctor_do_construct:
> submitted all segments                     
> Jun 19 08:38:26 kernel: NILFS(segment) nilfs_segctor_construct: end
> (stage=9)                                 


diff --git a/fs/segment.c b/fs/segment.c index 84201ce..c9c28c2 100644
--- a/fs/segment.c
+++ b/fs/segment.c
@@ -2530,7 +2530,7 @@ void
nilfs_segctor_clear_segments_to_be_freed(struct nilfs_sc_info *sci)
 
 struct nilfs_segctor_wait_request {
 	wait_queue_t	wq;
-	__u32		seq;
+	volatile __u32	seq;
 	int		err;
 	atomic_t	done;
 };
@@ -2699,7 +2699,7 @@ int nilfs_construct_dsync_segment(struct
super_block *sb, struct inode *inode,
 
 struct nilfs_segctor_req {
 	int mode;
-	__u32 seq_accepted;
+	volatile __u32 seq_accepted;
 	int sc_err;  /* construction failure */
 	int sb_err;  /* super block writeback failure */  }; diff --git
a/fs/segment.h b/fs/segment.h index 44dca64..8533783 100644
--- a/fs/segment.h
+++ b/fs/segment.h
@@ -163,8 +163,8 @@ struct nilfs_sc_info {
 	wait_queue_head_t	sc_wait_daemon;
 	wait_queue_head_t	sc_wait_task;
 
-	__u32			sc_seq_request;
-	__u32			sc_seq_done;
+	volatile __u32		sc_seq_request;
+	volatile __u32		sc_seq_done;
 
 	int			sc_sync;
 	unsigned long		sc_interval;
--
1.6.2

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-06-24 21:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-08 15:46 [SPAM] Write hang on ARM based target Dunphy, Bill
     [not found] ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F8D-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-08 16:49   ` Ryusuke Konishi
     [not found]     ` <20090609.014919.79903506.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-08 20:30       ` Dunphy, Bill
     [not found]         ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F96-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-09 19:04           ` Dunphy, Bill
     [not found]             ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900F99-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-12  4:23               ` Ryusuke Konishi
     [not found]                 ` <20090612.132344.17157054.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-18 18:36                   ` Ryusuke Konishi
     [not found]                     ` <20090619.033642.32757627.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-18 18:42                       ` Ryusuke Konishi
2009-06-18 19:48                       ` Dunphy, Bill
     [not found]                         ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FAC-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-19  1:47                           ` Ryusuke Konishi
     [not found]                             ` <20090619.104731.99018736.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-19 15:01                               ` Dunphy, Bill
     [not found]                                 ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB3-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-22 18:58                                   ` Dunphy, Bill
     [not found]                                     ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FB9-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-23  3:35                                       ` Ryusuke Konishi
     [not found]                                         ` <20090623.123505.37056161.ryusuke-sG5X7nlA6pw@public.gmane.org>
2009-06-23 15:34                                           ` Dunphy, Bill
     [not found]                                             ` <BAEFA5609CC3AB4BAE1625CAD4C27E5C01900FBB-qPLjlnsG5hmCmmEZ8mJCX5vx1ZQ4fP860E9HWUfgJXw@public.gmane.org>
2009-06-24 21:16                                               ` Dunphy, Bill

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.