All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations
@ 2011-12-16 11:19 Toshiyuki Okajima
  2011-12-16 12:22 ` Yongqiang Yang
  0 siblings, 1 reply; 14+ messages in thread
From: Toshiyuki Okajima @ 2011-12-16 11:19 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: linux-ext4

Hi.

I found a bug by executing the following reproducer (needs tuning).
(The reproducer never ends.)

I wrote it for a confirmation of 
 "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
Therefore, without 
 "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode"
patch, the other problem (panic) which the patch describes can happen more
frequently:
-------------------------------------------------------------------------------
#!/bin/sh

date
LOOP=100000
dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
/sbin/mkfs.ext4 -Fq /tmp/ext4.img
mount -o loop /tmp/ext4.img /mnt
rm -f /mnt/file
echo "0" > /mnt/file
(while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
PID=$!
for ((i=0; i < LOOP; i++));
do
        DATE=$(date)
        echo -n "[LOOP $i] $DATE"
        if ((i%2 == 0));
        then
                chattr +j /mnt/file
        else
                chattr -j /mnt/file
        fi
        echo "0" >> /mnt/file
done
kill -9 $PID
rm -f /mnt/file*
umount /mnt
exit 0
-------------------------------------------------------------------------------
Though I ran it, it never ended forever.
(At one of my tries to reproduce the hang, the reproducer stopped in 6
hours.)

Because jbd2_journal_lock_updates() never returns from schedule() 
after prepare_to_wait(). 

The detailed explanation is as follows:
===============================================================================
Current implementation of jbd2_journal_stop() has a bug not to synchronize 
t_updates with other operators, jbd2_journal_lock_updates(), start_this_handle()
and jbd2_journal_commit_transaction().

This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
 commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.

---------------------------
jbd2_journal_lock_updates()
---------------------------
 509         write_lock(&journal->j_state_lock);
 510         ++journal->j_barrier_count;
...
 513         while (1) {
...
 519                 spin_lock(&transaction->t_handle_lock);
 520                 if (!atomic_read(&transaction->t_updates)) {
 521                         spin_unlock(&transaction->t_handle_lock);
 522                         break;
 523                 }
 524                 prepare_to_wait(&journal->j_wait_updates, &wait,
 525                                 TASK_UNINTERRUPTIBLE);
 526                 spin_unlock(&transaction->t_handle_lock);
 527                 write_unlock(&journal->j_state_lock);
 528                 schedule();
 529                 finish_wait(&journal->j_wait_updates, &wait);
 530                 write_lock(&journal->j_state_lock);
 531         }
 532         write_unlock(&journal->j_state_lock);

* The operation of t_updates is protected by write_lock(&journal->j_state_lock)
 and spin_lock(&transaction->t_handle_lock).

-------------------
start_this_handle()
-------------------
 160         read_lock(&journal->j_state_lock);
...
 278         atomic_inc(&transaction->t_updates);
...
 284         read_unlock(&journal->j_state_lock);

* The operation of t_updates is protected by read_lock(&journal->j_state_lock).

---------------------------------
jbd2_journal_commit_transaction()
---------------------------------
 358         write_lock(&journal->j_state_lock);
...
 367         spin_lock(&commit_transaction->t_handle_lock);
 368         while (atomic_read(&commit_transaction->t_updates)) {
 369                 DEFINE_WAIT(wait);
 370 
 371                 prepare_to_wait(&journal->j_wait_updates, &wait,
 372                                         TASK_UNINTERRUPTIBLE);
 373                 if (atomic_read(&commit_transaction->t_updates)) {
 374                         spin_unlock(&commit_transaction->t_handle_lock);
 375                         write_unlock(&journal->j_state_lock);
 376                         schedule();
 377                         write_lock(&journal->j_state_lock);
 378                         spin_lock(&commit_transaction->t_handle_lock);
 379                 }
 380                 finish_wait(&journal->j_wait_updates, &wait);
 381         }
 382         spin_unlock(&commit_transaction->t_handle_lock);
...
 447         wake_up(&journal->j_wait_transaction_locked);
 448         write_unlock(&journal->j_state_lock);

* The operation of t_updates is protected by write_lock(&journal->j_state_lock)
 and spin_lock(&transaction->t_handle_lock).

-------------------
jbd2_journal_stop()
-------------------
...
1452         if (atomic_dec_and_test(&transaction->t_updates)) {
1453                 wake_up(&journal->j_wait_updates);
1454                 if (journal->j_barrier_count)
1455                         wake_up(&journal->j_wait_transaction_locked);
1456         }
1457 
1458         if (wait_for_commit)
1459                 err = jbd2_log_wait_commit(journal, tid);
1460 
1461         lock_map_release(&handle->h_lockdep_map);
1462 
1463         jbd2_free_handle(handle);
1464         return err;
1465 }

* The operation of t_updates is not protected.

Therefore, 
- jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
- jbd2_journal_stop() tries to wake someone up 
(wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0. 
=> But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
   if it has not yet slept completely.

After we run the reproducer, we can see the processes which become hang-up.

-------------------------------------------------------------------------------
crash> ps | grep UN
  16645  23737   1  ffff88007a592b20  UN   0.0  105136    508  dd
  16684  23727   1  ffff88007a2a9480  UN   0.0  107152    512  chattr
  23722      2   1  ffff88007b496080  UN   0.0       0      0  [flush-7:0]

[1]
crash> bt 16645
PID: 16645  TASK: ffff88007a592b20  CPU: 1   COMMAND: "dd"
 #0 [ffff88007b575878] __schedule at ffffffff814b9b77
 #1 [ffff88007b575920] schedule at ffffffff814ba3cf
 #2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
 #3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
 #4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
 #5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
 #6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
 #7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
 #8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
 #9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
#10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
#11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
#12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
#13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
#14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
    RIP: 00000034930d4230  RSP: 00007fffd8176980  RFLAGS: 00000206
    RAX: 0000000000000001  RBX: ffffffff814c3f42  RCX: 0000000000000010
    RDX: 0000000000001000  RSI: 0000000001ac6000  RDI: 0000000000000001
    RBP: 0000000001ac6000   R8: 0000000000000003   R9: 0000000000040000
    R10: 0000000000003003  R11: 0000000000000246  R12: 0000000001ac5fff
    R13: 0000000000000000  R14: 0000000000001000  R15: 0000000000000246
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

[2]
crash> bt 16684
PID: 16684  TASK: ffff88007a2a9480  CPU: 1   COMMAND: "chattr"
 #0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
 #1 [ffff88007af87d50] schedule at ffffffff814ba3cf
 #2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
 #3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
 #4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
 #5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
 #6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
 #7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
    RIP: 00000034930d95f7  RSP: 00007fff8830e0d8  RFLAGS: 00010213
    RAX: 0000000000000010  RBX: ffffffff814c3f42  RCX: 00000034930d95f7
    RDX: 00007fff8830e17c  RSI: 0000000040086602  RDI: 0000000000000003
    RBP: 0000000000084000   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fff8830f8db
    R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000000003
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

[3]
crash> bt 23722
PID: 23722  TASK: ffff88007b496080  CPU: 1   COMMAND: "flush-7:0"
 #0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
 #1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
 #2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
 #3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
 #4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
 #5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
 #6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
 #7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
 #8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
 #9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
#10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
#11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
#12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
#13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
#14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
#15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4

Step to reproduce a hang-up(figure):
 =======================================+=======================================
 ( "chattr -j" process PID: 16684 )     |  ( a certain process )
t ======================================+=======================================
i                                       |
m                                       +---------------------------------------
e                                       |jbd2_journal_start()
|                                       +---------------------------------------
|                                       | . . .
|                                       | t_updates++ // t_updates = 1
|                                       | . . .
|                                       +---------------------------------------
|                                        ...
|---------------------------------------+---------------------------------------
|jbd2_journal_lock_updates()            |jbd2_journal_stop()
|---------------------------------------+---------------------------------------
| write_lock(&journal->j_state_lock)    |    .
| ++journal->j_barrier_count            |    .
| spin_lock(&tran->t_handle_lock)       |    .
| atomic_read(&tran->t_updates) //not 0 |
|                                       | atomic_dec_and_test(&tran->t_updates)
|                                       |    // t_updates = 0
|                                       | wake_up(&journal->j_wait_updates)
| prepare_to_wait()                     |    // no process is woken up.
| spin_unlock(&tran->t_handle_lock)     + wake_up(&journal->j_wait_tran_locked)
| write_unlock(&journal->j_state_lock)  |    // no process is woken up.
| schedule() // never return            +---------------------------------------
|                                       |  ( "dd" PID: 16645 or "flush-7:0" )
|                                       +=======================================
|                                       |start_this_handle()
|                                       +---------------------------------------
|                                       | . . .
|                                       | read_lock(&journal->j_state_lock)
|                                       | if (journal->j_barrier_count)
|                                       |  read_unlock(&journal->j_state_lock)
|                                       |  wait_event(journal->j_wait_tran_locked)
v                                       |  // never return

Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or 
read_lock(&journal->j_state_lock) to synchronize trans->t_updates with other
operators.
===============================================================================

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
---
 fs/jbd2/transaction.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a0e41a4..d48f395 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1448,12 +1448,14 @@ int jbd2_journal_stop(handle_t *handle)
 	 * once we do this, we must not dereference transaction
 	 * pointer again.
 	 */
+	read_lock(&journal->j_state_lock);
 	tid = transaction->t_tid;
 	if (atomic_dec_and_test(&transaction->t_updates)) {
 		wake_up(&journal->j_wait_updates);
 		if (journal->j_barrier_count)
 			wake_up(&journal->j_wait_transaction_locked);
 	}
+	read_unlock(&journal->j_state_lock);
 
 	if (wait_for_commit)
 		err = jbd2_log_wait_commit(journal, tid);
-- 
1.5.5.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations
  2011-12-16 11:19 [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations Toshiyuki Okajima
@ 2011-12-16 12:22 ` Yongqiang Yang
  2011-12-20 10:44   ` Toshiyuki Okajima
  0 siblings, 1 reply; 14+ messages in thread
From: Yongqiang Yang @ 2011-12-16 12:22 UTC (permalink / raw)
  To: Toshiyuki Okajima; +Cc: tytso, adilger.kernel, linux-ext4

On Fri, Dec 16, 2011 at 7:19 PM, Toshiyuki Okajima
<toshi.okajima@jp.fujitsu.com> wrote:
> Hi.
>
> I found a bug by executing the following reproducer (needs tuning).
> (The reproducer never ends.)
>
> I wrote it for a confirmation of
>  "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
> Therefore, without
>  "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode"
> patch, the other problem (panic) which the patch describes can happen more
> frequently:
> -------------------------------------------------------------------------------
> #!/bin/sh
>
> date
> LOOP=100000
> dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
> /sbin/mkfs.ext4 -Fq /tmp/ext4.img
> mount -o loop /tmp/ext4.img /mnt
> rm -f /mnt/file
> echo "0" > /mnt/file
> (while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
> PID=$!
> for ((i=0; i < LOOP; i++));
> do
>        DATE=$(date)
>        echo -n "[LOOP $i] $DATE"
>        if ((i%2 == 0));
>        then
>                chattr +j /mnt/file
>        else
>                chattr -j /mnt/file
>        fi
>        echo "0" >> /mnt/file
> done
> kill -9 $PID
> rm -f /mnt/file*
> umount /mnt
> exit 0
> -------------------------------------------------------------------------------
> Though I ran it, it never ended forever.
> (At one of my tries to reproduce the hang, the reproducer stopped in 6
> hours.)
>
> Because jbd2_journal_lock_updates() never returns from schedule()
> after prepare_to_wait().
>
> The detailed explanation is as follows:
> ===============================================================================
> Current implementation of jbd2_journal_stop() has a bug not to synchronize
> t_updates with other operators, jbd2_journal_lock_updates(), start_this_handle()
IMHO, the description is not very exactly, the problem is not due to
desync of t_updates, t_updates is a var of atomic_t.   The reason is
that we have a critical code section here:[operation on t_updates,
wait or wakeup], this section should be synchronized.

> and jbd2_journal_commit_transaction().
>
> This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
>  commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.
>
> ---------------------------
> jbd2_journal_lock_updates()
> ---------------------------
>  509         write_lock(&journal->j_state_lock);
>  510         ++journal->j_barrier_count;
> ...
>  513         while (1) {
> ...
>  519                 spin_lock(&transaction->t_handle_lock);
>  520                 if (!atomic_read(&transaction->t_updates)) {
>  521                         spin_unlock(&transaction->t_handle_lock);
>  522                         break;
>  523                 }
>  524                 prepare_to_wait(&journal->j_wait_updates, &wait,
>  525                                 TASK_UNINTERRUPTIBLE);
>  526                 spin_unlock(&transaction->t_handle_lock);
It seems that t_handle_lock is used to protect the critical section
here.   I think t_handle is not needed here at all.   t_handle_lock
should be used to protect transaction and j_state_lock should be used
to protect journal.  prepare_to_wait operates on journal, so we should
remove t_handle_lock here.

>  527                 write_unlock(&journal->j_state_lock);
>  528                 schedule();
>  529                 finish_wait(&journal->j_wait_updates, &wait);
>  530                 write_lock(&journal->j_state_lock);
>  531         }
>  532         write_unlock(&journal->j_state_lock);
>
> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
>  and spin_lock(&transaction->t_handle_lock).
>
> -------------------
> start_this_handle()
> -------------------
>  160         read_lock(&journal->j_state_lock);
> ...
>  278         atomic_inc(&transaction->t_updates);
> ...
>  284         read_unlock(&journal->j_state_lock);
>
> * The operation of t_updates is protected by read_lock(&journal->j_state_lock).
>
> ---------------------------------
> jbd2_journal_commit_transaction()
> ---------------------------------
>  358         write_lock(&journal->j_state_lock);
> ...
>  367         spin_lock(&commit_transaction->t_handle_lock);
>  368         while (atomic_read(&commit_transaction->t_updates)) {
>  369                 DEFINE_WAIT(wait);
>  370
>  371                 prepare_to_wait(&journal->j_wait_updates, &wait,
>  372                                         TASK_UNINTERRUPTIBLE);
>  373                 if (atomic_read(&commit_transaction->t_updates)) {
>  374                         spin_unlock(&commit_transaction->t_handle_lock);
Also, t_handle_lock is used to protect the critical section.  I think
t_handle_lock is not needed here at all.

>  375                         write_unlock(&journal->j_state_lock);
>  376                         schedule();
>  377                         write_lock(&journal->j_state_lock);
>  378                         spin_lock(&commit_transaction->t_handle_lock);
>  379                 }
>  380                 finish_wait(&journal->j_wait_updates, &wait);
>  381         }
>  382         spin_unlock(&commit_transaction->t_handle_lock);
> ...
>  447         wake_up(&journal->j_wait_transaction_locked);
>  448         write_unlock(&journal->j_state_lock);
>
> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
>  and spin_lock(&transaction->t_handle_lock).
>
> -------------------
> jbd2_journal_stop()
> -------------------
> ...
> 1452         if (atomic_dec_and_test(&transaction->t_updates)) {
> 1453                 wake_up(&journal->j_wait_updates);
> 1454                 if (journal->j_barrier_count)
> 1455                         wake_up(&journal->j_wait_transaction_locked);
> 1456         }
> 1457
> 1458         if (wait_for_commit)
> 1459                 err = jbd2_log_wait_commit(journal, tid);
> 1460
> 1461         lock_map_release(&handle->h_lockdep_map);
> 1462
> 1463         jbd2_free_handle(handle);
> 1464         return err;
> 1465 }
>
> * The operation of t_updates is not protected.
>
> Therefore,
> - jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
> - jbd2_journal_stop() tries to wake someone up
> (wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0.
> => But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
>   if it has not yet slept completely.
>
> After we run the reproducer, we can see the processes which become hang-up.
>
> -------------------------------------------------------------------------------
> crash> ps | grep UN
>  16645  23737   1  ffff88007a592b20  UN   0.0  105136    508  dd
>  16684  23727   1  ffff88007a2a9480  UN   0.0  107152    512  chattr
>  23722      2   1  ffff88007b496080  UN   0.0       0      0  [flush-7:0]
>
> [1]
> crash> bt 16645
> PID: 16645  TASK: ffff88007a592b20  CPU: 1   COMMAND: "dd"
>  #0 [ffff88007b575878] __schedule at ffffffff814b9b77
>  #1 [ffff88007b575920] schedule at ffffffff814ba3cf
>  #2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
>  #3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
>  #4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
>  #5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
>  #6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
>  #7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
>  #8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
>  #9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
> #10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
> #11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
> #12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
> #13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
> #14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
>    RIP: 00000034930d4230  RSP: 00007fffd8176980  RFLAGS: 00000206
>    RAX: 0000000000000001  RBX: ffffffff814c3f42  RCX: 0000000000000010
>    RDX: 0000000000001000  RSI: 0000000001ac6000  RDI: 0000000000000001
>    RBP: 0000000001ac6000   R8: 0000000000000003   R9: 0000000000040000
>    R10: 0000000000003003  R11: 0000000000000246  R12: 0000000001ac5fff
>    R13: 0000000000000000  R14: 0000000000001000  R15: 0000000000000246
>    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>
> [2]
> crash> bt 16684
> PID: 16684  TASK: ffff88007a2a9480  CPU: 1   COMMAND: "chattr"
>  #0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
>  #1 [ffff88007af87d50] schedule at ffffffff814ba3cf
>  #2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
>  #3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
>  #4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
>  #5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
>  #6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
>  #7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
>    RIP: 00000034930d95f7  RSP: 00007fff8830e0d8  RFLAGS: 00010213
>    RAX: 0000000000000010  RBX: ffffffff814c3f42  RCX: 00000034930d95f7
>    RDX: 00007fff8830e17c  RSI: 0000000040086602  RDI: 0000000000000003
>    RBP: 0000000000084000   R8: 0000000000000000   R9: 0000000000000001
>    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fff8830f8db
>    R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000000003
>    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
>
> [3]
> crash> bt 23722
> PID: 23722  TASK: ffff88007b496080  CPU: 1   COMMAND: "flush-7:0"
>  #0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
>  #1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
>  #2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
>  #3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
>  #4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
>  #5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
>  #6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
>  #7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
>  #8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
>  #9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
> #10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
> #11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
> #12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
> #13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
> #14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
> #15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4
>
> Step to reproduce a hang-up(figure):
>  =======================================+=======================================
>  ( "chattr -j" process PID: 16684 )     |  ( a certain process )
> t ======================================+=======================================
> i                                       |
> m                                       +---------------------------------------
> e                                       |jbd2_journal_start()
> |                                       +---------------------------------------
> |                                       | . . .
> |                                       | t_updates++ // t_updates = 1
> |                                       | . . .
> |                                       +---------------------------------------
> |                                        ...
> |---------------------------------------+---------------------------------------
> |jbd2_journal_lock_updates()            |jbd2_journal_stop()
> |---------------------------------------+---------------------------------------
> | write_lock(&journal->j_state_lock)    |    .
> | ++journal->j_barrier_count            |    .
> | spin_lock(&tran->t_handle_lock)       |    .
> | atomic_read(&tran->t_updates) //not 0 |
> |                                       | atomic_dec_and_test(&tran->t_updates)
> |                                       |    // t_updates = 0
> |                                       | wake_up(&journal->j_wait_updates)
> | prepare_to_wait()                     |    // no process is woken up.
> | spin_unlock(&tran->t_handle_lock)     + wake_up(&journal->j_wait_tran_locked)
> | write_unlock(&journal->j_state_lock)  |    // no process is woken up.
> | schedule() // never return            +---------------------------------------
> |                                       |  ( "dd" PID: 16645 or "flush-7:0" )
> |                                       +=======================================
> |                                       |start_this_handle()
> |                                       +---------------------------------------
> |                                       | . . .
> |                                       | read_lock(&journal->j_state_lock)
> |                                       | if (journal->j_barrier_count)
> |                                       |  read_unlock(&journal->j_state_lock)
> |                                       |  wait_event(journal->j_wait_tran_locked)
> v                                       |  // never return
>
> Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or
> read_lock(&journal->j_state_lock) to synchronize trans->t_updates with other
> operators.
> ===============================================================================
>
> Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
> ---
>  fs/jbd2/transaction.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index a0e41a4..d48f395 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -1448,12 +1448,14 @@ int jbd2_journal_stop(handle_t *handle)
>         * once we do this, we must not dereference transaction
>         * pointer again.
>         */
> +       read_lock(&journal->j_state_lock);
>        tid = transaction->t_tid;
>        if (atomic_dec_and_test(&transaction->t_updates)) {
>                wake_up(&journal->j_wait_updates);
>                if (journal->j_barrier_count)
oops, j_barrier_count is not protected too in original code.  The
problem does not exists in patched code any more:-)


Yongqiang.
>                        wake_up(&journal->j_wait_transaction_locked);
>        }
> +       read_unlock(&journal->j_state_lock);
>
>        if (wait_for_commit)
>                err = jbd2_log_wait_commit(journal, tid);
> --
> 1.5.5.6
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations
  2011-12-16 12:22 ` Yongqiang Yang
@ 2011-12-20 10:44   ` Toshiyuki Okajima
  2011-12-22 11:36     ` [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates Toshiyuki Okajima
  0 siblings, 1 reply; 14+ messages in thread
From: Toshiyuki Okajima @ 2011-12-20 10:44 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: toshi.okajima, tytso, adilger.kernel, linux-ext4

Hi, Yongqiang.

Thanks for reviewing.

(2011/12/16 21:22), Yongqiang Yang wrote:
> On Fri, Dec 16, 2011 at 7:19 PM, Toshiyuki Okajima
> <toshi.okajima@jp.fujitsu.com> wrote:
>> Hi.
>>
>> I found a bug by executing the following reproducer (needs tuning).
>> (The reproducer never ends.)
>>
>> I wrote it for a confirmation of
>>  "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
>> Therefore, without
>>  "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode"
>> patch, the other problem (panic) which the patch describes can happen more
>> frequently:
>> -------------------------------------------------------------------------------
>> #!/bin/sh
>>
>> date
>> LOOP=100000
>> dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
>> /sbin/mkfs.ext4 -Fq /tmp/ext4.img
>> mount -o loop /tmp/ext4.img /mnt
>> rm -f /mnt/file
>> echo "0" > /mnt/file
>> (while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
>> PID=$!
>> for ((i=0; i < LOOP; i++));
>> do
>>        DATE=$(date)
>>        echo -n "[LOOP $i] $DATE"
>>        if ((i%2 == 0));
>>        then
>>                chattr +j /mnt/file
>>        else
>>                chattr -j /mnt/file
>>        fi
>>        echo "0" >> /mnt/file
>> done
>> kill -9 $PID
>> rm -f /mnt/file*
>> umount /mnt
>> exit 0
>> -------------------------------------------------------------------------------
>> Though I ran it, it never ended forever.
>> (At one of my tries to reproduce the hang, the reproducer stopped in 6
>> hours.)
>>
>> Because jbd2_journal_lock_updates() never returns from schedule()
>> after prepare_to_wait().
>>
>> The detailed explanation is as follows:
>> ===============================================================================
>> Current implementation of jbd2_journal_stop() has a bug not to synchronize
>> t_updates with other operators, jbd2_journal_lock_updates(), start_this_handle()

> IMHO, the description is not very exactly, the problem is not due to
> desync of t_updates, t_updates is a var of atomic_t.   The reason is
> that we have a critical code section here:[operation on t_updates,
> wait or wakeup], this section should be synchronized.
I understood. I will rewrite the description much better than this by 
applying your comment. 

> 
>> and jbd2_journal_commit_transaction().
>>
>> This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
>>  commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.
>>
>> ---------------------------
>> jbd2_journal_lock_updates()
>> ---------------------------
>>  509         write_lock(&journal->j_state_lock);
>>  510         ++journal->j_barrier_count;
>> ...
>>  513         while (1) {
>> ...
>>  519                 spin_lock(&transaction->t_handle_lock);
>>  520                 if (!atomic_read(&transaction->t_updates)) {
>>  521                         spin_unlock(&transaction->t_handle_lock);
>>  522                         break;
>>  523                 }
>>  524                 prepare_to_wait(&journal->j_wait_updates, &wait,
>>  525                                 TASK_UNINTERRUPTIBLE);
>>  526                 spin_unlock(&transaction->t_handle_lock);

> It seems that t_handle_lock is used to protect the critical section
> here.   I think t_handle is not needed here at all.   t_handle_lock
> should be used to protect transaction and j_state_lock should be used
> to protect journal.  prepare_to_wait operates on journal, so we should
> remove t_handle_lock here.
I think so, too. Then, I will write another patch which deletes the statements
of  t_handle_lock.

> 
>>  527                 write_unlock(&journal->j_state_lock);
>>  528                 schedule();
>>  529                 finish_wait(&journal->j_wait_updates, &wait);
>>  530                 write_lock(&journal->j_state_lock);
>>  531         }
>>  532         write_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
>>  and spin_lock(&transaction->t_handle_lock).
>>
>> -------------------
>> start_this_handle()
>> -------------------
>>  160         read_lock(&journal->j_state_lock);
>> ...
>>  278         atomic_inc(&transaction->t_updates);
>> ...
>>  284         read_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by read_lock(&journal->j_state_lock).
>>
>> ---------------------------------
>> jbd2_journal_commit_transaction()
>> ---------------------------------
>>  358         write_lock(&journal->j_state_lock);
>> ...
>>  367         spin_lock(&commit_transaction->t_handle_lock);
>>  368         while (atomic_read(&commit_transaction->t_updates)) {
>>  369                 DEFINE_WAIT(wait);
>>  370
>>  371                 prepare_to_wait(&journal->j_wait_updates, &wait,
>>  372                                         TASK_UNINTERRUPTIBLE);
>>  373                 if (atomic_read(&commit_transaction->t_updates)) {
>>  374                         spin_unlock(&commit_transaction->t_handle_lock);

> Also, t_handle_lock is used to protect the critical section.  I think
> t_handle_lock is not needed here at all.
Yes.

> 
>>  375                         write_unlock(&journal->j_state_lock);
>>  376                         schedule();
>>  377                         write_lock(&journal->j_state_lock);
>>  378                         spin_lock(&commit_transaction->t_handle_lock);
>>  379                 }
>>  380                 finish_wait(&journal->j_wait_updates, &wait);
>>  381         }
>>  382         spin_unlock(&commit_transaction->t_handle_lock);
>> ...
>>  447         wake_up(&journal->j_wait_transaction_locked);
>>  448         write_unlock(&journal->j_state_lock);
>>
>> * The operation of t_updates is protected by write_lock(&journal->j_state_lock)
>>  and spin_lock(&transaction->t_handle_lock).
>>
>> -------------------
>> jbd2_journal_stop()
>> -------------------
>> ...
>> 1452         if (atomic_dec_and_test(&transaction->t_updates)) {
>> 1453                 wake_up(&journal->j_wait_updates);
>> 1454                 if (journal->j_barrier_count)
>> 1455                         wake_up(&journal->j_wait_transaction_locked);
>> 1456         }
>> 1457
>> 1458         if (wait_for_commit)
>> 1459                 err = jbd2_log_wait_commit(journal, tid);
>> 1460
>> 1461         lock_map_release(&handle->h_lockdep_map);
>> 1462
>> 1463         jbd2_free_handle(handle);
>> 1464         return err;
>> 1465 }
>>
>> * The operation of t_updates is not protected.
>>
>> Therefore,
>> - jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
>> - jbd2_journal_stop() tries to wake someone up
>> (wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0.
>> => But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
>>   if it has not yet slept completely.
>>
>> After we run the reproducer, we can see the processes which become hang-up.
>>
>> -------------------------------------------------------------------------------
>> crash> ps | grep UN
>>  16645  23737   1  ffff88007a592b20  UN   0.0  105136    508  dd
>>  16684  23727   1  ffff88007a2a9480  UN   0.0  107152    512  chattr
>>  23722      2   1  ffff88007b496080  UN   0.0       0      0  [flush-7:0]
>>
>> [1]
>> crash> bt 16645
>> PID: 16645  TASK: ffff88007a592b20  CPU: 1   COMMAND: "dd"
>>  #0 [ffff88007b575878] __schedule at ffffffff814b9b77
>>  #1 [ffff88007b575920] schedule at ffffffff814ba3cf
>>  #2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
>>  #3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
>>  #4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
>>  #5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
>>  #6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
>>  #7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
>>  #8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
>>  #9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
>> #10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
>> #11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
>> #12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
>> #13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
>> #14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
>>    RIP: 00000034930d4230  RSP: 00007fffd8176980  RFLAGS: 00000206
>>    RAX: 0000000000000001  RBX: ffffffff814c3f42  RCX: 0000000000000010
>>    RDX: 0000000000001000  RSI: 0000000001ac6000  RDI: 0000000000000001
>>    RBP: 0000000001ac6000   R8: 0000000000000003   R9: 0000000000040000
>>    R10: 0000000000003003  R11: 0000000000000246  R12: 0000000001ac5fff
>>    R13: 0000000000000000  R14: 0000000000001000  R15: 0000000000000246
>>    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
>>
>> [2]
>> crash> bt 16684
>> PID: 16684  TASK: ffff88007a2a9480  CPU: 1   COMMAND: "chattr"
>>  #0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
>>  #1 [ffff88007af87d50] schedule at ffffffff814ba3cf
>>  #2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
>>  #3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
>>  #4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
>>  #5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
>>  #6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
>>  #7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
>>    RIP: 00000034930d95f7  RSP: 00007fff8830e0d8  RFLAGS: 00010213
>>    RAX: 0000000000000010  RBX: ffffffff814c3f42  RCX: 00000034930d95f7
>>    RDX: 00007fff8830e17c  RSI: 0000000040086602  RDI: 0000000000000003
>>    RBP: 0000000000084000   R8: 0000000000000000   R9: 0000000000000001
>>    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fff8830f8db
>>    R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000000003
>>    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
>>
>> [3]
>> crash> bt 23722
>> PID: 23722  TASK: ffff88007b496080  CPU: 1   COMMAND: "flush-7:0"
>>  #0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
>>  #1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
>>  #2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
>>  #3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
>>  #4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
>>  #5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
>>  #6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
>>  #7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
>>  #8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
>>  #9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
>> #10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
>> #11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
>> #12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
>> #13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
>> #14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
>> #15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4
>>
>> Step to reproduce a hang-up(figure):
>>  =======================================+=======================================
>>  ( "chattr -j" process PID: 16684 )     |  ( a certain process )
>> t ======================================+=======================================
>> i                                       |
>> m                                       +---------------------------------------
>> e                                       |jbd2_journal_start()
>> |                                       +---------------------------------------
>> |                                       | . . .
>> |                                       | t_updates++ // t_updates = 1
>> |                                       | . . .
>> |                                       +---------------------------------------
>> |                                        ...
>> |---------------------------------------+---------------------------------------
>> |jbd2_journal_lock_updates()            |jbd2_journal_stop()
>> |---------------------------------------+---------------------------------------
>> | write_lock(&journal->j_state_lock)    |    .
>> | ++journal->j_barrier_count            |    .
>> | spin_lock(&tran->t_handle_lock)       |    .
>> | atomic_read(&tran->t_updates) //not 0 |
>> |                                       | atomic_dec_and_test(&tran->t_updates)
>> |                                       |    // t_updates = 0
>> |                                       | wake_up(&journal->j_wait_updates)
>> | prepare_to_wait()                     |    // no process is woken up.
>> | spin_unlock(&tran->t_handle_lock)     + wake_up(&journal->j_wait_tran_locked)
>> | write_unlock(&journal->j_state_lock)  |    // no process is woken up.
>> | schedule() // never return            +---------------------------------------
>> |                                       |  ( "dd" PID: 16645 or "flush-7:0" )
>> |                                       +=======================================
>> |                                       |start_this_handle()
>> |                                       +---------------------------------------
>> |                                       | . . .
>> |                                       | read_lock(&journal->j_state_lock)
>> |                                       | if (journal->j_barrier_count)
>> |                                       |  read_unlock(&journal->j_state_lock)
>> |                                       |  wait_event(journal->j_wait_tran_locked)
>> v                                       |  // never return
>>
>> Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or
>> read_lock(&journal->j_state_lock) to synchronize trans->t_updates with other
>> operators.
>> ===============================================================================
>>
>> Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
>> ---
>>  fs/jbd2/transaction.c |    2 ++
>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index a0e41a4..d48f395 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -1448,12 +1448,14 @@ int jbd2_journal_stop(handle_t *handle)
>>         * once we do this, we must not dereference transaction
>>         * pointer again.
>>         */
>> +       read_lock(&journal->j_state_lock);
>>        tid = transaction->t_tid;
>>        if (atomic_dec_and_test(&transaction->t_updates)) {
>>                wake_up(&journal->j_wait_updates);
>>                if (journal->j_barrier_count)

> oops, j_barrier_count is not protected too in original code.  The
> problem does not exists in patched code any more:-)
OK. 
I will add the notice into the description which I will rewrite.

Best Regards,
Toshiyuki Okajima


> 
> 
> Yongqiang.
>>                        wake_up(&journal->j_wait_transaction_locked);
>>        }
>> +       read_unlock(&journal->j_state_lock);
>>
>>        if (wait_for_commit)
>>                err = jbd2_log_wait_commit(journal, tid);
>> --
>> 1.5.5.6
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates
  2011-12-20 10:44   ` Toshiyuki Okajima
@ 2011-12-22 11:36     ` Toshiyuki Okajima
  2011-12-22 11:56       ` [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations Toshiyuki Okajima
  2011-12-22 12:00       ` [PATCH 2/2] jbd2: remove all t_handle_lock statements Toshiyuki Okajima
  0 siblings, 2 replies; 14+ messages in thread
From: Toshiyuki Okajima @ 2011-12-22 11:36 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: Yongqiang Yang, linux-ext4

Hi.

I found a bug by executing the following reproducer.  The reproducer is for a 
confirmation of 
 "[PATCH 1/5] ext4: allocate delalloc blocks before changing journal mode".
Therefore, without its patch, the other problem (panic) which the patch 
describes can happen more frequently:
-------------------------------------------------------------------------------
#!/bin/sh

date
LOOP=100000
dd if=/dev/zero of=/tmp/ext4.img bs=1k count=1000k
/sbin/mkfs.ext4 -Fq /tmp/ext4.img
mount -o loop /tmp/ext4.img /mnt
rm -f /mnt/file
echo "0" > /mnt/file
(while [ 1 ]; do dd if=/dev/zero of=/mnt/file2 bs=4k > /dev/null 2>&1; done) &
PID=$!
for ((i=0; i < LOOP; i++));
do
        DATE=$(date)
        echo -n "[LOOP $i] $DATE"
        if ((i%2 == 0));
        then
                chattr +j /mnt/file
        else
                chattr -j /mnt/file
        fi
        echo "0" >> /mnt/file
done
kill -9 $PID
rm -f /mnt/file*
umount /mnt
exit 0
-------------------------------------------------------------------------------
Though I ran it, it never ended forever.
(At one of my tries to reproduce the hangup, the reproducer stopped in 6
hours.)

Because jbd2_journal_lock_updates() never returns from schedule() after 
prepare_to_wait(). 
(Its filesystem becomes a hangup.)

The detailed explanation is as follows:
===============================================================================
Current implementation of jbd2_journal_stop() has a bug not to synchronize 
a critical code section of it with others,
 - jbd2_journal_lock_updates()
 - start_this_handle()
 - jbd2_journal_commit_transaction().

This bug was derived from commit: c35a56a090eacefca07afeb994029b57d8dd8025 and
 commit: 8dd420466c7bfc459fa04680bd5690bfc41a4553.

The source codes and critical code sections of them are as follows:
---------------------------
jbd2_journal_lock_updates()
---------------------------
 509         write_lock(&journal->j_state_lock);
--- critical code section ------------------------------------------------------
 510         ++journal->j_barrier_count;
...
 513         while (1) {
...
 519                 spin_lock(&transaction->t_handle_lock);
 520                 if (!atomic_read(&transaction->t_updates)) {
 521                         spin_unlock(&transaction->t_handle_lock);
 522                         break;
 523                 }
 524                 prepare_to_wait(&journal->j_wait_updates, &wait,
 525                                 TASK_UNINTERRUPTIBLE);
 526                 spin_unlock(&transaction->t_handle_lock);
 527                 write_unlock(&journal->j_state_lock);
 528                 schedule();
 529                 finish_wait(&journal->j_wait_updates, &wait);
 530                 write_lock(&journal->j_state_lock);
 531         }
-------------------------------------------------------------------------------
 532         write_unlock(&journal->j_state_lock);

* The critical code section is from line 510 to line 531
     referencer) protected by write_lock(j_state_lock) and 
	spin_lock(t_handle_lock)

-------------------
start_this_handle()
-------------------
 160         read_lock(&journal->j_state_lock);
--- critical code section ------------------------------------------------------
...
 278         atomic_inc(&transaction->t_updates);
...
-------------------------------------------------------------------------------
 284         read_unlock(&journal->j_state_lock);

* The critical code sections is in line 278 protected by read_lock(j_state_lock)

---------------------------------
jbd2_journal_commit_transaction()
---------------------------------
 358         write_lock(&journal->j_state_lock);
--- critical code section ------------------------------------------------------
...
 367         spin_lock(&commit_transaction->t_handle_lock);
 368         while (atomic_read(&commit_transaction->t_updates)) {
 369                 DEFINE_WAIT(wait);
 370 
 371                 prepare_to_wait(&journal->j_wait_updates, &wait,
 372                                         TASK_UNINTERRUPTIBLE);
 373                 if (atomic_read(&commit_transaction->t_updates)) {
 374                         spin_unlock(&commit_transaction->t_handle_lock);
 375                         write_unlock(&journal->j_state_lock);
 376                         schedule();
 377                         write_lock(&journal->j_state_lock);
 378                         spin_lock(&commit_transaction->t_handle_lock);
 379                 }
 380                 finish_wait(&journal->j_wait_updates, &wait);
 381         }
 382         spin_unlock(&commit_transaction->t_handle_lock);
...
 447         wake_up(&journal->j_wait_transaction_locked);
-------------------------------------------------------------------------------
 448         write_unlock(&journal->j_state_lock);

* The critical code sections is from line 367 to line 447 protected by 
  write_lock(&journal->j_state_lock) and spin_lock(&transaction->t_handle_lock).

-------------------
jbd2_journal_stop()
-------------------
...
--- critical code section ------------------------------------------------------
1452         if (atomic_dec_and_test(&transaction->t_updates)) {
1453                 wake_up(&journal->j_wait_updates);
1454                 if (journal->j_barrier_count)
1455                         wake_up(&journal->j_wait_transaction_locked);
1456         }
-------------------------------------------------------------------------------
1457 
1458         if (wait_for_commit)
1459                 err = jbd2_log_wait_commit(journal, tid);
1460 
1461         lock_map_release(&handle->h_lockdep_map);
1462 
1463         jbd2_free_handle(handle);
1464         return err;
1465 }

* The critical code sections is from line 1452 to line 1456. However,
 it isn't protected by no exclusive control.

Therefore, 
- jbd2_journal_lock_updates() sleeps if transaction->t_update is 0.
- jbd2_journal_stop() tries to wake someone up 
(wake_up(&journal->j_wait_updates)) after transaction->t_updates becomes 0. 
=> But jbd2_journal_lock_updates() isn't woken up and then sleeps forever,
   if it has not yet slept completely (before executing schedule()).

After we run the reproducer, we can see the processes which become hangup:
-------------------------------------------------------------------------------
crash> ps | grep UN
  16645  23737   1  ffff88007a592b20  UN   0.0  105136    508  dd
  16684  23727   1  ffff88007a2a9480  UN   0.0  107152    512  chattr
  23722      2   1  ffff88007b496080  UN   0.0       0      0  [flush-7:0]

[1]
crash> bt 16645
PID: 16645  TASK: ffff88007a592b20  CPU: 1   COMMAND: "dd"
 #0 [ffff88007b575878] __schedule at ffffffff814b9b77
 #1 [ffff88007b575920] schedule at ffffffff814ba3cf
 #2 [ffff88007b575930] start_this_handle at ffffffffa01acef5 [jbd2]
 #3 [ffff88007b5759f0] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
 #4 [ffff88007b575a30] jbd2_journal_start at ffffffffa01ad523 [jbd2]
 #5 [ffff88007b575a40] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
 #6 [ffff88007b575ab0] ext4_da_write_begin at ffffffffa038ddce [ext4]
 #7 [ffff88007b575b50] generic_file_buffered_write at ffffffff811005ce
 #8 [ffff88007b575c10] __generic_file_aio_write at ffffffff81103418
 #9 [ffff88007b575cd0] generic_file_aio_write at ffffffff811036b6
#10 [ffff88007b575d50] ext4_file_write at ffffffffa0386ee9 [ext4]
#11 [ffff88007b575de0] do_sync_write at ffffffff8115de52
#12 [ffff88007b575f00] vfs_write at ffffffff8115e3e8
#13 [ffff88007b575f30] sys_write at ffffffff8115e5b1
#14 [ffff88007b575f80] system_call_fastpath at ffffffff814c3f42
    RIP: 00000034930d4230  RSP: 00007fffd8176980  RFLAGS: 00000206
    RAX: 0000000000000001  RBX: ffffffff814c3f42  RCX: 0000000000000010
    RDX: 0000000000001000  RSI: 0000000001ac6000  RDI: 0000000000000001
    RBP: 0000000001ac6000   R8: 0000000000000003   R9: 0000000000040000
    R10: 0000000000003003  R11: 0000000000000246  R12: 0000000001ac5fff
    R13: 0000000000000000  R14: 0000000000001000  R15: 0000000000000246
    ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b

[2]
crash> bt 16684
PID: 16684  TASK: ffff88007a2a9480  CPU: 1   COMMAND: "chattr"
 #0 [ffff88007af87ca8] __schedule at ffffffff814b9b77
 #1 [ffff88007af87d50] schedule at ffffffff814ba3cf
 #2 [ffff88007af87d60] jbd2_journal_lock_updates at ffffffffa01acca6 [jbd2]
 #3 [ffff88007af87de0] ext4_change_inode_journal_flag at ffffffffa038faf1 [ext4]
 #4 [ffff88007af87e10] ext4_ioctl at ffffffffa0392f48 [ext4]
 #5 [ffff88007af87ea0] do_vfs_ioctl at ffffffff8116fb8a
 #6 [ffff88007af87f30] sys_ioctl at ffffffff811700d1
 #7 [ffff88007af87f80] system_call_fastpath at ffffffff814c3f42
    RIP: 00000034930d95f7  RSP: 00007fff8830e0d8  RFLAGS: 00010213
    RAX: 0000000000000010  RBX: ffffffff814c3f42  RCX: 00000034930d95f7
    RDX: 00007fff8830e17c  RSI: 0000000040086602  RDI: 0000000000000003
    RBP: 0000000000084000   R8: 0000000000000000   R9: 0000000000000001
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fff8830f8db
    R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000000003
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

[3]
crash> bt 23722
PID: 23722  TASK: ffff88007b496080  CPU: 1   COMMAND: "flush-7:0"
 #0 [ffff88007b4f7840] __schedule at ffffffff814b9b77
 #1 [ffff88007b4f78e8] schedule at ffffffff814ba3cf
 #2 [ffff88007b4f78f8] start_this_handle at ffffffffa01acef5 [jbd2]
 #3 [ffff88007b4f79b8] jbd2__journal_start at ffffffffa01ad4cb [jbd2]
 #4 [ffff88007b4f79f8] jbd2_journal_start at ffffffffa01ad523 [jbd2]
 #5 [ffff88007b4f7a08] ext4_journal_start_sb at ffffffffa03a7d83 [ext4]
 #6 [ffff88007b4f7a78] ext4_da_writepages at ffffffffa0390b6d [ext4]
 #7 [ffff88007b4f7bc8] do_writepages at ffffffff8110c261
 #8 [ffff88007b4f7bd8] writeback_single_inode at ffffffff81185830
 #9 [ffff88007b4f7c38] writeback_sb_inodes at ffffffff81185e21
#10 [ffff88007b4f7ce8] __writeback_inodes_wb at ffffffff81185f7e
#11 [ffff88007b4f7d38] wb_writeback at ffffffff81186223
#12 [ffff88007b4f7db8] wb_do_writeback at ffffffff811864f9
#13 [ffff88007b4f7e58] bdi_writeback_thread at ffffffff811865fa
#14 [ffff88007b4f7ee8] kthread at ffffffff810828c6
#15 [ffff88007b4f7f48] kernel_thread_helper at ffffffff814c60b4

Step to reproduce a hangup(figure):
 =======================================+=======================================
 ( "chattr -j" process PID: 16684 )     |  ( a certain process )
t ======================================+=======================================
i                                       |
m                                       +---------------------------------------
e                                       |jbd2_journal_start()
|                                       +---------------------------------------
|                                       | . . .
|                                       | t_updates++ // t_updates = 1
|                                       + . . .
|                                       +---------------------------------------
|                                        ...
|---------------------------------------+---------------------------------------
|jbd2_journal_lock_updates()            |jbd2_journal_stop()
|---------------------------------------+---------------------------------------
| write_lock(&journal->j_state_lock)    |    .
| ++journal->j_barrier_count            |    .
| spin_lock(&tran->t_handle_lock)       |    .
| atomic_read(&tran->t_updates) //not 0 |
|                                       | atomic_dec_and_test(&tran->t_updates)
|                                       |    // t_updates = 0
|                                       | wake_up(&journal->j_wait_updates)
| prepare_to_wait()                     |    // no process is woken up.
| spin_unlock(&tran->t_handle_lock)     + wake_up(&journal->j_wait_tran_locked)
| write_unlock(&journal->j_state_lock)  |    // no process is woken up.
| schedule() // never return            +---------------------------------------
|                                       |  ( "dd" PID: 16645 or "flush-7:0" )
|                                       +=======================================
|                                       |start_this_handle()
|                                       +---------------------------------------
|                                       | . . .
|                                       | read_lock(&journal->j_state_lock)
|                                       | if (journal->j_barrier_count)
|                                       |  read_unlock(&journal->j_state_lock)
|                                       |  wait_event(jrnl->j_wait_tran_locked)
v                                       |  // never return

Therefore, jbd2_journal_stop() needs spin_lock(&tran->t_handle_lock) or 
read_lock(&journal->j_state_lock) to synchronize its critical code section with 
others.  Besides, I noticed that t_handle_lock exclusive control had already 
been unnecessary when I examined all the functions which used t_handle_lock. 
So, read_lock(&journal->j_state_lock) exclusive control is good for this fix.

Regards,
Toshiyuki Okajima

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations
  2011-12-22 11:36     ` [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates Toshiyuki Okajima
@ 2011-12-22 11:56       ` Toshiyuki Okajima
  2012-01-03 15:32         ` Jan Kara
  2011-12-22 12:00       ` [PATCH 2/2] jbd2: remove all t_handle_lock statements Toshiyuki Okajima
  1 sibling, 1 reply; 14+ messages in thread
From: Toshiyuki Okajima @ 2011-12-22 11:56 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: Yongqiang Yang, linux-ext4

The following statements need an exclusive control for the critical code section
around t_update operations:
[jbd2_journal_stop()]
1445         /*
1446          * Once we drop t_updates, if it goes to zero the transaction
1447          * could start committing on us and eventually disappear.  So
1448          * once we do this, we must not dereference transaction
1449          * pointer again.
1450          */
1451         tid = transaction->t_tid;
+    read_lock(&journal->j_state_lock);
----- critical code section ------------------------------------------------
1452         if (atomic_dec_and_test(&transaction->t_updates)) {
1453                 wake_up(&journal->j_wait_updates);
1454                 if (journal->j_barrier_count)
1455                         wake_up(&journal->j_wait_transaction_locked);
1456         }
-----------------------------------------------------------------------------
+    read_unlock(&journal->j_state_lock);
1457 

Because the functions which have the other critical code sections around t_update 
operations, 
 - jbd2_journal_commit_transaction
 - start_this_handle
 - jbd2_journal_lock_updates
can not synchronize with jbd2_journal_stop.

ex) jbd2_journal_lock_updates
 505 void jbd2_journal_lock_updates(journal_t *journal)
 506 {
 507         DEFINE_WAIT(wait);
 508 
 509         write_lock(&journal->j_state_lock);
 510         ++journal->j_barrier_count;
 511 
 512         /* Wait until there are no running updates */
 513         while (1) {
 514                transaction_t *transaction = journal->j_running_transaction;
 515 
 516                 if (!transaction)
 517                         break;
 518 
 519                 spin_lock(&transaction->t_handle_lock);
----- critical code section ------------------------------------------------
 520                 if (!atomic_read(&transaction->t_updates)) {
 521                         spin_unlock(&transaction->t_handle_lock);
 522                         break;
 523                 }
 524                 prepare_to_wait(&journal->j_wait_updates, &wait,
 525                                 TASK_UNINTERRUPTIBLE);
-----------------------------------------------------------------------------
 526                 spin_unlock(&transaction->t_handle_lock);
 527                 write_unlock(&journal->j_state_lock);
 528                 schedule();
 529                 finish_wait(&journal->j_wait_updates, &wait);
 530                 write_lock(&journal->j_state_lock);
 531         }
 532         write_unlock(&journal->j_state_lock);

Thefore, the following steps causes a hang-up of process1:
1) (process1) line 520 in jbd2_journal_lock_updates
      transaction->t_updates is equal to 1, and then goto 4).
2) (process2) line 1452 in jbd2_journal_stop
      transaction->t_updates becomes to 0, and then goto 3).
3) (process2) line 1453 in jbd2_journal_stop
      wake_up(&journal->j_wait_updates) tries to wake someone up.
4) (process1) line 524 in jbd2_journal_lock_updates
      prepare to sleep itself, and then goto 5).
5) (process1) line 528 in jbd2_journal_lock_updates
      sleep forever because process2 doesn't wake it up anymore.

Similar problem also exists for j_barrier_count operations but it can be
fixed, too:
[jbd2_journal_lock_updates]
 505 void jbd2_journal_lock_updates(journal_t *journal)
 506 {
 507         DEFINE_WAIT(wait);
 508 
 509         write_lock(&journal->j_state_lock);
----------------------------------------------------------------------------
 510         ++journal->j_barrier_count;
----------------------------------------------------------------------------
...
 532         write_unlock(&journal->j_state_lock);

[jbd2_journal_stop]
1445         /*
1446          * Once we drop t_updates, if it goes to zero the transaction
1447          * could start committing on us and eventually disappear.  So
1448          * once we do this, we must not dereference transaction
1449          * pointer again.
1450          */
1451         tid = transaction->t_tid;
+    read_lock(&journal->j_state_lock);
1452         if (atomic_dec_and_test(&transaction->t_updates)) {
1453                 wake_up(&journal->j_wait_updates);
----------------------------------------------------------------------------
1454                 if (journal->j_barrier_count)
1455                         wake_up(&journal->j_wait_transaction_locked);
----------------------------------------------------------------------------
1456         }
+    read_unlock(&journal->j_state_lock);
1457 

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
---
 fs/jbd2/transaction.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a0e41a4..76f2eca 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1449,11 +1449,13 @@ int jbd2_journal_stop(handle_t *handle)
 	 * pointer again.
 	 */
 	tid = transaction->t_tid;
+	read_lock(&journal->j_state_lock);
 	if (atomic_dec_and_test(&transaction->t_updates)) {
 		wake_up(&journal->j_wait_updates);
 		if (journal->j_barrier_count)
 			wake_up(&journal->j_wait_transaction_locked);
 	}
+	read_unlock(&journal->j_state_lock);
 
 	if (wait_for_commit)
 		err = jbd2_log_wait_commit(journal, tid);
-- 
1.5.5.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] jbd2: remove all t_handle_lock statements
  2011-12-22 11:36     ` [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates Toshiyuki Okajima
  2011-12-22 11:56       ` [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations Toshiyuki Okajima
@ 2011-12-22 12:00       ` Toshiyuki Okajima
  2011-12-22 14:07         ` Yongqiang Yang
  1 sibling, 1 reply; 14+ messages in thread
From: Toshiyuki Okajima @ 2011-12-22 12:00 UTC (permalink / raw)
  To: tytso, adilger.kernel; +Cc: Yongqiang Yang, linux-ext4

Remove all t_handle_lock statements because they are not necessary anymore.

Because there is read_lock(&journal->j_state_lock) or 
write_lock(&journal->j_state_lock) on all the forward codes beyond the place 
which needs a spin_lock(&transaction->t_handle_lock).

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Cc: 
---
 fs/jbd2/commit.c      |    4 ----
 fs/jbd2/transaction.c |   18 +++---------------
 include/linux/jbd2.h  |    8 --------
 3 files changed, 3 insertions(+), 27 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 68d704d..1030d47 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
 					      stats.run.rs_locked);
 
-	spin_lock(&commit_transaction->t_handle_lock);
 	while (atomic_read(&commit_transaction->t_updates)) {
 		DEFINE_WAIT(wait);
 
 		prepare_to_wait(&journal->j_wait_updates, &wait,
 					TASK_UNINTERRUPTIBLE);
 		if (atomic_read(&commit_transaction->t_updates)) {
-			spin_unlock(&commit_transaction->t_handle_lock);
 			write_unlock(&journal->j_state_lock);
 			schedule();
 			write_lock(&journal->j_state_lock);
-			spin_lock(&commit_transaction->t_handle_lock);
 		}
 		finish_wait(&journal->j_wait_updates, &wait);
 	}
-	spin_unlock(&commit_transaction->t_handle_lock);
 
 	J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
 			journal->j_max_transaction_buffers);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 76f2eca..7f84e3f 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -56,7 +56,6 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
 	transaction->t_start_time = ktime_get();
 	transaction->t_tid = journal->j_transaction_sequence++;
 	transaction->t_expires = jiffies + journal->j_commit_interval;
-	spin_lock_init(&transaction->t_handle_lock);
 	atomic_set(&transaction->t_updates, 0);
 	atomic_set(&transaction->t_outstanding_credits, 0);
 	atomic_set(&transaction->t_handle_count, 0);
@@ -100,10 +99,8 @@ static inline void update_t_max_wait(transaction_t *transaction,
 	if (jbd2_journal_enable_debug &&
 	    time_after(transaction->t_start, ts)) {
 		ts = jbd2_time_diff(ts, transaction->t_start);
-		spin_lock(&transaction->t_handle_lock);
 		if (ts > transaction->t_max_wait)
 			transaction->t_max_wait = ts;
-		spin_unlock(&transaction->t_handle_lock);
 	}
 #endif
 }
@@ -401,19 +398,18 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
 		goto error_out;
 	}
 
-	spin_lock(&transaction->t_handle_lock);
 	wanted = atomic_read(&transaction->t_outstanding_credits) + nblocks;
 
 	if (wanted > journal->j_max_transaction_buffers) {
 		jbd_debug(3, "denied handle %p %d blocks: "
 			  "transaction too large\n", handle, nblocks);
-		goto unlock;
+		goto error_out;
 	}
 
 	if (wanted > __jbd2_log_space_left(journal)) {
 		jbd_debug(3, "denied handle %p %d blocks: "
 			  "insufficient log space\n", handle, nblocks);
-		goto unlock;
+		goto error_out;
 	}
 
 	handle->h_buffer_credits += nblocks;
@@ -421,8 +417,6 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
 	result = 0;
 
 	jbd_debug(3, "extended handle %p by %d\n", handle, nblocks);
-unlock:
-	spin_unlock(&transaction->t_handle_lock);
 error_out:
 	read_unlock(&journal->j_state_lock);
 out:
@@ -464,12 +458,10 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
 	J_ASSERT(journal_current_handle() == handle);
 
 	read_lock(&journal->j_state_lock);
-	spin_lock(&transaction->t_handle_lock);
 	atomic_sub(handle->h_buffer_credits,
 		   &transaction->t_outstanding_credits);
 	if (atomic_dec_and_test(&transaction->t_updates))
 		wake_up(&journal->j_wait_updates);
-	spin_unlock(&transaction->t_handle_lock);
 
 	jbd_debug(2, "restarting handle %p\n", handle);
 	tid = transaction->t_tid;
@@ -516,14 +508,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
 		if (!transaction)
 			break;
 
-		spin_lock(&transaction->t_handle_lock);
-		if (!atomic_read(&transaction->t_updates)) {
-			spin_unlock(&transaction->t_handle_lock);
+		if (!atomic_read(&transaction->t_updates))
 			break;
-		}
 		prepare_to_wait(&journal->j_wait_updates, &wait,
 				TASK_UNINTERRUPTIBLE);
-		spin_unlock(&transaction->t_handle_lock);
 		write_unlock(&journal->j_state_lock);
 		schedule();
 		finish_wait(&journal->j_wait_updates, &wait);
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 2092ea2..55f7a8c 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -440,9 +440,6 @@ struct transaction_chp_stats_s {
  *    ->j_list_lock
  *
  *    j_state_lock
- *    ->t_handle_lock
- *
- *    j_state_lock
  *    ->j_list_lock			(journal_unmap_buffer)
  *
  */
@@ -538,11 +535,6 @@ struct transaction_s
 	struct list_head	t_inode_list;
 
 	/*
-	 * Protects info related to handles
-	 */
-	spinlock_t		t_handle_lock;

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] jbd2: remove all t_handle_lock statements
  2011-12-22 12:00       ` [PATCH 2/2] jbd2: remove all t_handle_lock statements Toshiyuki Okajima
@ 2011-12-22 14:07         ` Yongqiang Yang
  2011-12-26  0:20           ` Toshiyuki Okajima
  0 siblings, 1 reply; 14+ messages in thread
From: Yongqiang Yang @ 2011-12-22 14:07 UTC (permalink / raw)
  To: Toshiyuki Okajima; +Cc: tytso, adilger.kernel, linux-ext4

On Thu, Dec 22, 2011 at 8:00 PM, Toshiyuki Okajima
<toshi.okajima@jp.fujitsu.com> wrote:
> Remove all t_handle_lock statements because they are not necessary anymore.
>
> Because there is read_lock(&journal->j_state_lock) or
> write_lock(&journal->j_state_lock) on all the forward codes beyond the place
> which needs a spin_lock(&transaction->t_handle_lock).
>
> Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
> Cc:
> ---
>  fs/jbd2/commit.c      |    4 ----
>  fs/jbd2/transaction.c |   18 +++---------------
>  include/linux/jbd2.h  |    8 --------
>  3 files changed, 3 insertions(+), 27 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 68d704d..1030d47 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>        stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
>                                              stats.run.rs_locked);
>
> -       spin_lock(&commit_transaction->t_handle_lock);
>        while (atomic_read(&commit_transaction->t_updates)) {
>                DEFINE_WAIT(wait);
>
>                prepare_to_wait(&journal->j_wait_updates, &wait,
>                                        TASK_UNINTERRUPTIBLE);
>                if (atomic_read(&commit_transaction->t_updates)) {
> -                       spin_unlock(&commit_transaction->t_handle_lock);
>                        write_unlock(&journal->j_state_lock);
>                        schedule();
>                        write_lock(&journal->j_state_lock);
> -                       spin_lock(&commit_transaction->t_handle_lock);
>                }
>                finish_wait(&journal->j_wait_updates, &wait);
>        }
> -       spin_unlock(&commit_transaction->t_handle_lock);
>
>        J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
>                        journal->j_max_transaction_buffers);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 76f2eca..7f84e3f 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -56,7 +56,6 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
>        transaction->t_start_time = ktime_get();
>        transaction->t_tid = journal->j_transaction_sequence++;
>        transaction->t_expires = jiffies + journal->j_commit_interval;
> -       spin_lock_init(&transaction->t_handle_lock);
>        atomic_set(&transaction->t_updates, 0);
>        atomic_set(&transaction->t_outstanding_credits, 0);
>        atomic_set(&transaction->t_handle_count, 0);
> @@ -100,10 +99,8 @@ static inline void update_t_max_wait(transaction_t *transaction,
>        if (jbd2_journal_enable_debug &&
>            time_after(transaction->t_start, ts)) {
>                ts = jbd2_time_diff(ts, transaction->t_start);
> -               spin_lock(&transaction->t_handle_lock);
>                if (ts > transaction->t_max_wait)
>                        transaction->t_max_wait = ts;
> -               spin_unlock(&transaction->t_handle_lock);
Hi Toshiyuki,

     I think you removed too much.  t_handle_lock is needed here to
protect t_max_wait.   I meant just removing t_handle_lock from where I
commented in your patch post out last time.

Yongqiang.
>        }
>  #endif
>  }
> @@ -401,19 +398,18 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
>                goto error_out;
>        }
>
> -       spin_lock(&transaction->t_handle_lock);
>        wanted = atomic_read(&transaction->t_outstanding_credits) + nblocks;
>
>        if (wanted > journal->j_max_transaction_buffers) {
>                jbd_debug(3, "denied handle %p %d blocks: "
>                          "transaction too large\n", handle, nblocks);
> -               goto unlock;
> +               goto error_out;
>        }
>
>        if (wanted > __jbd2_log_space_left(journal)) {
>                jbd_debug(3, "denied handle %p %d blocks: "
>                          "insufficient log space\n", handle, nblocks);
> -               goto unlock;
> +               goto error_out;
>        }
>
>        handle->h_buffer_credits += nblocks;
> @@ -421,8 +417,6 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
>        result = 0;
>
>        jbd_debug(3, "extended handle %p by %d\n", handle, nblocks);
> -unlock:
> -       spin_unlock(&transaction->t_handle_lock);
>  error_out:
>        read_unlock(&journal->j_state_lock);
>  out:
> @@ -464,12 +458,10 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
>        J_ASSERT(journal_current_handle() == handle);
>
>        read_lock(&journal->j_state_lock);
> -       spin_lock(&transaction->t_handle_lock);
>        atomic_sub(handle->h_buffer_credits,
>                   &transaction->t_outstanding_credits);
>        if (atomic_dec_and_test(&transaction->t_updates))
>                wake_up(&journal->j_wait_updates);
> -       spin_unlock(&transaction->t_handle_lock);
>
>        jbd_debug(2, "restarting handle %p\n", handle);
>        tid = transaction->t_tid;
> @@ -516,14 +508,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
>                if (!transaction)
>                        break;
>
> -               spin_lock(&transaction->t_handle_lock);
> -               if (!atomic_read(&transaction->t_updates)) {
> -                       spin_unlock(&transaction->t_handle_lock);
> +               if (!atomic_read(&transaction->t_updates))
>                        break;
> -               }
>                prepare_to_wait(&journal->j_wait_updates, &wait,
>                                TASK_UNINTERRUPTIBLE);
> -               spin_unlock(&transaction->t_handle_lock);
>                write_unlock(&journal->j_state_lock);
>                schedule();
>                finish_wait(&journal->j_wait_updates, &wait);
> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
> index 2092ea2..55f7a8c 100644
> --- a/include/linux/jbd2.h
> +++ b/include/linux/jbd2.h
> @@ -440,9 +440,6 @@ struct transaction_chp_stats_s {
>  *    ->j_list_lock
>  *
>  *    j_state_lock
> - *    ->t_handle_lock
> - *
> - *    j_state_lock
>  *    ->j_list_lock                    (journal_unmap_buffer)
>  *
>  */
> @@ -538,11 +535,6 @@ struct transaction_s
>        struct list_head        t_inode_list;
>
>        /*
> -        * Protects info related to handles
> -        */
> -       spinlock_t              t_handle_lock;
> -
> -       /*
>         * Longest time some handle had to wait for running transaction
>         */
>        unsigned long           t_max_wait;
> --
> 1.5.5.6



-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] jbd2: remove all t_handle_lock statements
  2011-12-22 14:07         ` Yongqiang Yang
@ 2011-12-26  0:20           ` Toshiyuki Okajima
  2011-12-26  1:17             ` [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock) Toshiyuki Okajima
  0 siblings, 1 reply; 14+ messages in thread
From: Toshiyuki Okajima @ 2011-12-26  0:20 UTC (permalink / raw)
  To: Yongqiang Yang; +Cc: toshi.okajima, tytso, adilger.kernel, linux-ext4

Hi Yongqiang,

(2011/12/22 23:07), Yongqiang Yang wrote:
> On Thu, Dec 22, 2011 at 8:00 PM, Toshiyuki Okajima
> <toshi.okajima@jp.fujitsu.com> wrote:
>> Remove all t_handle_lock statements because they are not necessary anymore.
>>
>> Because there is read_lock(&journal->j_state_lock) or
>> write_lock(&journal->j_state_lock) on all the forward codes beyond the place
>> which needs a spin_lock(&transaction->t_handle_lock).
>>
>> Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
>> Cc:
>> ---
>>  fs/jbd2/commit.c      |    4 ----
>>  fs/jbd2/transaction.c |   18 +++---------------
>>  include/linux/jbd2.h  |    8 --------
>>  3 files changed, 3 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
>> index 68d704d..1030d47 100644
>> --- a/fs/jbd2/commit.c
>> +++ b/fs/jbd2/commit.c
>> @@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>>        stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
>>                                              stats.run.rs_locked);
>>
>> -       spin_lock(&commit_transaction->t_handle_lock);
>>        while (atomic_read(&commit_transaction->t_updates)) {
>>                DEFINE_WAIT(wait);
>>
>>                prepare_to_wait(&journal->j_wait_updates, &wait,
>>                                        TASK_UNINTERRUPTIBLE);
>>                if (atomic_read(&commit_transaction->t_updates)) {
>> -                       spin_unlock(&commit_transaction->t_handle_lock);
>>                        write_unlock(&journal->j_state_lock);
>>                        schedule();
>>                        write_lock(&journal->j_state_lock);
>> -                       spin_lock(&commit_transaction->t_handle_lock);
>>                }
>>                finish_wait(&journal->j_wait_updates, &wait);
>>        }
>> -       spin_unlock(&commit_transaction->t_handle_lock);
>>
>>        J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
>>                        journal->j_max_transaction_buffers);
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index 76f2eca..7f84e3f 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -56,7 +56,6 @@ jbd2_get_transaction(journal_t *journal, transaction_t *transaction)
>>        transaction->t_start_time = ktime_get();
>>        transaction->t_tid = journal->j_transaction_sequence++;
>>        transaction->t_expires = jiffies + journal->j_commit_interval;
>> -       spin_lock_init(&transaction->t_handle_lock);
>>        atomic_set(&transaction->t_updates, 0);
>>        atomic_set(&transaction->t_outstanding_credits, 0);
>>        atomic_set(&transaction->t_handle_count, 0);

>> @@ -100,10 +99,8 @@ static inline void update_t_max_wait(transaction_t *transaction,
>>        if (jbd2_journal_enable_debug &&
>>            time_after(transaction->t_start, ts)) {
>>                ts = jbd2_time_diff(ts, transaction->t_start);
>> -               spin_lock(&transaction->t_handle_lock);
>>                if (ts > transaction->t_max_wait)
>>                        transaction->t_max_wait = ts;
>> -               spin_unlock(&transaction->t_handle_lock);
> Hi Toshiyuki,
> 
>      I think you removed too much.  t_handle_lock is needed here to
> protect t_max_wait.   I meant just removing t_handle_lock from where I
> commented in your patch post out last time.
Thanks for your comment. OK, I understand.

This function tries to protect itself by 
 read_lock(&journal->j_state_lock), but by only 
read_lock(&journal->j_state_lock) cannot protect this critical code 
section.

  96 static inline void update_t_max_wait(transaction_t *transaction,
  97                                      unsigned long ts)
  98 {
  99 #ifdef CONFIG_JBD2_DEBUG
 100         if (jbd2_journal_enable_debug &&
 101             time_after(transaction->t_start, ts)) {
 102                 ts = jbd2_time_diff(ts, transaction->t_start);
 103                 spin_lock(&transaction->t_handle_lock);
 104                 if (ts > transaction->t_max_wait)
 105                         transaction->t_max_wait = ts;
 106                 spin_unlock(&transaction->t_handle_lock);
 107         }
 108 #endif
 109 }

I will delete the part of my patch for update_t_max_wait().

Due to the same reason, 
 - jbd2_journal_extend
 - jbd2_journal_restart
cannot be protected by my patch. So, I will delete these parts, too.

Best Regards,
Toshiyuki Okajima

> 
> Yongqiang.
>>        }
>>  #endif
>>  }
>> @@ -401,19 +398,18 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
>>                goto error_out;
>>        }
>>
>> -       spin_lock(&transaction->t_handle_lock);
>>        wanted = atomic_read(&transaction->t_outstanding_credits) + nblocks;
>>
>>        if (wanted > journal->j_max_transaction_buffers) {
>>                jbd_debug(3, "denied handle %p %d blocks: "
>>                          "transaction too large\n", handle, nblocks);
>> -               goto unlock;
>> +               goto error_out;
>>        }
>>
>>        if (wanted > __jbd2_log_space_left(journal)) {
>>                jbd_debug(3, "denied handle %p %d blocks: "
>>                          "insufficient log space\n", handle, nblocks);
>> -               goto unlock;
>> +               goto error_out;
>>        }
>>
>>        handle->h_buffer_credits += nblocks;
>> @@ -421,8 +417,6 @@ int jbd2_journal_extend(handle_t *handle, int nblocks)
>>        result = 0;
>>
>>        jbd_debug(3, "extended handle %p by %d\n", handle, nblocks);
>> -unlock:
>> -       spin_unlock(&transaction->t_handle_lock);
>>  error_out:
>>        read_unlock(&journal->j_state_lock);
>>  out:
>> @@ -464,12 +458,10 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
>>        J_ASSERT(journal_current_handle() == handle);
>>
>>        read_lock(&journal->j_state_lock);
>> -       spin_lock(&transaction->t_handle_lock);
>>        atomic_sub(handle->h_buffer_credits,
>>                   &transaction->t_outstanding_credits);
>>        if (atomic_dec_and_test(&transaction->t_updates))
>>                wake_up(&journal->j_wait_updates);
>> -       spin_unlock(&transaction->t_handle_lock);
>>
>>        jbd_debug(2, "restarting handle %p\n", handle);
>>        tid = transaction->t_tid;
>> @@ -516,14 +508,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
>>                if (!transaction)
>>                        break;
>>
>> -               spin_lock(&transaction->t_handle_lock);
>> -               if (!atomic_read(&transaction->t_updates)) {
>> -                       spin_unlock(&transaction->t_handle_lock);
>> +               if (!atomic_read(&transaction->t_updates))
>>                        break;
>> -               }
>>                prepare_to_wait(&journal->j_wait_updates, &wait,
>>                                TASK_UNINTERRUPTIBLE);
>> -               spin_unlock(&transaction->t_handle_lock);
>>                write_unlock(&journal->j_state_lock);
>>                schedule();
>>                finish_wait(&journal->j_wait_updates, &wait);
>> diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
>> index 2092ea2..55f7a8c 100644
>> --- a/include/linux/jbd2.h
>> +++ b/include/linux/jbd2.h
>> @@ -440,9 +440,6 @@ struct transaction_chp_stats_s {
>>  *    ->j_list_lock
>>  *
>>  *    j_state_lock
>> - *    ->t_handle_lock
>> - *
>> - *    j_state_lock
>>  *    ->j_list_lock                    (journal_unmap_buffer)
>>  *
>>  */
>> @@ -538,11 +535,6 @@ struct transaction_s
>>        struct list_head        t_inode_list;
>>
>>        /*
>> -        * Protects info related to handles
>> -        */
>> -       spinlock_t              t_handle_lock;
>> -
>> -       /*
>>         * Longest time some handle had to wait for running transaction
>>         */
>>        unsigned long           t_max_wait;
>> --
>> 1.5.5.6
> 
> 
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock)
  2011-12-26  0:20           ` Toshiyuki Okajima
@ 2011-12-26  1:17             ` Toshiyuki Okajima
  2011-12-31  6:35               ` Yongqiang Yang
                                 ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Toshiyuki Okajima @ 2011-12-26  1:17 UTC (permalink / raw)
  To: tytso, adilger.kernel, Yongqiang Yang; +Cc: linux-ext4, toshi.okajima

Delete all spin_lock(t_handle_lock) statements inside write_lock(j_state_lock) 
because the critical code sections can be protected by write_lock(j_state_lock)
only.

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Reported-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
---
 fs/jbd2/commit.c      |    4 ----
 fs/jbd2/transaction.c |    6 +-----
 2 files changed, 1 insertions(+), 9 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 68d704d..1030d47 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
 	stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
 					      stats.run.rs_locked);
 
-	spin_lock(&commit_transaction->t_handle_lock);
 	while (atomic_read(&commit_transaction->t_updates)) {
 		DEFINE_WAIT(wait);
 
 		prepare_to_wait(&journal->j_wait_updates, &wait,
 					TASK_UNINTERRUPTIBLE);
 		if (atomic_read(&commit_transaction->t_updates)) {
-			spin_unlock(&commit_transaction->t_handle_lock);
 			write_unlock(&journal->j_state_lock);
 			schedule();
 			write_lock(&journal->j_state_lock);
-			spin_lock(&commit_transaction->t_handle_lock);
 		}
 		finish_wait(&journal->j_wait_updates, &wait);
 	}
-	spin_unlock(&commit_transaction->t_handle_lock);
 
 	J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
 			journal->j_max_transaction_buffers);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 76f2eca..c418ed9 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -516,14 +516,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
 		if (!transaction)
 			break;
 
-		spin_lock(&transaction->t_handle_lock);
-		if (!atomic_read(&transaction->t_updates)) {
-			spin_unlock(&transaction->t_handle_lock);
+		if (!atomic_read(&transaction->t_updates))
 			break;
-		}
 		prepare_to_wait(&journal->j_wait_updates, &wait,
 				TASK_UNINTERRUPTIBLE);
-		spin_unlock(&transaction->t_handle_lock);
 		write_unlock(&journal->j_state_lock);
 		schedule();
 		finish_wait(&journal->j_wait_updates, &wait);
-- 
1.5.5.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock)
  2011-12-26  1:17             ` [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock) Toshiyuki Okajima
@ 2011-12-31  6:35               ` Yongqiang Yang
  2012-01-03 15:35               ` Jan Kara
  2012-01-05  3:27               ` Ted Ts'o
  2 siblings, 0 replies; 14+ messages in thread
From: Yongqiang Yang @ 2011-12-31  6:35 UTC (permalink / raw)
  To: Toshiyuki Okajima; +Cc: tytso, adilger.kernel, linux-ext4

Looks good to me.

Thanks,
Yongqiang.
On Mon, Dec 26, 2011 at 9:17 AM, Toshiyuki Okajima
<toshi.okajima@jp.fujitsu.com> wrote:
> Delete all spin_lock(t_handle_lock) statements inside write_lock(j_state_lock)
> because the critical code sections can be protected by write_lock(j_state_lock)
> only.
>
> Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
> Reported-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
> ---
>  fs/jbd2/commit.c      |    4 ----
>  fs/jbd2/transaction.c |    6 +-----
>  2 files changed, 1 insertions(+), 9 deletions(-)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 68d704d..1030d47 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>        stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
>                                              stats.run.rs_locked);
>
> -       spin_lock(&commit_transaction->t_handle_lock);
>        while (atomic_read(&commit_transaction->t_updates)) {
>                DEFINE_WAIT(wait);
>
>                prepare_to_wait(&journal->j_wait_updates, &wait,
>                                        TASK_UNINTERRUPTIBLE);
>                if (atomic_read(&commit_transaction->t_updates)) {
> -                       spin_unlock(&commit_transaction->t_handle_lock);
>                        write_unlock(&journal->j_state_lock);
>                        schedule();
>                        write_lock(&journal->j_state_lock);
> -                       spin_lock(&commit_transaction->t_handle_lock);
>                }
>                finish_wait(&journal->j_wait_updates, &wait);
>        }
> -       spin_unlock(&commit_transaction->t_handle_lock);
>
>        J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
>                        journal->j_max_transaction_buffers);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 76f2eca..c418ed9 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -516,14 +516,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
>                if (!transaction)
>                        break;
>
> -               spin_lock(&transaction->t_handle_lock);
> -               if (!atomic_read(&transaction->t_updates)) {
> -                       spin_unlock(&transaction->t_handle_lock);
> +               if (!atomic_read(&transaction->t_updates))
>                        break;
> -               }
>                prepare_to_wait(&journal->j_wait_updates, &wait,
>                                TASK_UNINTERRUPTIBLE);
> -               spin_unlock(&transaction->t_handle_lock);
>                write_unlock(&journal->j_state_lock);
>                schedule();
>                finish_wait(&journal->j_wait_updates, &wait);
> --
> 1.5.5.6



-- 
Best Wishes
Yongqiang Yang
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations
  2011-12-22 11:56       ` [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations Toshiyuki Okajima
@ 2012-01-03 15:32         ` Jan Kara
  2012-01-05  3:12           ` Ted Ts'o
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Kara @ 2012-01-03 15:32 UTC (permalink / raw)
  To: Toshiyuki Okajima; +Cc: tytso, adilger.kernel, Yongqiang Yang, linux-ext4

[-- Attachment #1: Type: text/plain, Size: 5234 bytes --]

  Hello,

On Thu 22-12-11 20:56:50, Toshiyuki Okajima wrote:
> The following statements need an exclusive control for the critical code section
> around t_update operations:
> [jbd2_journal_stop()]
> 1445         /*
> 1446          * Once we drop t_updates, if it goes to zero the transaction
> 1447          * could start committing on us and eventually disappear.  So
> 1448          * once we do this, we must not dereference transaction
> 1449          * pointer again.
> 1450          */
> 1451         tid = transaction->t_tid;
> +    read_lock(&journal->j_state_lock);
> ----- critical code section ------------------------------------------------
> 1452         if (atomic_dec_and_test(&transaction->t_updates)) {
> 1453                 wake_up(&journal->j_wait_updates);
> 1454                 if (journal->j_barrier_count)
> 1455                         wake_up(&journal->j_wait_transaction_locked);
> 1456         }
> -----------------------------------------------------------------------------
> +    read_unlock(&journal->j_state_lock);
> 1457 
> 
> Because the functions which have the other critical code sections around t_update 
> operations, 
>  - jbd2_journal_commit_transaction
>  - start_this_handle
>  - jbd2_journal_lock_updates
> can not synchronize with jbd2_journal_stop.
> 
> ex) jbd2_journal_lock_updates
>  505 void jbd2_journal_lock_updates(journal_t *journal)
>  506 {
>  507         DEFINE_WAIT(wait);
>  508 
>  509         write_lock(&journal->j_state_lock);
>  510         ++journal->j_barrier_count;
>  511 
>  512         /* Wait until there are no running updates */
>  513         while (1) {
>  514                transaction_t *transaction = journal->j_running_transaction;
>  515 
>  516                 if (!transaction)
>  517                         break;
>  518 
>  519                 spin_lock(&transaction->t_handle_lock);
> ----- critical code section ------------------------------------------------
>  520                 if (!atomic_read(&transaction->t_updates)) {
>  521                         spin_unlock(&transaction->t_handle_lock);
>  522                         break;
>  523                 }
>  524                 prepare_to_wait(&journal->j_wait_updates, &wait,
>  525                                 TASK_UNINTERRUPTIBLE);
> -----------------------------------------------------------------------------
>  526                 spin_unlock(&transaction->t_handle_lock);
>  527                 write_unlock(&journal->j_state_lock);
>  528                 schedule();
>  529                 finish_wait(&journal->j_wait_updates, &wait);
>  530                 write_lock(&journal->j_state_lock);
>  531         }
>  532         write_unlock(&journal->j_state_lock);
> 
> Thefore, the following steps causes a hang-up of process1:
> 1) (process1) line 520 in jbd2_journal_lock_updates
>       transaction->t_updates is equal to 1, and then goto 4).
> 2) (process2) line 1452 in jbd2_journal_stop
>       transaction->t_updates becomes to 0, and then goto 3).
> 3) (process2) line 1453 in jbd2_journal_stop
>       wake_up(&journal->j_wait_updates) tries to wake someone up.
> 4) (process1) line 524 in jbd2_journal_lock_updates
>       prepare to sleep itself, and then goto 5).
> 5) (process1) line 528 in jbd2_journal_lock_updates
>       sleep forever because process2 doesn't wake it up anymore.
  Thanks for the analysis. Actually, you fix adds unnecessary overhead.
The problem really is the wrong ordering of prepare_to_wait() and t_updates
check. So attached patch should fix the issue as well without introducing
the overhead.

> Similar problem also exists for j_barrier_count operations but it can be
> fixed, too:
> [jbd2_journal_lock_updates]
>  505 void jbd2_journal_lock_updates(journal_t *journal)
>  506 {
>  507         DEFINE_WAIT(wait);
>  508 
>  509         write_lock(&journal->j_state_lock);
> ----------------------------------------------------------------------------
>  510         ++journal->j_barrier_count;
> ----------------------------------------------------------------------------
> ...
>  532         write_unlock(&journal->j_state_lock);
> 
> [jbd2_journal_stop]
> 1445         /*
> 1446          * Once we drop t_updates, if it goes to zero the transaction
> 1447          * could start committing on us and eventually disappear.  So
> 1448          * once we do this, we must not dereference transaction
> 1449          * pointer again.
> 1450          */
> 1451         tid = transaction->t_tid;
> +    read_lock(&journal->j_state_lock);
> 1452         if (atomic_dec_and_test(&transaction->t_updates)) {
> 1453                 wake_up(&journal->j_wait_updates);
> ----------------------------------------------------------------------------
> 1454                 if (journal->j_barrier_count)
> 1455                         wake_up(&journal->j_wait_transaction_locked);
> ----------------------------------------------------------------------------
> 1456         }
> +    read_unlock(&journal->j_state_lock);
> 1457 
  Here I don't agree. We use wait_event() to wait for j_barrier_count to
drop to zero and wait_event() has proper ordering of prepare_to_wait() and
test.

								Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR

[-- Attachment #2: 0001-jbd2-Fix-hung-processes-in-jbd2_journal_lock_updates.patch --]
[-- Type: text/x-patch, Size: 2346 bytes --]

>From 1cd5b8178893f3f186ce93eb1f47664a1a3e81fc Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 3 Jan 2012 16:13:29 +0100
Subject: [PATCH] jbd2: Fix hung processes in jbd2_journal_lock_updates()

Toshiyuki Okajima found out that when running

for ((i=0; i < 100000; i++)); do
        if ((i%2 == 0)); then
                chattr +j /mnt/file
        else
                chattr -j /mnt/file
        fi
        echo "0" >> /mnt/file
done

process sometimes hangs indefinitely in jbd2_journal_lock_updates().

Toshiyuki identified that the following race happens:

jbd2_journal_lock_updates()            |jbd2_journal_stop()
---------------------------------------+---------------------------------------
 write_lock(&journal->j_state_lock)    |    .
 ++journal->j_barrier_count            |    .
 spin_lock(&tran->t_handle_lock)       |    .
 atomic_read(&tran->t_updates) //not 0 |
                                       | atomic_dec_and_test(&tran->t_updates)
                                       |    // t_updates = 0
                                       | wake_up(&journal->j_wait_updates)
 prepare_to_wait()                     |    // no process is woken up.
 spin_unlock(&tran->t_handle_lock)     |
 write_unlock(&journal->j_state_lock)  |
 schedule() // never return            |

We fix the problem by first calling prepare_to_wait() and only after that
checking t_updates in jbd2_journal_lock_updates().

Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/transaction.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a0e41a4..35ae096 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -517,12 +517,13 @@ void jbd2_journal_lock_updates(journal_t *journal)
 			break;
 
 		spin_lock(&transaction->t_handle_lock);
+		prepare_to_wait(&journal->j_wait_updates, &wait,
+				TASK_UNINTERRUPTIBLE);
 		if (!atomic_read(&transaction->t_updates)) {
 			spin_unlock(&transaction->t_handle_lock);
+			finish_wait(&journal->j_wait_updates, &wait);
 			break;
 		}
-		prepare_to_wait(&journal->j_wait_updates, &wait,
-				TASK_UNINTERRUPTIBLE);
 		spin_unlock(&transaction->t_handle_lock);
 		write_unlock(&journal->j_state_lock);
 		schedule();
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock)
  2011-12-26  1:17             ` [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock) Toshiyuki Okajima
  2011-12-31  6:35               ` Yongqiang Yang
@ 2012-01-03 15:35               ` Jan Kara
  2012-01-05  3:27               ` Ted Ts'o
  2 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2012-01-03 15:35 UTC (permalink / raw)
  To: Toshiyuki Okajima; +Cc: tytso, adilger.kernel, Yongqiang Yang, linux-ext4

On Mon 26-12-11 10:17:27, Toshiyuki Okajima wrote:
> Delete all spin_lock(t_handle_lock) statements inside write_lock(j_state_lock) 
> because the critical code sections can be protected by write_lock(j_state_lock)
> only.
> 
> Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
> Reported-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
  The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/jbd2/commit.c      |    4 ----
>  fs/jbd2/transaction.c |    6 +-----
>  2 files changed, 1 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 68d704d..1030d47 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -364,22 +364,18 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>  	stats.run.rs_running = jbd2_time_diff(commit_transaction->t_start,
>  					      stats.run.rs_locked);
>  
> -	spin_lock(&commit_transaction->t_handle_lock);
>  	while (atomic_read(&commit_transaction->t_updates)) {
>  		DEFINE_WAIT(wait);
>  
>  		prepare_to_wait(&journal->j_wait_updates, &wait,
>  					TASK_UNINTERRUPTIBLE);
>  		if (atomic_read(&commit_transaction->t_updates)) {
> -			spin_unlock(&commit_transaction->t_handle_lock);
>  			write_unlock(&journal->j_state_lock);
>  			schedule();
>  			write_lock(&journal->j_state_lock);
> -			spin_lock(&commit_transaction->t_handle_lock);
>  		}
>  		finish_wait(&journal->j_wait_updates, &wait);
>  	}
> -	spin_unlock(&commit_transaction->t_handle_lock);
>  
>  	J_ASSERT (atomic_read(&commit_transaction->t_outstanding_credits) <=
>  			journal->j_max_transaction_buffers);
> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
> index 76f2eca..c418ed9 100644
> --- a/fs/jbd2/transaction.c
> +++ b/fs/jbd2/transaction.c
> @@ -516,14 +516,10 @@ void jbd2_journal_lock_updates(journal_t *journal)
>  		if (!transaction)
>  			break;
>  
> -		spin_lock(&transaction->t_handle_lock);
> -		if (!atomic_read(&transaction->t_updates)) {
> -			spin_unlock(&transaction->t_handle_lock);
> +		if (!atomic_read(&transaction->t_updates))
>  			break;
> -		}
>  		prepare_to_wait(&journal->j_wait_updates, &wait,
>  				TASK_UNINTERRUPTIBLE);
> -		spin_unlock(&transaction->t_handle_lock);
>  		write_unlock(&journal->j_state_lock);
>  		schedule();
>  		finish_wait(&journal->j_wait_updates, &wait);
> -- 
> 1.5.5.6
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations
  2012-01-03 15:32         ` Jan Kara
@ 2012-01-05  3:12           ` Ted Ts'o
  0 siblings, 0 replies; 14+ messages in thread
From: Ted Ts'o @ 2012-01-05  3:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: Toshiyuki Okajima, adilger.kernel, Yongqiang Yang, linux-ext4

On Tue, Jan 03, 2012 at 04:32:45PM +0100, Jan Kara wrote:
>   Thanks for the analysis. Actually, you fix adds unnecessary overhead.
> The problem really is the wrong ordering of prepare_to_wait() and t_updates
> check. So attached patch should fix the issue as well without introducing
> the overhead.

Thanks, applied.

						- Ted


> From 1cd5b8178893f3f186ce93eb1f47664a1a3e81fc Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Tue, 3 Jan 2012 16:13:29 +0100
> Subject: [PATCH] jbd2: Fix hung processes in jbd2_journal_lock_updates()
> 
> Toshiyuki Okajima found out that when running
> 
> for ((i=0; i < 100000; i++)); do
>         if ((i%2 == 0)); then
>                 chattr +j /mnt/file
>         else
>                 chattr -j /mnt/file
>         fi
>         echo "0" >> /mnt/file
> done
> 
> process sometimes hangs indefinitely in jbd2_journal_lock_updates().
> 
> Toshiyuki identified that the following race happens:
> 
> jbd2_journal_lock_updates()            |jbd2_journal_stop()
> ---------------------------------------+---------------------------------------
>  write_lock(&journal->j_state_lock)    |    .
>  ++journal->j_barrier_count            |    .
>  spin_lock(&tran->t_handle_lock)       |    .
>  atomic_read(&tran->t_updates) //not 0 |
>                                        | atomic_dec_and_test(&tran->t_updates)
>                                        |    // t_updates = 0
>                                        | wake_up(&journal->j_wait_updates)
>  prepare_to_wait()                     |    // no process is woken up.
>  spin_unlock(&tran->t_handle_lock)     |
>  write_unlock(&journal->j_state_lock)  |
>  schedule() // never return            |
> 
> We fix the problem by first calling prepare_to_wait() and only after that
> checking t_updates in jbd2_journal_lock_updates().
> 
> Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
> Signed-off-by: Jan Kara <jack@suse.cz>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock)
  2011-12-26  1:17             ` [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock) Toshiyuki Okajima
  2011-12-31  6:35               ` Yongqiang Yang
  2012-01-03 15:35               ` Jan Kara
@ 2012-01-05  3:27               ` Ted Ts'o
  2 siblings, 0 replies; 14+ messages in thread
From: Ted Ts'o @ 2012-01-05  3:27 UTC (permalink / raw)
  To: Toshiyuki Okajima; +Cc: adilger.kernel, Yongqiang Yang, linux-ext4

On Mon, Dec 26, 2011 at 10:17:27AM +0900, Toshiyuki Okajima wrote:
> Delete all spin_lock(t_handle_lock) statements inside write_lock(j_state_lock) 
> because the critical code sections can be protected by write_lock(j_state_lock)
> only.
> 
> Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
> Reported-by: Yongqiang Yang <xiaoqiangnk@gmail.com>

This may be true today, but I'm worried this is going to make the code
more fragile.  The header file include/linux/jbd2.h defines which
fields are protected by which lock.  This patch is going to make those
comments partially obsolete.  We need to make sure the locking
protocol is clearly defined before we go around removing locks.

This is something I would like to do, but we need to make sure the
code remains maintainable in the long run.

					- Ted

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-01-05  3:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-16 11:19 [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations Toshiyuki Okajima
2011-12-16 12:22 ` Yongqiang Yang
2011-12-20 10:44   ` Toshiyuki Okajima
2011-12-22 11:36     ` [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates Toshiyuki Okajima
2011-12-22 11:56       ` [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations Toshiyuki Okajima
2012-01-03 15:32         ` Jan Kara
2012-01-05  3:12           ` Ted Ts'o
2011-12-22 12:00       ` [PATCH 2/2] jbd2: remove all t_handle_lock statements Toshiyuki Okajima
2011-12-22 14:07         ` Yongqiang Yang
2011-12-26  0:20           ` Toshiyuki Okajima
2011-12-26  1:17             ` [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock) Toshiyuki Okajima
2011-12-31  6:35               ` Yongqiang Yang
2012-01-03 15:35               ` Jan Kara
2012-01-05  3:27               ` Ted Ts'o

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.