linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
To: tytso@mit.edu, adilger.kernel@dilger.ca
Cc: Yongqiang Yang <xiaoqiangnk@gmail.com>, linux-ext4@vger.kernel.org
Subject: [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations
Date: Thu, 22 Dec 2011 20:56:50 +0900	[thread overview]
Message-ID: <20111222205650.fe9d1b36.toshi.okajima@jp.fujitsu.com> (raw)
In-Reply-To: <20111222203639.4200538e.toshi.okajima@jp.fujitsu.com>

The following statements need an exclusive control for the critical code section
around t_update operations:
[jbd2_journal_stop()]
1445         /*
1446          * Once we drop t_updates, if it goes to zero the transaction
1447          * could start committing on us and eventually disappear.  So
1448          * once we do this, we must not dereference transaction
1449          * pointer again.
1450          */
1451         tid = transaction->t_tid;
+    read_lock(&journal->j_state_lock);
----- critical code section ------------------------------------------------
1452         if (atomic_dec_and_test(&transaction->t_updates)) {
1453                 wake_up(&journal->j_wait_updates);
1454                 if (journal->j_barrier_count)
1455                         wake_up(&journal->j_wait_transaction_locked);
1456         }
-----------------------------------------------------------------------------
+    read_unlock(&journal->j_state_lock);
1457 

Because the functions which have the other critical code sections around t_update 
operations, 
 - jbd2_journal_commit_transaction
 - start_this_handle
 - jbd2_journal_lock_updates
can not synchronize with jbd2_journal_stop.

ex) jbd2_journal_lock_updates
 505 void jbd2_journal_lock_updates(journal_t *journal)
 506 {
 507         DEFINE_WAIT(wait);
 508 
 509         write_lock(&journal->j_state_lock);
 510         ++journal->j_barrier_count;
 511 
 512         /* Wait until there are no running updates */
 513         while (1) {
 514                transaction_t *transaction = journal->j_running_transaction;
 515 
 516                 if (!transaction)
 517                         break;
 518 
 519                 spin_lock(&transaction->t_handle_lock);
----- critical code section ------------------------------------------------
 520                 if (!atomic_read(&transaction->t_updates)) {
 521                         spin_unlock(&transaction->t_handle_lock);
 522                         break;
 523                 }
 524                 prepare_to_wait(&journal->j_wait_updates, &wait,
 525                                 TASK_UNINTERRUPTIBLE);
-----------------------------------------------------------------------------
 526                 spin_unlock(&transaction->t_handle_lock);
 527                 write_unlock(&journal->j_state_lock);
 528                 schedule();
 529                 finish_wait(&journal->j_wait_updates, &wait);
 530                 write_lock(&journal->j_state_lock);
 531         }
 532         write_unlock(&journal->j_state_lock);

Thefore, the following steps causes a hang-up of process1:
1) (process1) line 520 in jbd2_journal_lock_updates
      transaction->t_updates is equal to 1, and then goto 4).
2) (process2) line 1452 in jbd2_journal_stop
      transaction->t_updates becomes to 0, and then goto 3).
3) (process2) line 1453 in jbd2_journal_stop
      wake_up(&journal->j_wait_updates) tries to wake someone up.
4) (process1) line 524 in jbd2_journal_lock_updates
      prepare to sleep itself, and then goto 5).
5) (process1) line 528 in jbd2_journal_lock_updates
      sleep forever because process2 doesn't wake it up anymore.

Similar problem also exists for j_barrier_count operations but it can be
fixed, too:
[jbd2_journal_lock_updates]
 505 void jbd2_journal_lock_updates(journal_t *journal)
 506 {
 507         DEFINE_WAIT(wait);
 508 
 509         write_lock(&journal->j_state_lock);
----------------------------------------------------------------------------
 510         ++journal->j_barrier_count;
----------------------------------------------------------------------------
...
 532         write_unlock(&journal->j_state_lock);

[jbd2_journal_stop]
1445         /*
1446          * Once we drop t_updates, if it goes to zero the transaction
1447          * could start committing on us and eventually disappear.  So
1448          * once we do this, we must not dereference transaction
1449          * pointer again.
1450          */
1451         tid = transaction->t_tid;
+    read_lock(&journal->j_state_lock);
1452         if (atomic_dec_and_test(&transaction->t_updates)) {
1453                 wake_up(&journal->j_wait_updates);
----------------------------------------------------------------------------
1454                 if (journal->j_barrier_count)
1455                         wake_up(&journal->j_wait_transaction_locked);
----------------------------------------------------------------------------
1456         }
+    read_unlock(&journal->j_state_lock);
1457 

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
---
 fs/jbd2/transaction.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a0e41a4..76f2eca 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1449,11 +1449,13 @@ int jbd2_journal_stop(handle_t *handle)
 	 * pointer again.
 	 */
 	tid = transaction->t_tid;
+	read_lock(&journal->j_state_lock);
 	if (atomic_dec_and_test(&transaction->t_updates)) {
 		wake_up(&journal->j_wait_updates);
 		if (journal->j_barrier_count)
 			wake_up(&journal->j_wait_transaction_locked);
 	}
+	read_unlock(&journal->j_state_lock);
 
 	if (wait_for_commit)
 		err = jbd2_log_wait_commit(journal, tid);
-- 
1.5.5.6

  reply	other threads:[~2011-12-22 12:02 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-16 11:19 [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations Toshiyuki Okajima
2011-12-16 12:22 ` Yongqiang Yang
2011-12-20 10:44   ` Toshiyuki Okajima
2011-12-22 11:36     ` [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates Toshiyuki Okajima
2011-12-22 11:56       ` Toshiyuki Okajima [this message]
2012-01-03 15:32         ` [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations Jan Kara
2012-01-05  3:12           ` Ted Ts'o
2011-12-22 12:00       ` [PATCH 2/2] jbd2: remove all t_handle_lock statements Toshiyuki Okajima
2011-12-22 14:07         ` Yongqiang Yang
2011-12-26  0:20           ` Toshiyuki Okajima
2011-12-26  1:17             ` [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock) Toshiyuki Okajima
2011-12-31  6:35               ` Yongqiang Yang
2012-01-03 15:35               ` Jan Kara
2012-01-05  3:27               ` Ted Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111222205650.fe9d1b36.toshi.okajima@jp.fujitsu.com \
    --to=toshi.okajima@jp.fujitsu.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=linux-ext4@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=xiaoqiangnk@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).