From: Jan Kara <jack@suse.cz>
To: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Cc: tytso@mit.edu, adilger.kernel@dilger.ca,
Yongqiang Yang <xiaoqiangnk@gmail.com>,
linux-ext4@vger.kernel.org
Subject: Re: [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations
Date: Tue, 3 Jan 2012 16:32:45 +0100 [thread overview]
Message-ID: <20120103153245.GE31457@quack.suse.cz> (raw)
In-Reply-To: <20111222205650.fe9d1b36.toshi.okajima@jp.fujitsu.com>
[-- Attachment #1: Type: text/plain, Size: 5234 bytes --]
Hello,
On Thu 22-12-11 20:56:50, Toshiyuki Okajima wrote:
> The following statements need an exclusive control for the critical code section
> around t_update operations:
> [jbd2_journal_stop()]
> 1445 /*
> 1446 * Once we drop t_updates, if it goes to zero the transaction
> 1447 * could start committing on us and eventually disappear. So
> 1448 * once we do this, we must not dereference transaction
> 1449 * pointer again.
> 1450 */
> 1451 tid = transaction->t_tid;
> + read_lock(&journal->j_state_lock);
> ----- critical code section ------------------------------------------------
> 1452 if (atomic_dec_and_test(&transaction->t_updates)) {
> 1453 wake_up(&journal->j_wait_updates);
> 1454 if (journal->j_barrier_count)
> 1455 wake_up(&journal->j_wait_transaction_locked);
> 1456 }
> -----------------------------------------------------------------------------
> + read_unlock(&journal->j_state_lock);
> 1457
>
> Because the functions which have the other critical code sections around t_update
> operations,
> - jbd2_journal_commit_transaction
> - start_this_handle
> - jbd2_journal_lock_updates
> can not synchronize with jbd2_journal_stop.
>
> ex) jbd2_journal_lock_updates
> 505 void jbd2_journal_lock_updates(journal_t *journal)
> 506 {
> 507 DEFINE_WAIT(wait);
> 508
> 509 write_lock(&journal->j_state_lock);
> 510 ++journal->j_barrier_count;
> 511
> 512 /* Wait until there are no running updates */
> 513 while (1) {
> 514 transaction_t *transaction = journal->j_running_transaction;
> 515
> 516 if (!transaction)
> 517 break;
> 518
> 519 spin_lock(&transaction->t_handle_lock);
> ----- critical code section ------------------------------------------------
> 520 if (!atomic_read(&transaction->t_updates)) {
> 521 spin_unlock(&transaction->t_handle_lock);
> 522 break;
> 523 }
> 524 prepare_to_wait(&journal->j_wait_updates, &wait,
> 525 TASK_UNINTERRUPTIBLE);
> -----------------------------------------------------------------------------
> 526 spin_unlock(&transaction->t_handle_lock);
> 527 write_unlock(&journal->j_state_lock);
> 528 schedule();
> 529 finish_wait(&journal->j_wait_updates, &wait);
> 530 write_lock(&journal->j_state_lock);
> 531 }
> 532 write_unlock(&journal->j_state_lock);
>
> Thefore, the following steps causes a hang-up of process1:
> 1) (process1) line 520 in jbd2_journal_lock_updates
> transaction->t_updates is equal to 1, and then goto 4).
> 2) (process2) line 1452 in jbd2_journal_stop
> transaction->t_updates becomes to 0, and then goto 3).
> 3) (process2) line 1453 in jbd2_journal_stop
> wake_up(&journal->j_wait_updates) tries to wake someone up.
> 4) (process1) line 524 in jbd2_journal_lock_updates
> prepare to sleep itself, and then goto 5).
> 5) (process1) line 528 in jbd2_journal_lock_updates
> sleep forever because process2 doesn't wake it up anymore.
Thanks for the analysis. Actually, you fix adds unnecessary overhead.
The problem really is the wrong ordering of prepare_to_wait() and t_updates
check. So attached patch should fix the issue as well without introducing
the overhead.
> Similar problem also exists for j_barrier_count operations but it can be
> fixed, too:
> [jbd2_journal_lock_updates]
> 505 void jbd2_journal_lock_updates(journal_t *journal)
> 506 {
> 507 DEFINE_WAIT(wait);
> 508
> 509 write_lock(&journal->j_state_lock);
> ----------------------------------------------------------------------------
> 510 ++journal->j_barrier_count;
> ----------------------------------------------------------------------------
> ...
> 532 write_unlock(&journal->j_state_lock);
>
> [jbd2_journal_stop]
> 1445 /*
> 1446 * Once we drop t_updates, if it goes to zero the transaction
> 1447 * could start committing on us and eventually disappear. So
> 1448 * once we do this, we must not dereference transaction
> 1449 * pointer again.
> 1450 */
> 1451 tid = transaction->t_tid;
> + read_lock(&journal->j_state_lock);
> 1452 if (atomic_dec_and_test(&transaction->t_updates)) {
> 1453 wake_up(&journal->j_wait_updates);
> ----------------------------------------------------------------------------
> 1454 if (journal->j_barrier_count)
> 1455 wake_up(&journal->j_wait_transaction_locked);
> ----------------------------------------------------------------------------
> 1456 }
> + read_unlock(&journal->j_state_lock);
> 1457
Here I don't agree. We use wait_event() to wait for j_barrier_count to
drop to zero and wait_event() has proper ordering of prepare_to_wait() and
test.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
[-- Attachment #2: 0001-jbd2-Fix-hung-processes-in-jbd2_journal_lock_updates.patch --]
[-- Type: text/x-patch, Size: 2346 bytes --]
>From 1cd5b8178893f3f186ce93eb1f47664a1a3e81fc Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 3 Jan 2012 16:13:29 +0100
Subject: [PATCH] jbd2: Fix hung processes in jbd2_journal_lock_updates()
Toshiyuki Okajima found out that when running
for ((i=0; i < 100000; i++)); do
if ((i%2 == 0)); then
chattr +j /mnt/file
else
chattr -j /mnt/file
fi
echo "0" >> /mnt/file
done
process sometimes hangs indefinitely in jbd2_journal_lock_updates().
Toshiyuki identified that the following race happens:
jbd2_journal_lock_updates() |jbd2_journal_stop()
---------------------------------------+---------------------------------------
write_lock(&journal->j_state_lock) | .
++journal->j_barrier_count | .
spin_lock(&tran->t_handle_lock) | .
atomic_read(&tran->t_updates) //not 0 |
| atomic_dec_and_test(&tran->t_updates)
| // t_updates = 0
| wake_up(&journal->j_wait_updates)
prepare_to_wait() | // no process is woken up.
spin_unlock(&tran->t_handle_lock) |
write_unlock(&journal->j_state_lock) |
schedule() // never return |
We fix the problem by first calling prepare_to_wait() and only after that
checking t_updates in jbd2_journal_lock_updates().
Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/jbd2/transaction.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a0e41a4..35ae096 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -517,12 +517,13 @@ void jbd2_journal_lock_updates(journal_t *journal)
break;
spin_lock(&transaction->t_handle_lock);
+ prepare_to_wait(&journal->j_wait_updates, &wait,
+ TASK_UNINTERRUPTIBLE);
if (!atomic_read(&transaction->t_updates)) {
spin_unlock(&transaction->t_handle_lock);
+ finish_wait(&journal->j_wait_updates, &wait);
break;
}
- prepare_to_wait(&journal->j_wait_updates, &wait,
- TASK_UNINTERRUPTIBLE);
spin_unlock(&transaction->t_handle_lock);
write_unlock(&journal->j_state_lock);
schedule();
--
1.7.1
next prev parent reply other threads:[~2012-01-04 17:00 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-12-16 11:19 [PATCH][RFC] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_updates operations Toshiyuki Okajima
2011-12-16 12:22 ` Yongqiang Yang
2011-12-20 10:44 ` Toshiyuki Okajima
2011-12-22 11:36 ` [PATCH 0/2 take2][RFC] hangup in jbd2_journal_lock_updates Toshiyuki Okajima
2011-12-22 11:56 ` [PATCH 1/2] jbd2: jbd2_journal_stop needs an exclusive control to synchronize around t_update operations Toshiyuki Okajima
2012-01-03 15:32 ` Jan Kara [this message]
2012-01-05 3:12 ` Ted Ts'o
2011-12-22 12:00 ` [PATCH 2/2] jbd2: remove all t_handle_lock statements Toshiyuki Okajima
2011-12-22 14:07 ` Yongqiang Yang
2011-12-26 0:20 ` Toshiyuki Okajima
2011-12-26 1:17 ` [PATCH 2/2 take2] jbd2: delete spin_lock(t_handle_lock) inside wirte_lock(j_state_lock) Toshiyuki Okajima
2011-12-31 6:35 ` Yongqiang Yang
2012-01-03 15:35 ` Jan Kara
2012-01-05 3:27 ` Ted Ts'o
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120103153245.GE31457@quack.suse.cz \
--to=jack@suse.cz \
--cc=adilger.kernel@dilger.ca \
--cc=linux-ext4@vger.kernel.org \
--cc=toshi.okajima@jp.fujitsu.com \
--cc=tytso@mit.edu \
--cc=xiaoqiangnk@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).