From: Alexey Lyahkov <alexey.lyashkov@gmail.com>
To: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
Cc: linux-ext4 <linux-ext4@vger.kernel.org>,
	Theodore Ts'o <tytso@mit.edu>, Andreas Dilger <adilger@dilger.ca>,
	Artem Blagodarenko <artem.blagodarenko@gmail.com>,
	Andrew Perepechko <anserper@ya.ru>
Subject: Re: [PATCH] jbd2: wake up journal waiters in FIFO order, not LIFO
Date: Thu, 8 Sep 2022 08:51:40 +0300	[thread overview]
Message-ID: <B32B956C-E851-42A2-9419-2947C442E2AA@gmail.com> (raw)
In-Reply-To: <20220908054611.vjcb27wmq4dggqmv@riteshh-domain>

Hi Ritesh,

This was hit on a Lustre OSS node when we had tons of short writes with sync/(journal commit) running in parallel.
Each write was done from its own thread (around 1k-2k threads in parallel).
It caused a situation where only a few threads were woken and managed to enter the transaction before it went to T_LOCKED.
In our observation, the handles woken from the head of the list were the ones added most recently, while the old handles
stayed in the list, and that caused soft lockup messages on the console.

Alex


> On 8 Sep 2022, at 08:46, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> 
> On 22/09/07 07:59PM, Alexey Lyashkov wrote:
>> From: Andrew Perepechko <anserper@ya.ru>
>> 
>> LIFO wakeup order is unfair and sometimes leads to a journal
>> user not being able to get a journal handle for hundreds of
>> transactions in a row.
>> 
>> FIFO wakeup can make things more fair.
> 
> prepare_to_wait() will always add the task to the head of the list,
> while prepare_to_wait_exclusive() will add the task to the tail, since all
> of the exclusive tasks are added to the tail.
> The wake_up() function will wake up all non-exclusive tasks plus a single
> exclusive task, whereas wake_up_all() will wake up all tasks regardless.
> 
> So your change does make the ordering FIFO, in which the task that came in 
> first will be woken up first. 
> 
> Although I was wondering about 2 things - 
> 1. In what scenario was this observed to become a problem/bottleneck for you?
> Could you kindly give more details of your problem?
> 
> 2. What about the start_this_handle() path, where we call wait_event() 
> for j_barrier_count to become 0? I guess that doesn't happen often.
> 
> -ritesh
> 
> 
>> 
>> Signed-off-by: Alexey Lyashkov <alexey.lyashkov@gmail.com>
>> ---
>> fs/jbd2/commit.c      | 2 +-
>> fs/jbd2/transaction.c | 6 +++---
>> 2 files changed, 4 insertions(+), 4 deletions(-)
>> 
>> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
>> index b2b2bc9b88d9..ec2b55879e3a 100644
>> --- a/fs/jbd2/commit.c
>> +++ b/fs/jbd2/commit.c
>> @@ -570,7 +570,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>> 	journal->j_running_transaction = NULL;
>> 	start_time = ktime_get();
>> 	commit_transaction->t_log_start = journal->j_head;
>> -	wake_up(&journal->j_wait_transaction_locked);
>> +	wake_up_all(&journal->j_wait_transaction_locked);
>> 	write_unlock(&journal->j_state_lock);
>> 
>> 	jbd2_debug(3, "JBD2: commit phase 2a\n");
>> diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
>> index e1be93ccd81c..6a404ac1c178 100644
>> --- a/fs/jbd2/transaction.c
>> +++ b/fs/jbd2/transaction.c
>> @@ -168,7 +168,7 @@ static void wait_transaction_locked(journal_t *journal)
>> 	int need_to_start;
>> 	tid_t tid = journal->j_running_transaction->t_tid;
>> 
>> -	prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
>> +	prepare_to_wait_exclusive(&journal->j_wait_transaction_locked, &wait,
>> 			TASK_UNINTERRUPTIBLE);
>> 	need_to_start = !tid_geq(journal->j_commit_request, tid);
>> 	read_unlock(&journal->j_state_lock);
>> @@ -194,7 +194,7 @@ static void wait_transaction_switching(journal_t *journal)
>> 		read_unlock(&journal->j_state_lock);
>> 		return;
>> 	}
>> -	prepare_to_wait(&journal->j_wait_transaction_locked, &wait,
>> +	prepare_to_wait_exclusive(&journal->j_wait_transaction_locked, &wait,
>> 			TASK_UNINTERRUPTIBLE);
>> 	read_unlock(&journal->j_state_lock);
>> 	/*
>> @@ -920,7 +920,7 @@ void jbd2_journal_unlock_updates (journal_t *journal)
>> 	write_lock(&journal->j_state_lock);
>> 	--journal->j_barrier_count;
>> 	write_unlock(&journal->j_state_lock);
>> -	wake_up(&journal->j_wait_transaction_locked);
>> +	wake_up_all(&journal->j_wait_transaction_locked);
>> }
>> 
>> static void warn_dirty_buffer(struct buffer_head *bh)
>> -- 
>> 2.31.1
>> 


  reply	other threads:[~2022-09-08  5:51 UTC|newest]

Thread overview: 10+ messages
2022-09-07 16:59 [PATCH] jbd2: wake up journal waiters in FIFO order, not LIFO Alexey Lyashkov
2022-09-08  5:46 ` Ritesh Harjani (IBM)
2022-09-08  5:51   ` Alexey Lyahkov [this message]
2022-09-08  6:11     ` Ritesh Harjani (IBM)
2022-09-08  8:21       ` Alexey Lyahkov
2022-09-08  8:28         ` Andrew
2022-09-08  9:11         ` Ritesh Harjani (IBM)
2022-09-08  9:13   ` Ritesh Harjani (IBM)
2022-09-12 18:01 Alexey Lyashkov
2022-09-30  3:19 ` Theodore Ts'o
