From: "Ritesh Harjani (IBM)" <ritesh.list@gmail.com>
To: Alexey Lyahkov <alexey.lyashkov@gmail.com>
Cc: linux-ext4 <linux-ext4@vger.kernel.org>,
	Theodore Ts'o <tytso@mit.edu>, Andreas Dilger <adilger@dilger.ca>,
	Artem Blagodarenko <artem.blagodarenko@gmail.com>,
	Andrew Perepechko <anserper@ya.ru>
Subject: Re: [PATCH] jbd2: wake up journal waiters in FIFO order, not LIFO
Date: Thu, 8 Sep 2022 14:41:16 +0530
Message-ID: <20220908091116.zvsfttb6dhz57d52@riteshh-domain>
In-Reply-To: <5C1AAACF-5878-4812-8334-29A328B57A77@gmail.com>

On 22/09/08 11:21AM, Alexey Lyahkov wrote:
> 
> 
> > On 8 Sep 2022, at 09:11, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> > 
> > On 22/09/08 08:51AM, Alexey Lyahkov wrote:
> >> Hi Ritesh,
> >> 
> >> This was hit on a Lustre OSS node when we had tons of short writes with sync (journal commit) in parallel.
> >> Each write was done from its own thread (around 1k-2k threads in parallel).
> >> This caused a situation where only a few threads were woken up and entered the transaction before it became T_LOCKED.
> >> In our observation, all the handles that were woken came from the head of the list, i.e. the most recently added ones, while old handles were still in the list and
> > 
> > Thanks Alexey for providing the details.
> > 
> >> It caused soft lockup messages on the console.
> > 
> > Did you mean hung task timeout? I was wondering why there would be a soft lockup
> > warning, because these old handles are in a waiting state anyway, right?
> > Am I missing something?
> > 
> Oh. I asked my colleagues about the details. It was the internal Lustre hung-thread detector, not a kernel-side one.

Thanks again for sharing the details. It does indeed look like a task's handle can
remain in the wait state for a long time due to the wrong wakeup order when there are many threads.
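
The starvation effect described above can be illustrated with a toy user-space
model. This is only a sketch and not the jbd2 code: it assumes a single wait
list that is always serviced from the head, a fixed number of handle slots per
transaction, and waiters that stay where they are if they do not get a handle.
The function name `simulate` and all of its parameters are made up for
illustration.

```python
from collections import deque


def simulate(order, rounds=50, arrivals=4, slots=4, initial=8):
    """Return the largest number of rounds any waiter spent waiting.

    order="lifo": new waiters are added at the head of the list.
    order="fifo": new waiters are added at the tail of the list.
    Wakeups always start from the head (hypothetical toy model).
    """
    queue = deque()   # leftmost element is the head of the wait list
    waited = {}       # waiter id -> rounds waited so far
    next_id = 0
    worst = 0

    def enqueue():
        nonlocal next_id
        if order == "lifo":
            queue.appendleft(next_id)
        else:
            queue.append(next_id)
        waited[next_id] = 0
        next_id += 1

    # Waiters that were already queued before the first round.
    for _ in range(initial):
        enqueue()

    for _ in range(rounds):
        # New threads ask for a handle during this transaction.
        for _ in range(arrivals):
            enqueue()
        # Only the first `slots` waiters from the head get a handle
        # before the transaction is locked.
        for _ in range(min(slots, len(queue))):
            worst = max(worst, waited.pop(queue.popleft()))
        # Everyone still queued has waited one more round.
        for wid in queue:
            waited[wid] += 1

    # Include waiters that were never served at all.
    return max([worst, *waited.values()])


if __name__ == "__main__":
    for order in ("lifo", "fifo"):
        print(order, simulate(order))
```

With the default parameters, the "lifo" variant never serves the initial
waiters in any of the 50 rounds, while in the "fifo" variant no waiter waits
more than two rounds.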

-ritesh

> 
> [ 2221.036503] Lustre: ll_ost_io04_080: service thread pid 55122 was inactive for 80.284 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
> [ 2221.036677] Pid: 55212, comm: ll_ost_io05_074 4.18.0-305.10.2.x6.1.010.19.x86_64 #1 SMP Thu Jun 30 13:42:51 MDT 2022
> [ 2221.056673] Lustre: Skipped 2 previous similar messages
> [ 2221.067821] Call Trace TBD:
> [ 2221.067855] [<0>] wait_transaction_locked+0x89/0xc0 [jbd2]
> [ 2221.099175] [<0>] add_transaction_credits+0xd4/0x290 [jbd2]
> [ 2221.105266] [<0>] start_this_handle+0x10a/0x520 [jbd2]
> [ 2221.110904] [<0>] jbd2__journal_start+0xea/0x1f0 [jbd2]
> [ 2221.116679] [<0>] __ldiskfs_journal_start_sb+0x6e/0x130 [ldiskfs]
> [ 2221.123316] [<0>] osd_trans_start+0x13b/0x4f0 [osd_ldiskfs]
> [ 2221.129417] [<0>] ofd_commitrw_write+0x620/0x1830 [ofd]
> [ 2221.135147] [<0>] ofd_commitrw+0x731/0xd80 [ofd]
> [ 2221.140420] [<0>] obd_commitrw+0x1ac/0x370 [ptlrpc]
> [ 2221.145858] [<0>] tgt_brw_write+0x1913/0x1d50 [ptlrpc]
> [ 2221.151561] [<0>] tgt_request_handle+0xc93/0x1a40 [ptlrpc]
> [ 2221.157622] [<0>] ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
> [ 2221.164454] [<0>] ptlrpc_main+0xc06/0x1560 [ptlrpc]
> [ 2221.169860] [<0>] kthread+0x116/0x130
> [ 2221.174033] [<0>] ret_from_fork+0x1f/0x40
> 
> 
> Other logs showed that this thread couldn't take a handle, while other threads were able to do so many times.
> The kernel detector didn't trigger because the thread was woken up many times, but each time it saw T_LOCKED and went back to sleep.
> 
> Alex
> 
> 
> 
> > -ritesh
> 

Thread overview: 10+ messages
2022-09-07 16:59 [PATCH] jbd2: wake up journal waiters in FIFO order, not LIFO Alexey Lyashkov
2022-09-08  5:46 ` Ritesh Harjani (IBM)
2022-09-08  5:51   ` Alexey Lyahkov
2022-09-08  6:11     ` Ritesh Harjani (IBM)
2022-09-08  8:21       ` Alexey Lyahkov
2022-09-08  8:28         ` Andrew
2022-09-08  9:11         ` Ritesh Harjani (IBM) [this message]
2022-09-08  9:13   ` Ritesh Harjani (IBM)
2022-09-12 18:01 Alexey Lyashkov
2022-09-30  3:19 ` Theodore Ts'o
