From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: sched_core_balance() releasing interrupts with pi_lock held
Date: Wed, 16 Mar 2022 17:03:12 +0100
Message-ID: <YjIKQBIbJR/kRR+N@linutronix.de>
In-Reply-To: <20220315174606.02959816@gandalf.local.home>

On 2022-03-15 17:46:06 [-0400], Steven Rostedt wrote:
> On Tue, 8 Mar 2022 16:14:55 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
> 
> > Hi Peter,
> 
> Have you had time to look into this?

Yes, I can confirm that it is a problem ;) So I did this:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 33ce5cd113d8..56c286aaa01f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5950,7 +5950,6 @@ static bool try_steal_cookie(int this, int that)
 	unsigned long cookie;
 	bool success = false;
 
-	local_irq_disable();
 	double_rq_lock(dst, src);
 
 	cookie = dst->core->core_cookie;
@@ -5989,7 +5988,6 @@ static bool try_steal_cookie(int this, int that)
 
 unlock:
 	double_rq_unlock(dst, src);
-	local_irq_enable();
 
 	return success;
 }
@@ -6019,7 +6017,7 @@ static void sched_core_balance(struct rq *rq)
 
 	preempt_disable();
 	rcu_read_lock();
-	raw_spin_rq_unlock_irq(rq);
+	raw_spin_rq_unlock(rq);
 	for_each_domain(cpu, sd) {
 		if (need_resched())
 			break;
@@ -6027,7 +6025,7 @@ static void sched_core_balance(struct rq *rq)
 		if (steal_cookie_task(cpu, sd))
 			break;
 	}
-	raw_spin_rq_lock_irq(rq);
+	raw_spin_rq_lock(rq);
 	rcu_read_unlock();
 	preempt_enable();
 }


which looked right, but RT still falls apart:

| =====================================
| WARNING: bad unlock balance detected!
| 5.17.0-rc8-rt14+ #10 Not tainted
| -------------------------------------
| gcc/2608 is trying to release lock ((lock)) at:
| [<ffffffff8135a150>] folio_add_lru+0x60/0x90
| but there are no more locks to release!
| 
| other info that might help us debug this:
| 4 locks held by gcc/2608:
|  #0: ffff88826ea6efe0 (&sb->s_type->i_mutex_key#12){++++}-{3:3}, at: xfs_ilock+0x90/0xd0
|  #1: ffff88826ea6f1a0 (mapping.invalidate_lock#2){++++}-{3:3}, at: page_cache_ra_unbounded+0x8e/0x1f0
|  #2: ffff88852aba8d18 ((lock)#3){+.+.}-{2:2}, at: folio_add_lru+0x2a/0x90
|  #3: ffffffff829a5140 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xe0
| 
| stack backtrace:
| CPU: 18 PID: 2608 Comm: gcc Not tainted 5.17.0-rc8-rt14+ #10
| Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
| Call Trace:
|  <TASK>
|  dump_stack_lvl+0x4a/0x62
|  lock_release.cold+0x32/0x37
|  rt_spin_unlock+0x17/0x80
|  folio_add_lru+0x60/0x90
|  filemap_add_folio+0x53/0xa0
|  page_cache_ra_unbounded+0x1c3/0x1f0
|  filemap_get_pages+0xe3/0x5b0
|  filemap_read+0xc5/0x2f0
|  xfs_file_buffered_read+0x6b/0x1a0
|  xfs_file_read_iter+0x6a/0xd0
|  new_sync_read+0x11b/0x1a0
|  vfs_read+0x134/0x1d0
|  ksys_read+0x68/0xf0
|  do_syscall_64+0x59/0x80
|  entry_SYSCALL_64_after_hwframe+0x44/0xae
| RIP: 0033:0x7f3feab7310e

It is always the local-lock that breaks apart. Based on "locks held"
and the lock it tries to release, it looks like the lock was acquired on
CPU-A and released on CPU-B.
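
The suspected failure mode can be illustrated with a small user-space toy
model. This is only a sketch under stated assumptions, not kernel code: the
names toy_local_lock(), toy_local_unlock(), toy_migrate() and the held[]
array are hypothetical and only mimic the idea that a local-lock has one
instance per CPU. If a task changes CPU inside the locked section, the
unlock hits an instance that was never acquired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 2

/* Toy stand-in for a per-CPU local lock: one hold counter per CPU. */
static int held[NR_CPUS];

/* Hypothetical task that only remembers which CPU it currently runs on. */
struct task {
	int cpu;
};

/* Acquire the lock instance that belongs to the task's current CPU. */
static void toy_local_lock(struct task *t)
{
	assert(t->cpu >= 0 && t->cpu < NR_CPUS);
	held[t->cpu]++;
}

/*
 * Release the lock instance of the task's current CPU. Returns false when
 * that instance is not held, which models the "bad unlock balance" case.
 */
static bool toy_local_unlock(struct task *t)
{
	if (held[t->cpu] == 0) {
		fprintf(stderr, "bad unlock balance on CPU %d\n", t->cpu);
		return false;
	}
	held[t->cpu]--;
	return true;
}

/* Assumption: the task is moved to another CPU while it holds the lock. */
static void toy_migrate(struct task *t, int cpu)
{
	t->cpu = cpu;
}

/* Lock on CPU 0, migrate to CPU 1 inside the locked section, then unlock. */
static bool toy_migrated_unlock_is_balanced(void)
{
	struct task t = { .cpu = 0 };

	held[0] = held[1] = 0;
	toy_local_lock(&t);
	toy_migrate(&t, 1);
	return toy_local_unlock(&t);
}
```

In this model toy_migrated_unlock_is_balanced() returns false: CPU 0's
instance stays held and CPU 1's instance has nothing to release, which is
the same shape as the report above.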

> Thanks,
> 
> -- Steve

Sebastian

Thread overview: 17+ messages
2022-03-08 21:14 sched_core_balance() releasing interrupts with pi_lock held Steven Rostedt
2022-03-15 21:46 ` Steven Rostedt
2022-03-16 16:03   ` Sebastian Andrzej Siewior [this message]
2022-03-16 16:18     ` Sebastian Andrzej Siewior
2022-03-16 17:05       ` Sebastian Andrzej Siewior
2022-03-16 20:35       ` Peter Zijlstra
2022-03-17 12:09         ` Sebastian Andrzej Siewior
2022-03-17 14:51           ` [PATCH] sched: Teach the forced-newidle balancer about CPU affinity limitation Sebastian Andrzej Siewior
2022-04-05  8:22             ` [tip: sched/urgent] " tip-bot2 for Sebastian Andrzej Siewior
2022-03-16 20:27   ` sched_core_balance() releasing interrupts with pi_lock held Peter Zijlstra
2022-03-16 21:03     ` Peter Zijlstra
2022-03-21 17:30       ` Steven Rostedt
2022-03-29 21:22         ` Steven Rostedt
2022-04-04 20:17           ` T.J. Alumbaugh
2022-04-05  7:48             ` Peter Zijlstra
2022-04-05 15:16               ` Dietmar Eggemann
2022-03-17 12:08     ` Sebastian Andrzej Siewior
