* [PATCH] target: don't corrupt bh_count in iscsit_stop_time2retain_timer()
@ 2013-05-30 20:36 Jörn Engel
2013-06-12 17:13 ` Jörn Engel
0 siblings, 1 reply; 3+ messages in thread
From: Jörn Engel @ 2013-05-30 20:36 UTC (permalink / raw)
To: Nicholas A. Bellinger; +Cc: linux-kernel, target-devel
Nick,
I just started noticing this one. Patch is completely untested so
far, so don't merge it just yet. I just wanted early review,
considering how scary this looks.
Jörn
--
Do not stop an army on its way home.
-- Sun Tzu
Here is a fun one. Bug seems to have been introduced by 140854cb,
almost two years ago. I have no idea why we only started seeing it now,
but we did.
Rough callgraph:
core_tpg_set_initiator_node_queue_depth()
`-> spin_lock_irqsave(&tpg->session_lock, flags);
`-> lio_tpg_shutdown_session()
`-> iscsit_stop_time2retain_timer()
`-> spin_unlock_bh(&se_tpg->session_lock);
`-> spin_lock_bh(&se_tpg->session_lock);
`-> spin_unlock_irqrestore(&tpg->session_lock, flags);
core_tpg_set_initiator_node_queue_depth() used to call spin_lock_bh(),
but 140854cb changed that to spin_lock_irqsave(). However,
lio_tpg_shutdown_session() still claims to be called with spin_lock_bh()
held, as does iscsit_stop_time2retain_timer():
* Called with spin_lock_bh(&struct se_portal_group->session_lock) held
Stale documentation is mostly annoying, but in this case the dropping
the lock with the _bh variant is plain wrong. It is also wrong to drop
locks two functions below the lock-holder, but I will ignore that bit
for now.
After some more locking and unlocking we eventually hit this backtrace:
------------[ cut here ]------------
WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0xe8/0x100()
Pid: 24645, comm: lio_helper.py Tainted: G O 3.6.11+
Call Trace:
[<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
[<ffffffffa040ae37>] ? iscsit_inc_conn_usage_count+0x37/0x50 [iscsi_target_mod]
[<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810472f8>] local_bh_enable_ip+0xe8/0x100
[<ffffffff815b8365>] _raw_spin_unlock_bh+0x15/0x20
[<ffffffffa040ae37>] iscsit_inc_conn_usage_count+0x37/0x50 [iscsi_target_mod]
[<ffffffffa041149a>] iscsit_stop_session+0xfa/0x1c0 [iscsi_target_mod]
[<ffffffffa0417fab>] lio_tpg_shutdown_session+0x7b/0x90 [iscsi_target_mod]
[<ffffffffa033ede4>] core_tpg_set_initiator_node_queue_depth+0xe4/0x290 [target_core_mod]
[<ffffffffa0409032>] iscsit_tpg_set_initiator_node_queue_depth+0x12/0x20 [iscsi_target_mod]
[<ffffffffa0415c29>] lio_target_nacl_store_cmdsn_depth+0xa9/0x180 [iscsi_target_mod]
[<ffffffffa0331b49>] target_fabric_nacl_base_attr_store+0x39/0x40 [target_core_mod]
[<ffffffff811b857d>] configfs_write_file+0xbd/0x120
[<ffffffff81148f36>] vfs_write+0xc6/0x180
[<ffffffff81149251>] sys_write+0x51/0x90
[<ffffffff815c0969>] system_call_fastpath+0x16/0x1b
---[ end trace 3747632b9b164652 ]---
As a pure band-aid, this patch drops the _bh.
Signed-off-by: Joern Engel <joern@logfs.org>
---
drivers/target/iscsi/iscsi_target_erl0.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/target/iscsi/iscsi_target_erl0.c b/drivers/target/iscsi/iscsi_target_erl0.c
index 9a577ff..9bd27f2 100644
--- a/drivers/target/iscsi/iscsi_target_erl0.c
+++ b/drivers/target/iscsi/iscsi_target_erl0.c
@@ -834,11 +834,11 @@ extern int iscsit_stop_time2retain_timer(struct iscsi_session *sess)
return 0;
sess->time2retain_timer_flags |= ISCSI_TF_STOP;
- spin_unlock_bh(&se_tpg->session_lock);
+ spin_unlock(&se_tpg->session_lock);
del_timer_sync(&sess->time2retain_timer);
- spin_lock_bh(&se_tpg->session_lock);
+ spin_lock(&se_tpg->session_lock);
sess->time2retain_timer_flags &= ~ISCSI_TF_RUNNING;
pr_debug("Stopped Time2Retain Timer for SID: %u\n",
sess->sid);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] target: don't corrupt bh_count in iscsit_stop_time2retain_timer()
2013-05-30 20:36 [PATCH] target: don't corrupt bh_count in iscsit_stop_time2retain_timer() Jörn Engel
@ 2013-06-12 17:13 ` Jörn Engel
2013-06-14 21:12 ` Nicholas A. Bellinger
0 siblings, 1 reply; 3+ messages in thread
From: Jörn Engel @ 2013-06-12 17:13 UTC (permalink / raw)
To: Nicholas A. Bellinger; +Cc: linux-kernel, target-devel
On Thu, 30 May 2013 16:36:51 -0400, Jörn Engel wrote:
>
> I just started noticing this one. Patch is completely untested so
> far, so don't merge it just yet. I just wanted early review,
> considering how scary this looks.
Patch does fix a race we kept seeing at a low rate. Nick, can you
take it as-is?
Jörn
--
Victory in war is not repetitious.
-- Sun Tzu
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] target: don't corrupt bh_count in iscsit_stop_time2retain_timer()
2013-06-12 17:13 ` Jörn Engel
@ 2013-06-14 21:12 ` Nicholas A. Bellinger
0 siblings, 0 replies; 3+ messages in thread
From: Nicholas A. Bellinger @ 2013-06-14 21:12 UTC (permalink / raw)
To: Jörn Engel; +Cc: linux-kernel, target-devel
On Wed, 2013-06-12 at 13:13 -0400, Jörn Engel wrote:
> On Thu, 30 May 2013 16:36:51 -0400, Jörn Engel wrote:
> >
> > I just started noticing this one. Patch is completely untested so
> > far, so don't merge it just yet. I just wanted early review,
> > considering how scary this looks.
>
> Patch does fix a race we kept seeing at a low rate. Nick, can you
> take it as-is?
>
Mmmm, seems like a reasonable enough work-around for now..
Applied to target-pending/master, with a CC' to stable.
Thanks Joern!
--nab
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2013-06-14 21:11 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-30 20:36 [PATCH] target: don't corrupt bh_count in iscsit_stop_time2retain_timer() Jörn Engel
2013-06-12 17:13 ` Jörn Engel
2013-06-14 21:12 ` Nicholas A. Bellinger
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).