All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] tcm_fc fixes
@ 2016-08-22  8:54 Hannes Reinecke
  2016-08-22  8:54 ` [PATCH 1/6] target: fix hang in target_wait_for_sess_cmds() Hannes Reinecke
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Hannes Reinecke @ 2016-08-22  8:54 UTC (permalink / raw)
  To: Nic Bellinger; +Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke

Hi Nic,

here's a set of patches and fixes I've found during debugging
FCoE over virtio.
The first two fix a race condition I've encountered during starting
and stopping initiators; they are similar to your prior patches,
but I still think they should be applied to avoid any race issues.
The next three are just minor cleanups, but the last one is
_absolutely_ crucial.
With the current code tcm_fc will behave _very_ odd if you're
running in a virtualized environment, as the xid allocator in libfc
will be thouroughly thrashed by sending frames on _other_ cpus than
those the originator frame got received on.
This causes frames to be delayed (by up to several minutes)
leading to I/O errors on the initiator side.

As usual, comments and reviews are welcome.

Hannes Reinecke (6):
  target: fix hang in target_wait_for_sess_cmds()
  target: fix potential race window in target_sess_cmd_list_waiting()
  target/tcm_fc: print command pointer in debug message
  target/tcm_fc: return detailed error in ft_sess_create()
  target/tcm_fc: Update debugging statements to match libfc usage
  target/tcm_fc: use CPU affinity for responses

 drivers/target/target_core_transport.c | 28 ++++++++++++++---------
 drivers/target/tcm_fc/tfc_cmd.c        |  4 ++--
 drivers/target/tcm_fc/tfc_sess.c       | 42 +++++++++++++++++++++++-----------
 3 files changed, 48 insertions(+), 26 deletions(-)

-- 
1.8.5.6


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/6] target: fix hang in target_wait_for_sess_cmds()
  2016-08-22  8:54 [PATCH 0/6] tcm_fc fixes Hannes Reinecke
@ 2016-08-22  8:54 ` Hannes Reinecke
  2016-10-21  8:00   ` Nicholas A. Bellinger
  2016-08-22  8:54 ` [PATCH 2/6] target: fix potential race window in target_sess_cmd_list_waiting() Hannes Reinecke
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2016-08-22  8:54 UTC (permalink / raw)
  To: Nic Bellinger
  Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke, Hannes Reinecke

When shutting down a session I'm seeing this hung_task message:

INFO: task kworker/u128:0:6 blocked for more than 480 seconds.
      Tainted: G            E   N  4.4.17-default+ #241
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u128:0  D ffff880237615c00     0     6      2 0x00000000
 Workqueue: fc_rport_eq fc_rport_work [libfc]
 ffff88017edd7c48 ffffffff81c11500 ffff88017edd0180 ffff88017edd8000
 ffff88035b6860a8 ffff88017edd0180 ffff88036461eac0 0000000000000000
 ffff88017edd7c60 ffffffff815db4d5 7fffffffffffffff ffff88017edd7d10
Call Trace:
 [<ffffffff815db4d5>] schedule+0x35/0x80
 [<ffffffff815dde17>] schedule_timeout+0x237/0x2d0
 [<ffffffff815dc6e3>] wait_for_completion+0xa3/0x110
 [<ffffffffa0869fb5>] target_wait_for_sess_cmds+0x45/0x190 [target_core_mod]
 [<ffffffffa03953b4>] ft_close_sess+0x24/0x30 [tcm_fc]
 [<ffffffffa03955ba>] ft_prlo+0x8a/0x95 [tcm_fc]
 [<ffffffffa066fc28>] fc_rport_work+0x3b8/0x7c0 [libfc]
 [<ffffffff810939ee>] process_one_work+0x14e/0x410
 [<ffffffff81094246>] worker_thread+0x116/0x490

Looking at the code it looks as if there is some confusion about
when to call 'wait_for_completion(&cmd->cmd_wait_comp).
The problem is that there are _two_ sections where the code
waits for 'cmd_wait_comp' completion, but the completion
is only called with 'complete', not 'complete_all'.
So we cannot have both waiting for it, doing so will lead
to a stuck wait_for_completion.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/target/target_core_transport.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 6094a6b..2e1a6d8 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2522,7 +2522,7 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 	 * the remaining calls to target_put_sess_cmd(), and not the
 	 * callers of this function.
 	 */
-	if (aborted) {
+	if (aborted && !cmd->cmd_wait_set) {
 		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
 		wait_for_completion(&cmd->cmd_wait_comp);
 		cmd->se_tfo->release_cmd(cmd);
@@ -2642,9 +2642,11 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
 	list_for_each_entry(se_cmd, &se_sess->sess_wait_list, se_cmd_list) {
 		rc = kref_get_unless_zero(&se_cmd->cmd_kref);
 		if (rc) {
-			se_cmd->cmd_wait_set = 1;
 			spin_lock(&se_cmd->t_state_lock);
-			se_cmd->transport_state |= CMD_T_FABRIC_STOP;
+			if (!(se_cmd->transport_state & CMD_T_FABRIC_STOP)) {
+				se_cmd->cmd_wait_set = 1;
+				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
+			}
 			spin_unlock(&se_cmd->t_state_lock);
 		}
 	}
@@ -2664,24 +2666,26 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
 
 	list_for_each_entry_safe(se_cmd, tmp_cmd,
 				&se_sess->sess_wait_list, se_cmd_list) {
+		int cmd_wait;
 		pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:"
 			" %d\n", se_cmd, se_cmd->t_state,
 			se_cmd->se_tfo->get_cmd_state(se_cmd));
 
 		spin_lock_irqsave(&se_cmd->t_state_lock, flags);
 		tas = (se_cmd->transport_state & CMD_T_TAS);
+		cmd_wait = se_cmd->cmd_wait_set;
 		spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
-
 		if (!target_put_sess_cmd(se_cmd)) {
 			if (tas)
 				target_put_sess_cmd(se_cmd);
 		}
-
+		if (!cmd_wait)
+			continue;
 		wait_for_completion(&se_cmd->cmd_wait_comp);
 		pr_debug("After cmd_wait_comp: se_cmd: %p t_state: %d"
 			" fabric state: %d\n", se_cmd, se_cmd->t_state,
 			se_cmd->se_tfo->get_cmd_state(se_cmd));
-
+		se_cmd->cmd_wait_set = 0;
 		se_cmd->se_tfo->release_cmd(se_cmd);
 	}
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/6] target: fix potential race window in target_sess_cmd_list_waiting()
  2016-08-22  8:54 [PATCH 0/6] tcm_fc fixes Hannes Reinecke
  2016-08-22  8:54 ` [PATCH 1/6] target: fix hang in target_wait_for_sess_cmds() Hannes Reinecke
@ 2016-08-22  8:54 ` Hannes Reinecke
  2016-10-21  8:19   ` Nicholas A. Bellinger
  2016-08-22  8:54 ` [PATCH 3/6] target/tcm_fc: print command pointer in debug message Hannes Reinecke
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2016-08-22  8:54 UTC (permalink / raw)
  To: Nic Bellinger
  Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke, Hannes Reinecke

target_sess_cmd_list_waiting() might hit on a condition where
the kref for the command is already 0, but the destructor has
not been called yet (or is stuck in waiting for a spin lock).
Rather than leaving the command on the list we should explicitly
remove it to avoid race issues later on.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/target/target_core_transport.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 2e1a6d8..ce136f0 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2547,8 +2547,8 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
 	 * fabric acknowledgement that requires two target_put_sess_cmd()
 	 * invocations before se_cmd descriptor release.
 	 */
-	if (ack_kref)
-		kref_get(&se_cmd->cmd_kref);
+	if (ack_kref && !kref_get_unless_zero(&se_cmd->cmd_kref))
+		return -EINVAL;
 
 	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
 	if (se_sess->sess_tearing_down) {
@@ -2627,7 +2627,7 @@ EXPORT_SYMBOL(target_put_sess_cmd);
  */
 void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
 {
-	struct se_cmd *se_cmd;
+	struct se_cmd *se_cmd, *tmp_cmd;
 	unsigned long flags;
 	int rc;
 
@@ -2639,7 +2639,8 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
 	se_sess->sess_tearing_down = 1;
 	list_splice_init(&se_sess->sess_cmd_list, &se_sess->sess_wait_list);
 
-	list_for_each_entry(se_cmd, &se_sess->sess_wait_list, se_cmd_list) {
+	list_for_each_entry_safe(se_cmd, tmp_cmd,
+				 &se_sess->sess_wait_list, se_cmd_list) {
 		rc = kref_get_unless_zero(&se_cmd->cmd_kref);
 		if (rc) {
 			spin_lock(&se_cmd->t_state_lock);
@@ -2648,7 +2649,8 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
 				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
 			}
 			spin_unlock(&se_cmd->t_state_lock);
-		}
+		} else
+			list_del_init(&se_cmd->se_cmd_list);
 	}
 
 	spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/6] target/tcm_fc: print command pointer in debug message
  2016-08-22  8:54 [PATCH 0/6] tcm_fc fixes Hannes Reinecke
  2016-08-22  8:54 ` [PATCH 1/6] target: fix hang in target_wait_for_sess_cmds() Hannes Reinecke
  2016-08-22  8:54 ` [PATCH 2/6] target: fix potential race window in target_sess_cmd_list_waiting() Hannes Reinecke
@ 2016-08-22  8:54 ` Hannes Reinecke
  2016-08-22  8:54 ` [PATCH 4/6] target/tcm_fc: return detailed error in ft_sess_create() Hannes Reinecke
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Hannes Reinecke @ 2016-08-22  8:54 UTC (permalink / raw)
  To: Nic Bellinger
  Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke, Hannes Reinecke

When allocating a new command we should add the pointer to the
debug statements; that allows us to match this with other debug
statements for handling data.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/target/tcm_fc/tfc_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index 216e18c..36f0864 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -575,7 +575,7 @@ static void ft_send_work(struct work_struct *work)
 			      TARGET_SCF_ACK_KREF))
 		goto err;
 
-	pr_debug("r_ctl %x alloc target_submit_cmd\n", fh->fh_r_ctl);
+	pr_debug("r_ctl %x target_submit_cmd %p\n", fh->fh_r_ctl, cmd);
 	return;
 
 err:
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/6] target/tcm_fc: return detailed error in ft_sess_create()
  2016-08-22  8:54 [PATCH 0/6] tcm_fc fixes Hannes Reinecke
                   ` (2 preceding siblings ...)
  2016-08-22  8:54 ` [PATCH 3/6] target/tcm_fc: print command pointer in debug message Hannes Reinecke
@ 2016-08-22  8:54 ` Hannes Reinecke
  2016-08-22  8:54 ` [PATCH 5/6] target/tcm_fc: Update debugging statements to match libfc usage Hannes Reinecke
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Hannes Reinecke @ 2016-08-22  8:54 UTC (permalink / raw)
  To: Nic Bellinger
  Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke, Hannes Reinecke

Not every failure is due to out-of-memory; the ACLs might not be
set, too. So return a detailed error code in ft_sess_create()
instead of just a NULL pointer.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/target/tcm_fc/tfc_sess.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/target/tcm_fc/tfc_sess.c b/drivers/target/tcm_fc/tfc_sess.c
index 6ffbb60..cc8c261 100644
--- a/drivers/target/tcm_fc/tfc_sess.c
+++ b/drivers/target/tcm_fc/tfc_sess.c
@@ -223,7 +223,7 @@ static struct ft_sess *ft_sess_create(struct ft_tport *tport, u32 port_id,
 
 	sess = kzalloc(sizeof(*sess), GFP_KERNEL);
 	if (!sess)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	kref_init(&sess->kref); /* ref for table entry */
 	sess->tport = tport;
@@ -234,8 +234,9 @@ static struct ft_sess *ft_sess_create(struct ft_tport *tport, u32 port_id,
 					     TARGET_PROT_NORMAL, &initiatorname[0],
 					     sess, ft_sess_alloc_cb);
 	if (IS_ERR(sess->se_sess)) {
+		int rc = PTR_ERR(sess->se_sess);
 		kfree(sess);
-		return NULL;
+		sess = ERR_PTR(rc);
 	}
 	return sess;
 }
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/6] target/tcm_fc: Update debugging statements to match libfc usage
  2016-08-22  8:54 [PATCH 0/6] tcm_fc fixes Hannes Reinecke
                   ` (3 preceding siblings ...)
  2016-08-22  8:54 ` [PATCH 4/6] target/tcm_fc: return detailed error in ft_sess_create() Hannes Reinecke
@ 2016-08-22  8:54 ` Hannes Reinecke
  2016-08-23 18:07   ` Bart Van Assche
  2016-08-22  8:54 ` [PATCH 6/6] target/tcm_fc: use CPU affinity for responses Hannes Reinecke
  2016-10-21  7:46 ` [PATCH 0/6] tcm_fc fixes Nicholas A. Bellinger
  6 siblings, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2016-08-22  8:54 UTC (permalink / raw)
  To: Nic Bellinger
  Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke, Hannes Reinecke

Update the debug statements to match those from libfc.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/target/tcm_fc/tfc_sess.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/target/tcm_fc/tfc_sess.c b/drivers/target/tcm_fc/tfc_sess.c
index cc8c261..fd5c3de 100644
--- a/drivers/target/tcm_fc/tfc_sess.c
+++ b/drivers/target/tcm_fc/tfc_sess.c
@@ -39,6 +39,11 @@
 
 #include "tcm_fc.h"
 
+#define TFC_SESS_DBG(lport, fmt, args...) \
+	pr_debug("host%u: rport %6.6x: " fmt,	   \
+		 (lport)->host->host_no,	   \
+		 (lport)->port_id, ##args )
+
 static void ft_sess_delete_all(struct ft_tport *);
 
 /*
@@ -167,24 +172,29 @@ static struct ft_sess *ft_sess_get(struct fc_lport *lport, u32 port_id)
 	struct ft_tport *tport;
 	struct hlist_head *head;
 	struct ft_sess *sess;
+	char *reason = "no session created";
 
 	rcu_read_lock();
 	tport = rcu_dereference(lport->prov[FC_TYPE_FCP]);
-	if (!tport)
+	if (!tport) {
+		reason = "not an FCP port";
 		goto out;
+	}
 
 	head = &tport->hash[ft_sess_hash(port_id)];
 	hlist_for_each_entry_rcu(sess, head, hash) {
 		if (sess->port_id == port_id) {
 			kref_get(&sess->kref);
 			rcu_read_unlock();
-			pr_debug("port_id %x found %p\n", port_id, sess);
+			TFC_SESS_DBG(lport, "port_id %x found %p\n",
+				     port_id, sess);
 			return sess;
 		}
 	}
 out:
 	rcu_read_unlock();
-	pr_debug("port_id %x not found\n", port_id);
+	TFC_SESS_DBG(lport, "port_id %x not found, %s\n",
+		     port_id, reason);
 	return NULL;
 }
 
@@ -195,7 +205,7 @@ static int ft_sess_alloc_cb(struct se_portal_group *se_tpg,
 	struct ft_tport *tport = sess->tport;
 	struct hlist_head *head = &tport->hash[ft_sess_hash(sess->port_id)];
 
-	pr_debug("port_id %x sess %p\n", sess->port_id, sess);
+	TFC_SESS_DBG(tport->lport, "port_id %x sess %p\n", sess->port_id, sess);
 	hlist_add_head_rcu(&sess->hash, head);
 	tport->sess_count++;
 
@@ -320,7 +330,7 @@ void ft_sess_close(struct se_session *se_sess)
 		mutex_unlock(&ft_lport_lock);
 		return;
 	}
-	pr_debug("port_id %x\n", port_id);
+	TFC_SESS_DBG(sess->tport->lport, "port_id %x close session\n", port_id);
 	ft_sess_unhash(sess);
 	mutex_unlock(&ft_lport_lock);
 	ft_close_sess(sess);
@@ -380,8 +390,13 @@ static int ft_prli_locked(struct fc_rport_priv *rdata, u32 spp_len,
 		if (!(fcp_parm & FCP_SPPF_INIT_FCN))
 			return FC_SPP_RESP_CONF;
 		sess = ft_sess_create(tport, rdata->ids.port_id, rdata);
-		if (!sess)
-			return FC_SPP_RESP_RES;
+		if (IS_ERR(sess)) {
+			if (PTR_ERR(sess) == -EACCES) {
+				spp->spp_flags &= ~FC_SPP_EST_IMG_PAIR;
+				return FC_SPP_RESP_CONF;
+			} else
+				return FC_SPP_RESP_RES;
+		}
 		if (!sess->params)
 			rdata->prli_count++;
 		sess->params = fcp_parm;
@@ -424,8 +439,8 @@ static int ft_prli(struct fc_rport_priv *rdata, u32 spp_len,
 	mutex_lock(&ft_lport_lock);
 	ret = ft_prli_locked(rdata, spp_len, rspp, spp);
 	mutex_unlock(&ft_lport_lock);
-	pr_debug("port_id %x flags %x ret %x\n",
-	       rdata->ids.port_id, rspp ? rspp->spp_flags : 0, ret);
+	TFC_SESS_DBG(rdata->local_port, "port_id %x flags %x ret %x\n",
+		     rdata->ids.port_id, rspp ? rspp->spp_flags : 0, ret);
 	return ret;
 }
 
@@ -478,11 +493,11 @@ static void ft_recv(struct fc_lport *lport, struct fc_frame *fp)
 	struct ft_sess *sess;
 	u32 sid = fc_frame_sid(fp);
 
-	pr_debug("sid %x\n", sid);
+	TFC_SESS_DBG(lport, "recv sid %x\n", sid);
 
 	sess = ft_sess_get(lport, sid);
 	if (!sess) {
-		pr_debug("sid %x sess lookup failed\n", sid);
+		TFC_SESS_DBG(lport, "sid %x sess lookup failed\n", sid);
 		/* TBD XXX - if FCP_CMND, send PRLO */
 		fc_frame_free(fp);
 		return;
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/6] target/tcm_fc: use CPU affinity for responses
  2016-08-22  8:54 [PATCH 0/6] tcm_fc fixes Hannes Reinecke
                   ` (4 preceding siblings ...)
  2016-08-22  8:54 ` [PATCH 5/6] target/tcm_fc: Update debugging statements to match libfc usage Hannes Reinecke
@ 2016-08-22  8:54 ` Hannes Reinecke
  2016-10-21  7:46 ` [PATCH 0/6] tcm_fc fixes Nicholas A. Bellinger
  6 siblings, 0 replies; 11+ messages in thread
From: Hannes Reinecke @ 2016-08-22  8:54 UTC (permalink / raw)
  To: Nic Bellinger
  Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke, Hannes Reinecke

The libfc stack assigns exchange IDs based on the CPU the request
was received on, so we need to send the responses via the same CPU.
Otherwise the send logic gets confuses and responses will be delayed,
causing exchange timeouts on the initiator side.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/target/tcm_fc/tfc_cmd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index 36f0864..ff5de9a 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -572,7 +572,7 @@ static void ft_send_work(struct work_struct *work)
 	if (target_submit_cmd(&cmd->se_cmd, cmd->sess->se_sess, fcp->fc_cdb,
 			      &cmd->ft_sense_buffer[0], scsilun_to_int(&fcp->fc_lun),
 			      ntohl(fcp->fc_dl), task_attr, data_dir,
-			      TARGET_SCF_ACK_KREF))
+			      TARGET_SCF_ACK_KREF | TARGET_SCF_USE_CPUID))
 		goto err;
 
 	pr_debug("r_ctl %x target_submit_cmd %p\n", fh->fh_r_ctl, cmd);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 5/6] target/tcm_fc: Update debugging statements to match libfc usage
  2016-08-22  8:54 ` [PATCH 5/6] target/tcm_fc: Update debugging statements to match libfc usage Hannes Reinecke
@ 2016-08-23 18:07   ` Bart Van Assche
  0 siblings, 0 replies; 11+ messages in thread
From: Bart Van Assche @ 2016-08-23 18:07 UTC (permalink / raw)
  To: Hannes Reinecke, Nic Bellinger
  Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke

On 08/22/2016 01:54 AM, Hannes Reinecke wrote:
> +	char *reason = "no session created";

Please use const char* instead of char* in the above declaration.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] tcm_fc fixes
  2016-08-22  8:54 [PATCH 0/6] tcm_fc fixes Hannes Reinecke
                   ` (5 preceding siblings ...)
  2016-08-22  8:54 ` [PATCH 6/6] target/tcm_fc: use CPU affinity for responses Hannes Reinecke
@ 2016-10-21  7:46 ` Nicholas A. Bellinger
  6 siblings, 0 replies; 11+ messages in thread
From: Nicholas A. Bellinger @ 2016-10-21  7:46 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: target-devel, linux-scsi, Martin Wilck

Hi Hannes & Martin,

Apologies for the extended delay on this.  Comments below.

On Mon, 2016-08-22 at 10:54 +0200, Hannes Reinecke wrote:
> Hi Nic,
> 
> here's a set of patches and fixes I've found during debugging
> FCoE over virtio.
> The first two fix a race condition I've encountered during starting
> and stopping initiators; they are similar to your prior patches,
> but I still think they should be applied to avoid any race issues.
> The next three are just minor cleanups, but the last one is
> _absolutely_ crucial.
> With the current code tcm_fc will behave _very_ odd if you're
> running in a virtualized environment, as the xid allocator in libfc
> will be thouroughly thrashed by sending frames on _other_ cpus than
> those the originator frame got received on.
> This causes frames to be delayed (by up to several minutes)
> leading to I/O errors on the initiator side.
> 

Thanks for the heads up on #6.  Applied with v4.5+ tag, the earliest
stable where TARGET_SCF_USE_CPUID is available.

> As usual, comments and reviews are welcome.
> 
> Hannes Reinecke (6):
>   target: fix hang in target_wait_for_sess_cmds()
>   target: fix potential race window in target_sess_cmd_list_waiting()
>   target/tcm_fc: print command pointer in debug message
>   target/tcm_fc: return detailed error in ft_sess_create()
>   target/tcm_fc: Update debugging statements to match libfc usage
>   target/tcm_fc: use CPU affinity for responses
> 
>  drivers/target/target_core_transport.c | 28 ++++++++++++++---------
>  drivers/target/tcm_fc/tfc_cmd.c        |  4 ++--
>  drivers/target/tcm_fc/tfc_sess.c       | 42 +++++++++++++++++++++++-----------
>  3 files changed, 48 insertions(+), 26 deletions(-)
> 

Applied #3-6 to target-pending/master.  Comments inline for #1 and #2.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/6] target: fix hang in target_wait_for_sess_cmds()
  2016-08-22  8:54 ` [PATCH 1/6] target: fix hang in target_wait_for_sess_cmds() Hannes Reinecke
@ 2016-10-21  8:00   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 11+ messages in thread
From: Nicholas A. Bellinger @ 2016-10-21  8:00 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke

On Mon, 2016-08-22 at 10:54 +0200, Hannes Reinecke wrote:
> When shutting down a session I'm seeing this hung_task message:
> 
> INFO: task kworker/u128:0:6 blocked for more than 480 seconds.
>       Tainted: G            E   N  4.4.17-default+ #241
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/u128:0  D ffff880237615c00     0     6      2 0x00000000
>  Workqueue: fc_rport_eq fc_rport_work [libfc]
>  ffff88017edd7c48 ffffffff81c11500 ffff88017edd0180 ffff88017edd8000
>  ffff88035b6860a8 ffff88017edd0180 ffff88036461eac0 0000000000000000
>  ffff88017edd7c60 ffffffff815db4d5 7fffffffffffffff ffff88017edd7d10
> Call Trace:
>  [<ffffffff815db4d5>] schedule+0x35/0x80
>  [<ffffffff815dde17>] schedule_timeout+0x237/0x2d0
>  [<ffffffff815dc6e3>] wait_for_completion+0xa3/0x110
>  [<ffffffffa0869fb5>] target_wait_for_sess_cmds+0x45/0x190 [target_core_mod]
>  [<ffffffffa03953b4>] ft_close_sess+0x24/0x30 [tcm_fc]
>  [<ffffffffa03955ba>] ft_prlo+0x8a/0x95 [tcm_fc]
>  [<ffffffffa066fc28>] fc_rport_work+0x3b8/0x7c0 [libfc]
>  [<ffffffff810939ee>] process_one_work+0x14e/0x410
>  [<ffffffff81094246>] worker_thread+0x116/0x490
> 
> Looking at the code it looks as if there is some confusion about
> when to call 'wait_for_completion(&cmd->cmd_wait_comp).
> The problem is that there are _two_ sections where the code
> waits for 'cmd_wait_comp' completion, but the completion
> is only called with 'complete', not 'complete_all'.
> So we cannot have both waiting for it, doing so will lead
> to a stuck wait_for_completion.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/target/target_core_transport.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 6094a6b..2e1a6d8 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2522,7 +2522,7 @@ int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
>  	 * the remaining calls to target_put_sess_cmd(), and not the
>  	 * callers of this function.
>  	 */
> -	if (aborted) {
> +	if (aborted && !cmd->cmd_wait_set) {
>  		pr_debug("Detected CMD_T_ABORTED for ITT: %llu\n", cmd->tag);
>  		wait_for_completion(&cmd->cmd_wait_comp);
>  		cmd->se_tfo->release_cmd(cmd);

Do you recall if hung se_cmd shutdown was with both CMD_T_ABORTED and
!CMD_T_ABORTED descriptors, or former only..?

Using v4.4.y code for CMD_T_ABORTED, you likely hit the missing
SCF_ACK_KREF assignment regression leak:

http://www.spinics.net/lists/target-devel/msg13530.html

> @@ -2642,9 +2642,11 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
>  	list_for_each_entry(se_cmd, &se_sess->sess_wait_list, se_cmd_list) {
>  		rc = kref_get_unless_zero(&se_cmd->cmd_kref);
>  		if (rc) {
> -			se_cmd->cmd_wait_set = 1;
>  			spin_lock(&se_cmd->t_state_lock);
> -			se_cmd->transport_state |= CMD_T_FABRIC_STOP;
> +			if (!(se_cmd->transport_state & CMD_T_FABRIC_STOP)) {
> +				se_cmd->cmd_wait_set = 1;
> +				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
> +			}
>  			spin_unlock(&se_cmd->t_state_lock);
>  		}
>  	}

So tcm_fc uses transport_generic_free_cmd(se_cmd, wait_for_tasks=0) in
ft_check_stop_free() to drop the extra SCF_ACK_KREF reference, which
does the extra if (aborted) check + possible sleep mentioned above.

AFAICT, ft_check_stop_free() response callback should be using
target_put_sess_cmd() to drop se_cmd->cmd_kref, and not attempt to block
for aborted status at all..

> @@ -2664,24 +2666,26 @@ void target_wait_for_sess_cmds(struct se_session *se_sess)
>  
>  	list_for_each_entry_safe(se_cmd, tmp_cmd,
>  				&se_sess->sess_wait_list, se_cmd_list) {
> +		int cmd_wait;
>  		pr_debug("Waiting for se_cmd: %p t_state: %d, fabric state:"
>  			" %d\n", se_cmd, se_cmd->t_state,
>  			se_cmd->se_tfo->get_cmd_state(se_cmd));
>  
>  		spin_lock_irqsave(&se_cmd->t_state_lock, flags);
>  		tas = (se_cmd->transport_state & CMD_T_TAS);
> +		cmd_wait = se_cmd->cmd_wait_set;
>  		spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
> -
>  		if (!target_put_sess_cmd(se_cmd)) {
>  			if (tas)
>  				target_put_sess_cmd(se_cmd);
>  		}
> -
> +		if (!cmd_wait)
> +			continue;
>  		wait_for_completion(&se_cmd->cmd_wait_comp);
>  		pr_debug("After cmd_wait_comp: se_cmd: %p t_state: %d"
>  			" fabric state: %d\n", se_cmd, se_cmd->t_state,
>  			se_cmd->se_tfo->get_cmd_state(se_cmd));
> -
> +		se_cmd->cmd_wait_set = 0;
>  		se_cmd->se_tfo->release_cmd(se_cmd);
>  	}
>  

target_wait_for_sess_cmd() must always wait for se_cmd->cmd_wait_comp to
finish, because fabric drivers currently expect all outstanding
se_cmd->se_sess to complete before it's safe to release driver specific
session resources.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/6] target: fix potential race window in target_sess_cmd_list_waiting()
  2016-08-22  8:54 ` [PATCH 2/6] target: fix potential race window in target_sess_cmd_list_waiting() Hannes Reinecke
@ 2016-10-21  8:19   ` Nicholas A. Bellinger
  0 siblings, 0 replies; 11+ messages in thread
From: Nicholas A. Bellinger @ 2016-10-21  8:19 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: target-devel, linux-scsi, Martin Wilck, Hannes Reinecke

On Mon, 2016-08-22 at 10:54 +0200, Hannes Reinecke wrote:
> target_sess_cmd_list_waiting() might hit on a condition where
> the kref for the command is already 0, but the destructor has
> not been called yet (or is stuck in waiting for a spin lock).
> Rather than leaving the command on the list we should explicitly
> remove it to avoid race issues later on.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/target/target_core_transport.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
> index 2e1a6d8..ce136f0 100644
> --- a/drivers/target/target_core_transport.c
> +++ b/drivers/target/target_core_transport.c
> @@ -2547,8 +2547,8 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
>  	 * fabric acknowledgement that requires two target_put_sess_cmd()
>  	 * invocations before se_cmd descriptor release.
>  	 */
> -	if (ack_kref)
> -		kref_get(&se_cmd->cmd_kref);
> +	if (ack_kref && !kref_get_unless_zero(&se_cmd->cmd_kref))
> +		return -EINVAL;
>  

Makes sense.

Applying the following version to target-pending/master atop pending
SCF_ACK_KREF regression bug-fix.

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 9ebbf94..ad6fb3f 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -2535,7 +2535,9 @@ int target_get_sess_cmd(struct se_cmd *se_cmd, bool ack_kref)
         * invocations before se_cmd descriptor release.
         */
        if (ack_kref) {
-               kref_get(&se_cmd->cmd_kref);
+               if (!kref_get_unless_zero(&se_cmd->cmd_kref))
+                       return -EINVAL;
+
                se_cmd->se_cmd_flags |= SCF_ACK_KREF;
        }
 
>  	spin_lock_irqsave(&se_sess->sess_cmd_lock, flags);
>  	if (se_sess->sess_tearing_down) {
> @@ -2627,7 +2627,7 @@ EXPORT_SYMBOL(target_put_sess_cmd);
>   */
>  void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
>  {
> -	struct se_cmd *se_cmd;
> +	struct se_cmd *se_cmd, *tmp_cmd;
>  	unsigned long flags;
>  	int rc;
>  
> @@ -2639,7 +2639,8 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
>  	se_sess->sess_tearing_down = 1;
>  	list_splice_init(&se_sess->sess_cmd_list, &se_sess->sess_wait_list);
>  
> -	list_for_each_entry(se_cmd, &se_sess->sess_wait_list, se_cmd_list) {
> +	list_for_each_entry_safe(se_cmd, tmp_cmd,
> +				 &se_sess->sess_wait_list, se_cmd_list) {
>  		rc = kref_get_unless_zero(&se_cmd->cmd_kref);
>  		if (rc) {
>  			spin_lock(&se_cmd->t_state_lock);
> @@ -2648,7 +2649,8 @@ void target_sess_cmd_list_set_waiting(struct se_session *se_sess)
>  				se_cmd->transport_state |= CMD_T_FABRIC_STOP;
>  			}
>  			spin_unlock(&se_cmd->t_state_lock);
> -		}
> +		} else
> +			list_del_init(&se_cmd->se_cmd_list);
>  	}
>  
>  	spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-10-21  8:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-22  8:54 [PATCH 0/6] tcm_fc fixes Hannes Reinecke
2016-08-22  8:54 ` [PATCH 1/6] target: fix hang in target_wait_for_sess_cmds() Hannes Reinecke
2016-10-21  8:00   ` Nicholas A. Bellinger
2016-08-22  8:54 ` [PATCH 2/6] target: fix potential race window in target_sess_cmd_list_waiting() Hannes Reinecke
2016-10-21  8:19   ` Nicholas A. Bellinger
2016-08-22  8:54 ` [PATCH 3/6] target/tcm_fc: print command pointer in debug message Hannes Reinecke
2016-08-22  8:54 ` [PATCH 4/6] target/tcm_fc: return detailed error in ft_sess_create() Hannes Reinecke
2016-08-22  8:54 ` [PATCH 5/6] target/tcm_fc: Update debugging statements to match libfc usage Hannes Reinecke
2016-08-23 18:07   ` Bart Van Assche
2016-08-22  8:54 ` [PATCH 6/6] target/tcm_fc: use CPU affinity for responses Hannes Reinecke
2016-10-21  7:46 ` [PATCH 0/6] tcm_fc fixes Nicholas A. Bellinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.