All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/17] Make ABORT and LUN RESET handling synchronous
@ 2018-09-17 21:35 Bart Van Assche
  2018-09-17 21:35   ` Bart Van Assche
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Bart Van Assche @ 2018-09-17 21:35 UTC (permalink / raw)
  To: target-devel

Hello Martin,

As you maybe know the LIO task management handling code is complicated
because it is asynchronous compared to the command execution flow. This
patch series makes task management function handling synchronous and
thereby significantly simplifies the LIO code. Please consider these
patches for kernel v4.20.

Thanks,

Bart.

Bart Van Assche (17):
  target/core: Fix spelling in two source code comments
  target/core: Use the SECTOR_SHIFT constant
  target/core: Remove an unused data member from struct xcopy_pt_cmd
  target/core: Change the return type of transport_init_session from
    void to int
  target/core: Make sure that target_wait_for_sess_cmds() waits long
    enough
  target/core: Use sg_alloc_table() instead of open-coding it
  target/core: Use system workqueues for TMF
  target/core: Make compare_and_write_callback() accept NULL as third
    argument
  target/core: Always call transport_complete_callback() upon failure
  target/core: Make it possible to wait from more than one context for
    command completion
  target/core: Make ABORT and LUN RESET handling synchronous
  target/core: Reduce the amount of code executed with a spinlock held
  target/core: Avoid that LUN reset sporadically triggers data
    corruption
  target/qla2xxx: Simplify the code for handling aborted commands
  target/core: Remove the write_pending_status() callback function
  target/core: Remove several state tests from the TMF code
  target/core: Simplify the LUN RESET implementation

 Documentation/target/tcm_mod_builder.py      |   8 -
 drivers/infiniband/ulp/srpt/ib_srpt.c        |   9 -
 drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c     |   6 -
 drivers/scsi/qla2xxx/tcm_qla2xxx.c           |  31 --
 drivers/target/iscsi/iscsi_target_configfs.c |  13 -
 drivers/target/loopback/tcm_loop.c           |   6 -
 drivers/target/sbp/sbp_target.c              |   6 -
 drivers/target/target_core_configfs.c        |   4 -
 drivers/target/target_core_device.c          |  16 -
 drivers/target/target_core_iblock.c          |   4 +-
 drivers/target/target_core_iblock.h          |   1 -
 drivers/target/target_core_internal.h        |   2 -
 drivers/target/target_core_sbc.c             |  20 +-
 drivers/target/target_core_tmr.c             |  97 ++---
 drivers/target/target_core_transport.c       | 430 +++++++++----------
 drivers/target/target_core_xcopy.c           |  15 +-
 drivers/target/tcm_fc/tcm_fc.h               |   1 -
 drivers/target/tcm_fc/tfc_cmd.c              |   7 -
 drivers/target/tcm_fc/tfc_conf.c             |   1 -
 drivers/usb/gadget/function/f_tcm.c          |   9 -
 drivers/vhost/scsi.c                         |   6 -
 drivers/xen/xen-scsiback.c                   |   6 -
 include/target/target_core_base.h            |   6 +-
 include/target/target_core_fabric.h          |   4 +-
 24 files changed, 272 insertions(+), 436 deletions(-)

-- 
2.18.0

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 13/17] target/core: Avoid that LUN reset sporadically triggers data corruption
  2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche
@ 2018-09-17 21:35   ` Bart Van Assche
  2018-10-01  5:03 ` [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2018-09-17 21:35 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Christoph Hellwig, target-devel, Bart Van Assche,
	Nicholas Bellinger, Mike Christie, Hannes Reinecke, stable

If on an initiator system a LUN reset is issued while I/O is in
progress with queue depth > 1, avoid that data corruption occurs
as follows:
- The initiator submits a READ (a).
- The initiator submits a LUN reset before READ (a) completes.
- The target responds that the LUN reset succeeded after READ (a)
  has been marked as CMD_T_COMPLETE and before .queue_status() has
  been called.
- The initiator receives the LUN reset response and frees the
  tag used by READ (a).
- The initiator submits READ (b) and reuses the tag of READ (a).
- The initiator receives the response for READ (a) and interprets
  this as a completion for READ (b).
- The initiator receives the completion for READ (b) and discards
  it.

With the SRP initiator and target drivers and when running fio
concurrently with sg_reset -d it only takes a few minutes to
reproduce this.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Fixes: commit febe562c20df ("target: Fix LUN_RESET active I/O handling for ACK_KREF")
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/target/target_core_tmr.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 2750a2c7b563..6e419396c1e4 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -90,7 +90,7 @@ static int target_check_cdb_and_preempt(struct list_head *list,
 	return 1;
 }
 
-static bool __target_check_io_state(struct se_cmd *se_cmd,
+static bool __target_check_io_state(struct se_cmd *se_cmd, u32 skip_flags,
 				    struct se_session *tmr_sess, int tas)
 {
 	struct se_session *sess = se_cmd->se_sess;
@@ -108,7 +108,7 @@ static bool __target_check_io_state(struct se_cmd *se_cmd,
 	 * long as se_cmd->cmd_kref is still active unless zero.
 	 */
 	spin_lock(&se_cmd->t_state_lock);
-	if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) {
+	if (se_cmd->transport_state & (skip_flags | CMD_T_FABRIC_STOP)) {
 		pr_debug("Attempted to abort io tag: %llu already complete or"
 			" fabric stop, skipping\n", se_cmd->tag);
 		spin_unlock(&se_cmd->t_state_lock);
@@ -165,7 +165,8 @@ void core_tmr_abort_task(
 		printk("ABORT_TASK: Found referenced %s task_tag: %llu\n",
 			se_cmd->se_tfo->get_fabric_name(), ref_tag);
 
-		if (!__target_check_io_state(se_cmd, se_sess, 0))
+		if (!__target_check_io_state(se_cmd, CMD_T_COMPLETE, se_sess,
+					     0))
 			continue;
 
 		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
@@ -349,7 +350,7 @@ static void core_tmr_drain_state_list(
 			continue;
 
 		spin_lock(&sess->sess_cmd_lock);
-		rc = __target_check_io_state(cmd, tmr_sess, tas);
+		rc = __target_check_io_state(cmd, 0, tmr_sess, tas);
 		spin_unlock(&sess->sess_cmd_lock);
 		if (!rc)
 			continue;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 13/17] target/core: Avoid that LUN reset sporadically triggers data corruption
@ 2018-09-17 21:35   ` Bart Van Assche
  0 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2018-09-17 21:35 UTC (permalink / raw)
  To: Martin K . Petersen
  Cc: Christoph Hellwig, target-devel, Bart Van Assche,
	Nicholas Bellinger, Mike Christie, Hannes Reinecke, stable

If on an initiator system a LUN reset is issued while I/O is in
progress with queue depth > 1, avoid that data corruption occurs
as follows:
- The initiator submits a READ (a).
- The initiator submits a LUN reset before READ (a) completes.
- The target responds that the LUN reset succeeded after READ (a)
  has been marked as CMD_T_COMPLETE and before .queue_status() has
  been called.
- The initiator receives the LUN reset response and frees the
  tag used by READ (a).
- The initiator submits READ (b) and reuses the tag of READ (a).
- The initiator receives the response for READ (a) and interprets
  this as a completion for READ (b).
- The initiator receives the completion for READ (b) and discards
  it.

With the SRP initiator and target drivers and when running fio
concurrently with sg_reset -d it only takes a few minutes to
reproduce this.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Fixes: commit febe562c20df ("target: Fix LUN_RESET active I/O handling for ACK_KREF")
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: <stable@vger.kernel.org>
---
 drivers/target/target_core_tmr.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c
index 2750a2c7b563..6e419396c1e4 100644
--- a/drivers/target/target_core_tmr.c
+++ b/drivers/target/target_core_tmr.c
@@ -90,7 +90,7 @@ static int target_check_cdb_and_preempt(struct list_head *list,
 	return 1;
 }
 
-static bool __target_check_io_state(struct se_cmd *se_cmd,
+static bool __target_check_io_state(struct se_cmd *se_cmd, u32 skip_flags,
 				    struct se_session *tmr_sess, int tas)
 {
 	struct se_session *sess = se_cmd->se_sess;
@@ -108,7 +108,7 @@ static bool __target_check_io_state(struct se_cmd *se_cmd,
 	 * long as se_cmd->cmd_kref is still active unless zero.
 	 */
 	spin_lock(&se_cmd->t_state_lock);
-	if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) {
+	if (se_cmd->transport_state & (skip_flags | CMD_T_FABRIC_STOP)) {
 		pr_debug("Attempted to abort io tag: %llu already complete or"
 			" fabric stop, skipping\n", se_cmd->tag);
 		spin_unlock(&se_cmd->t_state_lock);
@@ -165,7 +165,8 @@ void core_tmr_abort_task(
 		printk("ABORT_TASK: Found referenced %s task_tag: %llu\n",
 			se_cmd->se_tfo->get_fabric_name(), ref_tag);
 
-		if (!__target_check_io_state(se_cmd, se_sess, 0))
+		if (!__target_check_io_state(se_cmd, CMD_T_COMPLETE, se_sess,
+					     0))
 			continue;
 
 		spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags);
@@ -349,7 +350,7 @@ static void core_tmr_drain_state_list(
 			continue;
 
 		spin_lock(&sess->sess_cmd_lock);
-		rc = __target_check_io_state(cmd, tmr_sess, tas);
+		rc = __target_check_io_state(cmd, 0, tmr_sess, tas);
 		spin_unlock(&sess->sess_cmd_lock);
 		if (!rc)
 			continue;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 00/17] Make ABORT and LUN RESET handling synchronous
  2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche
  2018-09-17 21:35   ` Bart Van Assche
@ 2018-10-01  5:03 ` Bart Van Assche
  2018-10-01 16:08 ` Mike Christie
  2018-10-06 21:48 ` Nicholas A. Bellinger
  3 siblings, 0 replies; 6+ messages in thread
From: Bart Van Assche @ 2018-10-01  5:03 UTC (permalink / raw)
  To: target-devel

On 9/17/18 2:35 PM, Bart Van Assche wrote:
> As you maybe know the LIO task management handling code is complicated
> because it is asynchronous compared to the command execution flow. This
> patch series makes task management function handling synchronous and
> thereby significantly simplifies the LIO code. Please consider these
> patches for kernel v4.20.

Does anyone want to comment on this patch series?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 00/17] Make ABORT and LUN RESET handling synchronous
  2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche
  2018-09-17 21:35   ` Bart Van Assche
  2018-10-01  5:03 ` [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche
@ 2018-10-01 16:08 ` Mike Christie
  2018-10-06 21:48 ` Nicholas A. Bellinger
  3 siblings, 0 replies; 6+ messages in thread
From: Mike Christie @ 2018-10-01 16:08 UTC (permalink / raw)
  To: target-devel

On 10/01/2018 12:03 AM, Bart Van Assche wrote:
> On 9/17/18 2:35 PM, Bart Van Assche wrote:
>> As you maybe know the LIO task management handling code is complicated
>> because it is asynchronous compared to the command execution flow. This
>> patch series makes task management function handling synchronous and
>> thereby significantly simplifies the LIO code. Please consider these
>> patches for kernel v4.20.
> 
> Does anyone want to comment on this patch series?
> 

Sorry for the delay Bart. I am at patch 9. I will try to finish the set
by the beginning of next week.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 00/17] Make ABORT and LUN RESET handling synchronous
  2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche
                   ` (2 preceding siblings ...)
  2018-10-01 16:08 ` Mike Christie
@ 2018-10-06 21:48 ` Nicholas A. Bellinger
  3 siblings, 0 replies; 6+ messages in thread
From: Nicholas A. Bellinger @ 2018-10-06 21:48 UTC (permalink / raw)
  To: target-devel

Hi Bart & Co,

Well, it's certainly been a while.  This is the type of patch series
that will draw me out of self-imposed retirement, every single time..

Of course, I'm referring to:

https://marc.info/?l=target-devel&m\x153722310800840&w=2

So the time has come once again to go over this series with a fine
toothed comb.

As per usual, the main areas of focus for review are:

1) Is there a concrete use-case..?

Improvements that significantly alter how multiple fabric drivers and/or
multiple backend drivers interact with target-core require a concrete
use-case.  

Many improvements are obvious because they effect customers and users in
the field, but making significant changes that effect the entire LIO
ecosystem do require a real-world use-case.  Making large functional
changes under of guise of 'style' or 'making it easier to read' is not a
good enough reason on it's own.

2) Does it negativity effect scale, performance, or latency..?

As-is, there are a number vendors (including my own startup) that ship
platforms utilizing upstream LIO with hard requirements for ~1K unique
backends (/sys/kernel/config/target/core/$HBA/$DEV/) plus ~1k unique
target endpoints (/sys/kernel/config/target/$FABRIC/$WWN/) plus ~1k
unique tenants on a single physical machine.

Because of this long standing requirement for scale + performance going
back to the earliest days of LIO + configfs, changes that introduce
global locking, global variables, global work-queues, and/or unnecessary
blocking + synchronization are a complete non-starter.

3) Have comments from previous patches been addressed..?

Looking at the history, I'm commented on this series in detail at least
*seven* times over the last three years.

Every time this series ending up not getting merged because my questions
and concerns where either dismissed, or outright ignored.  So at some
point, I got tired of putting in effort that was not reciprocated at the
other end.

That all said, you still get the benefit of doubt on this series.
However, for those specific patches that have already been reviewed and
commented on in detail numerous times, you will need to answer to those
questions.  I'll add them as references inline as necessary.

Thank you.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-10-06 21:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche
2018-09-17 21:35 ` [PATCH 13/17] target/core: Avoid that LUN reset sporadically triggers data corruption Bart Van Assche
2018-09-17 21:35   ` Bart Van Assche
2018-10-01  5:03 ` [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche
2018-10-01 16:08 ` Mike Christie
2018-10-06 21:48 ` Nicholas A. Bellinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.