* [PATCH 00/17] Make ABORT and LUN RESET handling synchronous @ 2018-09-17 21:35 Bart Van Assche 2018-09-17 21:35 ` Bart Van Assche ` (3 more replies) 0 siblings, 4 replies; 6+ messages in thread From: Bart Van Assche @ 2018-09-17 21:35 UTC (permalink / raw) To: target-devel Hello Martin, As you maybe know the LIO task management handling code is complicated because it is asynchronous compared to the command execution flow. This patch series makes task management function handling synchronous and thereby significantly simplifies the LIO code. Please consider these patches for kernel v4.20. Thanks, Bart. Bart Van Assche (17): target/core: Fix spelling in two source code comments target/core: Use the SECTOR_SHIFT constant target/core: Remove an unused data member from struct xcopy_pt_cmd target/core: Change the return type of transport_init_session from void to int target/core: Make sure that target_wait_for_sess_cmds() waits long enough target/core: Use sg_alloc_table() instead of open-coding it target/core: Use system workqueues for TMF target/core: Make compare_and_write_callback() accept NULL as third argument target/core: Always call transport_complete_callback() upon failure target/core: Make it possible to wait from more than one context for command completion target/core: Make ABORT and LUN RESET handling synchronous target/core: Reduce the amount of code executed with a spinlock held target/core: Avoid that LUN reset sporadically triggers data corruption target/qla2xxx: Simplify the code for handling aborted commands target/core: Remove the write_pending_status() callback function target/core: Remove several state tests from the TMF code target/core: Simplify the LUN RESET implementation Documentation/target/tcm_mod_builder.py | 8 - drivers/infiniband/ulp/srpt/ib_srpt.c | 9 - drivers/scsi/ibmvscsi_tgt/ibmvscsi_tgt.c | 6 - drivers/scsi/qla2xxx/tcm_qla2xxx.c | 31 -- drivers/target/iscsi/iscsi_target_configfs.c | 13 - drivers/target/loopback/tcm_loop.c | 6 - drivers/target/sbp/sbp_target.c | 6 - drivers/target/target_core_configfs.c | 4 - drivers/target/target_core_device.c | 16 - drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_iblock.h | 1 - drivers/target/target_core_internal.h | 2 - drivers/target/target_core_sbc.c | 20 +- drivers/target/target_core_tmr.c | 97 ++--- drivers/target/target_core_transport.c | 430 +++++++++---------- drivers/target/target_core_xcopy.c | 15 +- drivers/target/tcm_fc/tcm_fc.h | 1 - drivers/target/tcm_fc/tfc_cmd.c | 7 - drivers/target/tcm_fc/tfc_conf.c | 1 - drivers/usb/gadget/function/f_tcm.c | 9 - drivers/vhost/scsi.c | 6 - drivers/xen/xen-scsiback.c | 6 - include/target/target_core_base.h | 6 +- include/target/target_core_fabric.h | 4 +- 24 files changed, 272 insertions(+), 436 deletions(-) -- 2.18.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 13/17] target/core: Avoid that LUN reset sporadically triggers data corruption 2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche @ 2018-09-17 21:35 ` Bart Van Assche 2018-10-01 5:03 ` [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche ` (2 subsequent siblings) 3 siblings, 0 replies; 6+ messages in thread From: Bart Van Assche @ 2018-09-17 21:35 UTC (permalink / raw) To: Martin K . Petersen Cc: Christoph Hellwig, target-devel, Bart Van Assche, Nicholas Bellinger, Mike Christie, Hannes Reinecke, stable If on an initiator system a LUN reset is issued while I/O is in progress with queue depth > 1, avoid that data corruption occurs as follows: - The initiator submits a READ (a). - The initiator submits a LUN reset before READ (a) completes. - The target responds that the LUN reset succeeded after READ (a) has been marked as CMD_T_COMPLETE and before .queue_status() has been called. - The initiator receives the LUN reset response and frees the tag used by READ (a). - The initiator submits READ (b) and reuses the tag of READ (a). - The initiator receives the response for READ (a) and interprets this as a completion for READ (b). - The initiator receives the completion for READ (b) and discards it. With the SRP initiator and target drivers and when running fio concurrently with sg_reset -d it only takes a few minutes to reproduce this. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Fixes: commit febe562c20df ("target: Fix LUN_RESET active I/O handling for ACK_KREF") Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: <stable@vger.kernel.org> --- drivers/target/target_core_tmr.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c index 2750a2c7b563..6e419396c1e4 100644 --- a/drivers/target/target_core_tmr.c +++ b/drivers/target/target_core_tmr.c @@ -90,7 +90,7 @@ static int target_check_cdb_and_preempt(struct list_head *list, return 1; } -static bool __target_check_io_state(struct se_cmd *se_cmd, +static bool __target_check_io_state(struct se_cmd *se_cmd, u32 skip_flags, struct se_session *tmr_sess, int tas) { struct se_session *sess = se_cmd->se_sess; @@ -108,7 +108,7 @@ static bool __target_check_io_state(struct se_cmd *se_cmd, * long as se_cmd->cmd_kref is still active unless zero. */ spin_lock(&se_cmd->t_state_lock); - if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) { + if (se_cmd->transport_state & (skip_flags | CMD_T_FABRIC_STOP)) { pr_debug("Attempted to abort io tag: %llu already complete or" " fabric stop, skipping\n", se_cmd->tag); spin_unlock(&se_cmd->t_state_lock); @@ -165,7 +165,8 @@ void core_tmr_abort_task( printk("ABORT_TASK: Found referenced %s task_tag: %llu\n", se_cmd->se_tfo->get_fabric_name(), ref_tag); - if (!__target_check_io_state(se_cmd, se_sess, 0)) + if (!__target_check_io_state(se_cmd, CMD_T_COMPLETE, se_sess, + 0)) continue; spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); @@ -349,7 +350,7 @@ static void core_tmr_drain_state_list( continue; spin_lock(&sess->sess_cmd_lock); - rc = __target_check_io_state(cmd, tmr_sess, tas); + rc = __target_check_io_state(cmd, 0, tmr_sess, tas); spin_unlock(&sess->sess_cmd_lock); if (!rc) continue; -- 2.18.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 13/17] target/core: Avoid that LUN reset sporadically triggers data corruption @ 2018-09-17 21:35 ` Bart Van Assche 0 siblings, 0 replies; 6+ messages in thread From: Bart Van Assche @ 2018-09-17 21:35 UTC (permalink / raw) To: Martin K . Petersen Cc: Christoph Hellwig, target-devel, Bart Van Assche, Nicholas Bellinger, Mike Christie, Hannes Reinecke, stable If on an initiator system a LUN reset is issued while I/O is in progress with queue depth > 1, avoid that data corruption occurs as follows: - The initiator submits a READ (a). - The initiator submits a LUN reset before READ (a) completes. - The target responds that the LUN reset succeeded after READ (a) has been marked as CMD_T_COMPLETE and before .queue_status() has been called. - The initiator receives the LUN reset response and frees the tag used by READ (a). - The initiator submits READ (b) and reuses the tag of READ (a). - The initiator receives the response for READ (a) and interprets this as a completion for READ (b). - The initiator receives the completion for READ (b) and discards it. With the SRP initiator and target drivers and when running fio concurrently with sg_reset -d it only takes a few minutes to reproduce this. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Fixes: commit febe562c20df ("target: Fix LUN_RESET active I/O handling for ACK_KREF") Cc: Nicholas Bellinger <nab@linux-iscsi.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: <stable@vger.kernel.org> --- drivers/target/target_core_tmr.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/target/target_core_tmr.c b/drivers/target/target_core_tmr.c index 2750a2c7b563..6e419396c1e4 100644 --- a/drivers/target/target_core_tmr.c +++ b/drivers/target/target_core_tmr.c @@ -90,7 +90,7 @@ static int target_check_cdb_and_preempt(struct list_head *list, return 1; } -static bool __target_check_io_state(struct se_cmd *se_cmd, +static bool __target_check_io_state(struct se_cmd *se_cmd, u32 skip_flags, struct se_session *tmr_sess, int tas) { struct se_session *sess = se_cmd->se_sess; @@ -108,7 +108,7 @@ static bool __target_check_io_state(struct se_cmd *se_cmd, * long as se_cmd->cmd_kref is still active unless zero. */ spin_lock(&se_cmd->t_state_lock); - if (se_cmd->transport_state & (CMD_T_COMPLETE | CMD_T_FABRIC_STOP)) { + if (se_cmd->transport_state & (skip_flags | CMD_T_FABRIC_STOP)) { pr_debug("Attempted to abort io tag: %llu already complete or" " fabric stop, skipping\n", se_cmd->tag); spin_unlock(&se_cmd->t_state_lock); @@ -165,7 +165,8 @@ void core_tmr_abort_task( printk("ABORT_TASK: Found referenced %s task_tag: %llu\n", se_cmd->se_tfo->get_fabric_name(), ref_tag); - if (!__target_check_io_state(se_cmd, se_sess, 0)) + if (!__target_check_io_state(se_cmd, CMD_T_COMPLETE, se_sess, + 0)) continue; spin_unlock_irqrestore(&se_sess->sess_cmd_lock, flags); @@ -349,7 +350,7 @@ static void core_tmr_drain_state_list( continue; spin_lock(&sess->sess_cmd_lock); - rc = __target_check_io_state(cmd, tmr_sess, tas); + rc = __target_check_io_state(cmd, 0, tmr_sess, tas); spin_unlock(&sess->sess_cmd_lock); if (!rc) continue; -- 2.18.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 00/17] Make ABORT and LUN RESET handling synchronous 2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche 2018-09-17 21:35 ` Bart Van Assche @ 2018-10-01 5:03 ` Bart Van Assche 2018-10-01 16:08 ` Mike Christie 2018-10-06 21:48 ` Nicholas A. Bellinger 3 siblings, 0 replies; 6+ messages in thread From: Bart Van Assche @ 2018-10-01 5:03 UTC (permalink / raw) To: target-devel On 9/17/18 2:35 PM, Bart Van Assche wrote: > As you maybe know the LIO task management handling code is complicated > because it is asynchronous compared to the command execution flow. This > patch series makes task management function handling synchronous and > thereby significantly simplifies the LIO code. Please consider these > patches for kernel v4.20. Does anyone want to comment on this patch series? Thanks, Bart. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 00/17] Make ABORT and LUN RESET handling synchronous 2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche 2018-09-17 21:35 ` Bart Van Assche 2018-10-01 5:03 ` [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche @ 2018-10-01 16:08 ` Mike Christie 2018-10-06 21:48 ` Nicholas A. Bellinger 3 siblings, 0 replies; 6+ messages in thread From: Mike Christie @ 2018-10-01 16:08 UTC (permalink / raw) To: target-devel On 10/01/2018 12:03 AM, Bart Van Assche wrote: > On 9/17/18 2:35 PM, Bart Van Assche wrote: >> As you maybe know the LIO task management handling code is complicated >> because it is asynchronous compared to the command execution flow. This >> patch series makes task management function handling synchronous and >> thereby significantly simplifies the LIO code. Please consider these >> patches for kernel v4.20. > > Does anyone want to comment on this patch series? > Sorry for the delay Bart. I am at patch 9. I will try to finish the set by the beginning of next week. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 00/17] Make ABORT and LUN RESET handling synchronous 2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche ` (2 preceding siblings ...) 2018-10-01 16:08 ` Mike Christie @ 2018-10-06 21:48 ` Nicholas A. Bellinger 3 siblings, 0 replies; 6+ messages in thread From: Nicholas A. Bellinger @ 2018-10-06 21:48 UTC (permalink / raw) To: target-devel Hi Bart & Co, Well, it's certainly been a while. This is the type of patch series that will draw me out of self-imposed retirement, every single time.. Of course, I'm referring to: https://marc.info/?l=target-devel&m\x153722310800840&w=2 So the time has come once again to go over this series with a fine toothed comb. As per usual, the main areas of focus for review are: 1) Is there a concrete use-case..? Improvements that significantly alter how multiple fabric drivers and/or multiple backend drivers interact with target-core require a concrete use-case. Many improvements are obvious because they effect customers and users in the field, but making significant changes that effect the entire LIO ecosystem do require a real-world use-case. Making large functional changes under of guise of 'style' or 'making it easier to read' is not a good enough reason on it's own. 2) Does it negativity effect scale, performance, or latency..? As-is, there are a number vendors (including my own startup) that ship platforms utilizing upstream LIO with hard requirements for ~1K unique backends (/sys/kernel/config/target/core/$HBA/$DEV/) plus ~1k unique target endpoints (/sys/kernel/config/target/$FABRIC/$WWN/) plus ~1k unique tenants on a single physical machine. Because of this long standing requirement for scale + performance going back to the earliest days of LIO + configfs, changes that introduce global locking, global variables, global work-queues, and/or unnecessary blocking + synchronization are a complete non-starter. 3) Have comments from previous patches been addressed..? Looking at the history, I'm commented on this series in detail at least *seven* times over the last three years. Every time this series ending up not getting merged because my questions and concerns where either dismissed, or outright ignored. So at some point, I got tired of putting in effort that was not reciprocated at the other end. That all said, you still get the benefit of doubt on this series. However, for those specific patches that have already been reviewed and commented on in detail numerous times, you will need to answer to those questions. I'll add them as references inline as necessary. Thank you. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-10-06 21:48 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-09-17 21:35 [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche 2018-09-17 21:35 ` [PATCH 13/17] target/core: Avoid that LUN reset sporadically triggers data corruption Bart Van Assche 2018-09-17 21:35 ` Bart Van Assche 2018-10-01 5:03 ` [PATCH 00/17] Make ABORT and LUN RESET handling synchronous Bart Van Assche 2018-10-01 16:08 ` Mike Christie 2018-10-06 21:48 ` Nicholas A. Bellinger
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.