iSCSI Abort Task and WRITE PENDING

From: Konstantin Shelekhin <k.shelekhin@yadro.com>
To: <target-devel@vger.kernel.org>
Cc: <linux@yadro.com>
Subject: iSCSI Abort Task and WRITE PENDING
Date: Wed, 13 Oct 2021 16:21:50 +0300	[thread overview]
Message-ID: <YWbdbh1w1Eiw82Zr@yadro.com> (raw)

Hi,

I really need the collective wisdom.

Not long ago we've uncovered the problem with iSCSI and ABORT TASK
handling. Currently it's not possible to abort a WRITE_10 command in
TRANSPORT_WRITE_PENDING state, because ABORT TASK  will hang itself in
the process:

  # dmesg | tail -2
  [   83.563505] ABORT_TASK: Found referenced iSCSI task_tag: 3372979269
  [   84.593545] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus <nexus>

  # ps aux | awk '$8 ~/D/'
  root        32  0.0  0.0      0     0 ?        D    15:19   0:00 [kworker/0:1+events]
  root      1187  0.0  0.0      0     0 ?        D    15:20   0:00 [iscsi_ttx]

  # cat /proc/32/stack
  [<0>] target_put_cmd_and_wait+0x68/0xa0
  [<0>] core_tmr_abort_task.cold+0x16b/0x192
  [<0>] target_tmr_work+0x9e/0xe0
  [<0>] process_one_work+0x1d4/0x370
  [<0>] worker_thread+0x48/0x3d0
  [<0>] kthread+0x122/0x140
  [<0>] ret_from_fork+0x22/0x30

  # cat /proc/1187/stack
  [<0>] __transport_wait_for_tasks+0xaf/0x100
  [<0>] transport_generic_free_cmd+0xe9/0x180
  [<0>] iscsit_free_cmd+0x50/0xb0
  [<0>] iscsit_close_connection+0x47d/0x8c0
  [<0>] iscsit_take_action_for_connection_exit+0x6f/0xf0
  [<0>] iscsi_target_tx_thread+0x184/0x200
  [<0>] kthread+0x122/0x140
  [<0>] ret_from_fork+0x22/0x30

What happens:

  1. Initiator sends WRITE_10 CDB
  2. Target parses the CDB and sends R2T
  3. Target starts the Data-Out timer
  4. Initiator sends ABORT TASK; no new data from Initiator after this
  5. Target starts aborting WRITE_10, gets into core_tmr_abort_task()
     and starts waiting for the request completion
  6. Nothing happens
  7. The Data-Out timers expires, connection teardown starts and gets
     stuck waiting for ABORT TASK that waits for WRITE_10

The ABORT TASK processing looks roughly like this:

  iscsi_rx_opcode
    iscsi_handle_task_mgt_cmd
      iscsi_tmr_abort_task
      transport_generic_handle_tmr
        if (tmr_cmd->transport_state & CMD_T_ABORTED)
          target_handle_abort
        else
          target_tmr_work
            if (tmr_cmd->transport_state & CMD_T_ABORTED)
              target_handle_abort
            else
              core_tmr_abort_task
                ret = __target_check_io_state
                  if (write_cmd->transport_state & CMD_T_STOP)
                    return -1
                  write_cmd->transport_state |= CMD_T_ABORTED
                  return 0
                if (!ret)
                  list_move_tail(&write_cmd->state_list, &aborted)
                  target_put_cmd_and_wait(&write_cmd)

As I see it, the main problem is that the abort path can't initiate the
command termination, it simply waits for the request to handle this on
the execution path like in target_execute_cmd():

  target_execute_cmd
    target_cmd_interrupted
      INIT_WORK(&cmd->work, target_abort_work)

However, in this case the request is not going to be executed because
Initiator will not send the Data-Out buffer.

I have a couple of ideas on how to fix this, but they all look kinda
ugly. The one that currently works around this for me:

  core_tmr_abort_task():

    [...]

    spin_lock_irqsave(&se_cmd->t_state_lock, flags);
    write_pending = se_cmd->t_state == TRANSPORT_WRITE_PENDING;
    spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);

    if (write_pending && se_cmd->se_tfo->abort_write_pending)
            se_cmd->se_tfo->abort_write_pending(se_cmd);

    target_put_cmd_and_wait(se_cmd);

    [...]

The new method abort_write_pending() is defined only for iSCSI and calls
target_handle_abort(). However, this opens up another can of worms
because this code heavily races with R2T sending and requires a couple
of checks to "work most of the time". Not ideal, by far.

I can make this one better by introducing R2T list draining that ensures
the proper order during cleanup, but maybe there is a much easier way
that I'm not seeing here.

The tempting idea to wait for the Data-Out timer to time out and then
abort the request is not going to work either, because if the WRITE_10
processing lags and WRITE_10 is marked as aborted before R2Ts are sent,
ABORT TASK will still be stuck, but the Data-Out timer won't even be
started. Also the timeout upper limit is 60 seconds, so...

I really need some advice on how to deal with this properly.

P.S. This case is easily reproduced by libiscsi with some minor
modifications:

  $ git clone https://github.com/sahlberg/libiscsi.git
  $ cd libiscsi
  $ cat <<-EOF | patch -p 1
	diff --git a/test-tool/test_async_abort_simple.c b/test-tool/test_async_abort_simple.c
	index 82abb9f..70fd5ea 100644
	--- a/test-tool/test_async_abort_simple.c
	+++ b/test-tool/test_async_abort_simple.c
	@@ -92,7 +92,7 @@ test_async_abort_simple(void)
	 {
	 	int ret;
	 	struct tests_async_abort_state state = { NULL, 0, 0, 0, 0 };
	-	int blocks_per_io = 8;
	+	int blocks_per_io = 32678;
	 	unsigned char *buf;
	 	uint64_t timeout_sec;

	@@ -146,6 +146,8 @@ test_async_abort_simple(void)
	 	}
	 	logging(LOG_VERBOSE, "dispatched");

	+	sleep(2);
	+
	 	/*
	 	 * queue abort - shouldn't cancel the dispatched task. TMF req should
	 	 * be sent to the target.
  EOF
  $ ./autogen.sh
  $ ./configure --enable-test-tool
  $ ./test-tool/iscsi-test-cu -V --dataloss --test iSCSI.iSCSITMF.AbortTaskSimpleAsync \
  	-i <initiator-iqn> iscsi://<portal>/<target-iqn>/<lun>