All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
       [not found]     ` <1384217317.12281.26.camel@haakon3.risingtidesystems.com>
@ 2013-11-12  1:31       ` Nicholas A. Bellinger
  2013-11-12  2:16         ` Nicholas A. Bellinger
  2013-11-12  2:32         ` Michael Christie
  0 siblings, 2 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-12  1:31 UTC (permalink / raw)
  To: Moussa Ba (moussaba)
  Cc: target-devel, Nicholas Bellinger, Or Gerlitz, Mike Christie, linux-scsi

On Mon, 2013-11-11 at 16:48 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2013-11-11 at 16:42 -0800, Nicholas A. Bellinger wrote:
> > On Mon, 2013-11-11 at 13:17 -0800, Nicholas A. Bellinger wrote:
> > > Hi Moussa & Co,
> > > 
> > > On Mon, 2013-11-11 at 19:05 +0000, Moussa Ba (moussaba) wrote:
> > > > My system setup is as follows:
> > > > 
> > > > Target: CentOS 6.4 - LIO Target running 3.12 kernel, 8 PCIe SSD in one volume group, 8 logical volumes, 8 targets, 1 LUN/target.
> > > > Initiator: CentOS 6.4 running 3.11 Kernel (Also ran 2.6.32-358), ISER based initiator over Mellanox 40Gbit ConnectX HCA
> > > > 
> > > > When running performance tests on the initiator, I am running into
> > > fio timeouts that lead to ABORT_TASK commands on the target.  In other
> > > words, fio fails to "reap" all the io threads almost as if it is
> > > waiting for lost IOs to complete. This is happening on random write IO
> > > operations.  Some context, we are generating about 576KIOPS 4KB block
> > > sizes using 8 LUNS.  The PCIe SSD have a write buffer that can absorb
> > > writes with an end to end latency on the initiator of 44us.  We are
> > > not currently seeing any errors on read IOs, which tend to have 2X+
> > > the latency of writes.  
> > > > 
> > > > See below for the dmesg on the target side.
> > > > Timeout Condition occurs at 154 which is the Protocol Error after fio is interrupted or runs to completion.  
> > > > [  154.453663] Received CmdSN: 0x000fcbb7 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
> > > > [  154.453673] Received CmdSN: 0x000fcbb8 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
> > > > 
> > > 
> > > (CC'ing Mike)
> > > 
> > > As mentioned off-list, this would tend to indicate some manner of
> > > open-iscsi bug, as it's never legal for an initiator to send a CmdSN
> > > greater than the MaxCmdSN that's currently being sent in target outgoing
> > > response PDUs.
> > > 
> > > Mike, any idea as to how this might be happening on the initiator
> > > side..?
> > > 
> > 

<SNIP>

Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
(CC'ing linux-scsi)

Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain
conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment
session->cmdsn, without ever checking to see if the CmdSN window may
have already closed..

The only CmdSN window check that I see in the I/O dispatch codepath is
in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once
iscsi_conn_queue_work() is invoked and process context in
iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is
AFAICT no further CmdSN window open checks to ensure the initiator does
not overflow MaxCmdSN..

This would very much line up with the type of bug that is being
reported.  Before trying to determine what a possible fix might look
like, can you try the following libiscsi patch to see if your able to
hit either of the BUG's below..?

Thanks,

--nab

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index e399561..9106f63 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1483,6 +1483,12 @@ check_mgmt:
                        fail_scsi_task(conn->task, DID_IMM_RETRY);
                        continue;
                }
+
+               if (iscsi_check_cmdsn_window_closed(conn)) {
+                       printk("CmdSN window already closed in iscsi_data_xmit #1\n");
+                       BUG();
+               }
+
                rc = iscsi_prep_scsi_cmd_pdu(conn->task);
                if (rc) {
                        if (rc == -ENOMEM || rc == -EACCES) {
@@ -1518,6 +1524,11 @@ check_mgmt:
                if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
                        break;
 
+               if (iscsi_check_cmdsn_window_closed(conn)) {
+                       printk("CmdSN window already closed in iscsi_data_xmit #2\n");
+                       BUG();
+               }
+
                conn->task = task;
                list_del_init(&conn->task->running);
                conn->task->state = ISCSI_TASK_RUNNING;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12  1:31       ` CmdSN greather than MaxCmdSN protocol error in LIO Iser Nicholas A. Bellinger
@ 2013-11-12  2:16         ` Nicholas A. Bellinger
  2013-11-12  2:32         ` Michael Christie
  1 sibling, 0 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-12  2:16 UTC (permalink / raw)
  To: Moussa Ba (moussaba)
  Cc: target-devel, Nicholas Bellinger, Or Gerlitz, Mike Christie, linux-scsi

On Mon, 2013-11-11 at 17:31 -0800, Nicholas A. Bellinger wrote:
> On Mon, 2013-11-11 at 16:48 -0800, Nicholas A. Bellinger wrote:
> > On Mon, 2013-11-11 at 16:42 -0800, Nicholas A. Bellinger wrote:
> > > On Mon, 2013-11-11 at 13:17 -0800, Nicholas A. Bellinger wrote:
> > > > Hi Moussa & Co,
> > > > 
> > > > On Mon, 2013-11-11 at 19:05 +0000, Moussa Ba (moussaba) wrote:
> > > > > My system setup is as follows:
> > > > > 
> > > > > Target: CentOS 6.4 - LIO Target running 3.12 kernel, 8 PCIe SSD in one volume group, 8 logical volumes, 8 targets, 1 LUN/target.
> > > > > Initiator: CentOS 6.4 running 3.11 Kernel (Also ran 2.6.32-358), ISER based initiator over Mellanox 40Gbit ConnectX HCA
> > > > > 
> > > > > When running performance tests on the initiator, I am running into
> > > > fio timeouts that lead to ABORT_TASK commands on the target.  In other
> > > > words, fio fails to "reap" all the io threads almost as if it is
> > > > waiting for lost IOs to complete. This is happening on random write IO
> > > > operations.  Some context, we are generating about 576KIOPS 4KB block
> > > > sizes using 8 LUNS.  The PCIe SSD have a write buffer that can absorb
> > > > writes with an end to end latency on the initiator of 44us.  We are
> > > > not currently seeing any errors on read IOs, which tend to have 2X+
> > > > the latency of writes.  
> > > > > 
> > > > > See below for the dmesg on the target side.
> > > > > Timeout Condition occurs at 154 which is the Protocol Error after fio is interrupted or runs to completion.  
> > > > > [  154.453663] Received CmdSN: 0x000fcbb7 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
> > > > > [  154.453673] Received CmdSN: 0x000fcbb8 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
> > > > > 
> > > > 
> > > > (CC'ing Mike)
> > > > 
> > > > As mentioned off-list, this would tend to indicate some manner of
> > > > open-iscsi bug, as it's never legal for an initiator to send a CmdSN
> > > > greater than the MaxCmdSN that's currently being sent in target outgoing
> > > > response PDUs.
> > > > 
> > > > Mike, any idea as to how this might be happening on the initiator
> > > > side..?
> > > > 
> > > 
> 
> <SNIP>
> 
> Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
> (CC'ing linux-scsi)
> 
> Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain
> conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment
> session->cmdsn, without ever checking to see if the CmdSN window may
> have already closed..
> 
> The only CmdSN window check that I see in the I/O dispatch codepath is
> in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once
> iscsi_conn_queue_work() is invoked and process context in
> iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is
> AFAICT no further CmdSN window open checks to ensure the initiator does
> not overflow MaxCmdSN..
> 
> This would very much line up with the type of bug that is being
> reported.  Before trying to determine what a possible fix might look
> like, can you try the following libiscsi patch to see if your able to
> hit either of the BUG's below..?
> 

One more time with a different CmdSN check against the currently
allocated task->cmdsn in iscsi_data_xmit(), as the previous patch using
iscsi_check_cmdsn_window_closed() only checks against the value in
session->queued_cmdsn, that will have already been incremented before
returning from iscsi_queuecommand().

--nab

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index e399561..760f2cf 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1494,6 +1494,11 @@ check_mgmt:
                                fail_scsi_task(conn->task, DID_ABORT);
                        continue;
                }
+               if (!iscsi_sna_lte(conn->task->cmdsn, conn->session->max_cmdsn)) {
+                       printk("CmdSN window already closed in iscsi_data_xmit #1 0x%08x/0x%08x\n",
+                                       conn->task->cmdsn, conn->session->max_cmdsn);
+                       BUG();
+               }
                rc = iscsi_xmit_task(conn);
                if (rc)
                        goto done;
@@ -1518,6 +1523,12 @@ check_mgmt:
                if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
                        break;
 
+               if (!iscsi_sna_lte(conn->task->cmdsn, conn->session->max_cmdsn)) {
+                       printk("CmdSN window already closed in iscsi_data_xmit #2 0x%08x/0x%08x\n",
+                                       conn->task->cmdsn, conn->session->max_cmdsn);
+                       BUG();
+               }
+
                conn->task = task;
                list_del_init(&conn->task->running);
                conn->task->state = ISCSI_TASK_RUNNING;


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12  1:31       ` CmdSN greather than MaxCmdSN protocol error in LIO Iser Nicholas A. Bellinger
  2013-11-12  2:16         ` Nicholas A. Bellinger
@ 2013-11-12  2:32         ` Michael Christie
  2013-11-12  2:39           ` Michael Christie
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Christie @ 2013-11-12  2:32 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Moussa Ba (moussaba),
	target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi



> On Nov 11, 2013, at 7:31 PM, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
> 
>> On Mon, 2013-11-11 at 16:48 -0800, Nicholas A. Bellinger wrote:
>>> On Mon, 2013-11-11 at 16:42 -0800, Nicholas A. Bellinger wrote:
>>>> On Mon, 2013-11-11 at 13:17 -0800, Nicholas A. Bellinger wrote:
>>>> Hi Moussa & Co,
>>>> 
>>>>> On Mon, 2013-11-11 at 19:05 +0000, Moussa Ba (moussaba) wrote:
>>>>> My system setup is as follows:
>>>>> 
>>>>> Target: CentOS 6.4 - LIO Target running 3.12 kernel, 8 PCIe SSD in one volume group, 8 logical volumes, 8 targets, 1 LUN/target.
>>>>> Initiator: CentOS 6.4 running 3.11 Kernel (Also ran 2.6.32-358), ISER based initiator over Mellanox 40Gbit ConnectX HCA
>>>>> 
>>>>> When running performance tests on the initiator, I am running into
>>>> fio timeouts that lead to ABORT_TASK commands on the target.  In other
>>>> words, fio fails to "reap" all the io threads almost as if it is
>>>> waiting for lost IOs to complete. This is happening on random write IO
>>>> operations.  Some context, we are generating about 576KIOPS 4KB block
>>>> sizes using 8 LUNS.  The PCIe SSD have a write buffer that can absorb
>>>> writes with an end to end latency on the initiator of 44us.  We are
>>>> not currently seeing any errors on read IOs, which tend to have 2X+
>>>> the latency of writes.  
>>>>> 
>>>>> See below for the dmesg on the target side.
>>>>> Timeout Condition occurs at 154 which is the Protocol Error after fio is interrupted or runs to completion.  
>>>>> [  154.453663] Received CmdSN: 0x000fcbb7 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
>>>>> [  154.453673] Received CmdSN: 0x000fcbb8 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
>>>> 
>>>> (CC'ing Mike)
>>>> 
>>>> As mentioned off-list, this would tend to indicate some manner of
>>>> open-iscsi bug, as it's never legal for an initiator to send a CmdSN
>>>> greater than the MaxCmdSN that's currently being sent in target outgoing
>>>> response PDUs.
>>>> 
>>>> Mike, any idea as to how this might be happening on the initiator
>>>> side..?
> 
> <SNIP>
> 
> Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
> (CC'ing linux-scsi)
> 
> Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain
> conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment
> session->cmdsn, without ever checking to see if the CmdSN window may
> have already closed..
> 
> The only CmdSN window check that I see in the I/O dispatch codepath is
> in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once
> iscsi_conn_queue_work() is invoked and process context in
> iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is
> AFAICT no further CmdSN window open checks to ensure the initiator does
> not overflow MaxCmdSN..
> 
> This would very much line up with the type of bug that is being
> reported.  Before trying to determine what a possible fix might look
> like, can you try the following libiscsi patch to see if your able to
> hit either of the BUG's below..?
> 
> Thanks,
> 
> --nab
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index e399561..9106f63 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1483,6 +1483,12 @@ check_mgmt:
>                        fail_scsi_task(conn->task, DID_IMM_RETRY);
>                        continue;
>                }
> +
> +               if (iscsi_check_cmdsn_window_closed(conn)) {
> +                       printk("CmdSN window already closed in iscsi_data_xmit #1\n");
> +                       BUG();
> +               }
> +
>                rc = iscsi_prep_scsi_cmd_pdu(conn->task);
>                if (rc) {
>                        if (rc == -ENOMEM || rc == -EACCES) {
> @@ -1518,6 +1524,11 @@ check_mgmt:
>                if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
>                        break;
> 
> +               if (iscsi_check_cmdsn_window_closed(conn)) {
> +                       printk("CmdSN window already closed in iscsi_data_xmit #2\n");
> +                       BUG();
> +               }
> +
>                conn->task = task;
>                list_del_init(&conn->task->running);
>                conn->task->state = ISCSI_TASK_RUNNING;
> 
> 

If you hit a bug there then it is probably the target or if you are using the new libiscsi back/frwd lock patches it could be related to them. We should not need to check above because we never queue more cmds from the scsi layer than the window will allow at the time. If the window is 10, the queued_cmdsn check should make sure we have only accepted 10 commands into the iscsi layer so we should not need to recheck above.

You should just enable libiscsi debug printk in iscsi_check_cmdsn_window so we can see the sn related values at the time of queuing. You should enable that printk whithout your patch.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12  2:32         ` Michael Christie
@ 2013-11-12  2:39           ` Michael Christie
  2013-11-12  4:34             ` Nicholas A. Bellinger
  0 siblings, 1 reply; 18+ messages in thread
From: Michael Christie @ 2013-11-12  2:39 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Moussa Ba (moussaba),
	target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi



> On Nov 11, 2013, at 8:32 PM, Michael Christie <michaelc@cs.wisc.edu> wrote:
> 
> 
> 
>>> On Nov 11, 2013, at 7:31 PM, "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
>>> 
>>>> On Mon, 2013-11-11 at 16:48 -0800, Nicholas A. Bellinger wrote:
>>>>> On Mon, 2013-11-11 at 16:42 -0800, Nicholas A. Bellinger wrote:
>>>>> On Mon, 2013-11-11 at 13:17 -0800, Nicholas A. Bellinger wrote:
>>>>> Hi Moussa & Co,
>>>>> 
>>>>>> On Mon, 2013-11-11 at 19:05 +0000, Moussa Ba (moussaba) wrote:
>>>>>> My system setup is as follows:
>>>>>> 
>>>>>> Target: CentOS 6.4 - LIO Target running 3.12 kernel, 8 PCIe SSD in one volume group, 8 logical volumes, 8 targets, 1 LUN/target.
>>>>>> Initiator: CentOS 6.4 running 3.11 Kernel (Also ran 2.6.32-358), ISER based initiator over Mellanox 40Gbit ConnectX HCA
>>>>>> 
>>>>>> When running performance tests on the initiator, I am running into
>>>>> fio timeouts that lead to ABORT_TASK commands on the target.  In other
>>>>> words, fio fails to "reap" all the io threads almost as if it is
>>>>> waiting for lost IOs to complete. This is happening on random write IO
>>>>> operations.  Some context, we are generating about 576KIOPS 4KB block
>>>>> sizes using 8 LUNS.  The PCIe SSD have a write buffer that can absorb
>>>>> writes with an end to end latency on the initiator of 44us.  We are
>>>>> not currently seeing any errors on read IOs, which tend to have 2X+
>>>>> the latency of writes.  
>>>>>> 
>>>>>> See below for the dmesg on the target side.
>>>>>> Timeout Condition occurs at 154 which is the Protocol Error after fio is interrupted or runs to completion.  
>>>>>> [  154.453663] Received CmdSN: 0x000fcbb7 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
>>>>>> [  154.453673] Received CmdSN: 0x000fcbb8 is greater than MaxCmdSN: 0x000fcbb6, protocol error.
>>>>> 
>>>>> (CC'ing Mike)
>>>>> 
>>>>> As mentioned off-list, this would tend to indicate some manner of
>>>>> open-iscsi bug, as it's never legal for an initiator to send a CmdSN
>>>>> greater than the MaxCmdSN that's currently being sent in target outgoing
>>>>> response PDUs.
>>>>> 
>>>>> Mike, any idea as to how this might be happening on the initiator
>>>>> side..?
>> 
>> <SNIP>
>> 
>> Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
>> (CC'ing linux-scsi)
>> 
>> Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain
>> conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment
>> session->cmdsn, without ever checking to see if the CmdSN window may
>> have already closed..
>> 
>> The only CmdSN window check that I see in the I/O dispatch codepath is
>> in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once
>> iscsi_conn_queue_work() is invoked and process context in
>> iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is
>> AFAICT no further CmdSN window open checks to ensure the initiator does
>> not overflow MaxCmdSN..
>> 
>> This would very much line up with the type of bug that is being
>> reported.  Before trying to determine what a possible fix might look
>> like, can you try the following libiscsi patch to see if your able to
>> hit either of the BUG's below..?
>> 
>> Thanks,
>> 
>> --nab
>> 
>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
>> index e399561..9106f63 100644
>> --- a/drivers/scsi/libiscsi.c
>> +++ b/drivers/scsi/libiscsi.c
>> @@ -1483,6 +1483,12 @@ check_mgmt:
>>                       fail_scsi_task(conn->task, DID_IMM_RETRY);
>>                       continue;
>>               }
>> +
>> +               if (iscsi_check_cmdsn_window_closed(conn)) {
>> +                       printk("CmdSN window already closed in iscsi_data_xmit #1\n");
>> +                       BUG();
>> +               }
>> +
>>               rc = iscsi_prep_scsi_cmd_pdu(conn->task);
>>               if (rc) {
>>                       if (rc == -ENOMEM || rc == -EACCES) {
>> @@ -1518,6 +1524,11 @@ check_mgmt:
>>               if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
>>                       break;
>> 
>> +               if (iscsi_check_cmdsn_window_closed(conn)) {
>> +                       printk("CmdSN window already closed in iscsi_data_xmit #2\n");
>> +                       BUG();
>> +               }
>> +
>>               conn->task = task;
>>               list_del_init(&conn->task->running);
>>               conn->task->state = ISCSI_TASK_RUNNING;
> 
> If you hit a bug there then it is probably the target or if you are using the new libiscsi back/frwd lock patches it could be related to them. We should not need to check above because we never queue more cmds from the scsi layer than the window will allow at the time. If the window is 10, the queued_cmdsn check should make sure we have only accepted 10 commands into the iscsi layer so we should not need to recheck above.
> 
> You should just enable libiscsi debug printk in iscsi_check_cmdsn_window so we can see the sn related values at the time of queuing. You should enable that printk whithout your patch.

Actually that will not help because we always have the queued cmdsn lower than the max in that path. I would do your patch but then also print the queued cmdsn, cmdsn and also the maxcmdsn in the check function.


I am not by my computer so I cannot send a patch.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12  2:39           ` Michael Christie
@ 2013-11-12  4:34             ` Nicholas A. Bellinger
  2013-11-12  7:27               ` Or Gerlitz
  2013-11-12 15:37               ` Mike Christie
  0 siblings, 2 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-12  4:34 UTC (permalink / raw)
  To: Michael Christie
  Cc: Moussa Ba (moussaba),
	target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

On Mon, 2013-11-11 at 20:39 -0600, Michael Christie wrote:
> 
> > On Nov 11, 2013, at 8:32 PM, Michael Christie <michaelc@cs.wisc.edu> wrote:
> > 

<SNIP>

> >>>>> Mike, any idea as to how this might be happening on the initiator
> >>>>> side..?
> >> 
> >> <SNIP>
> >> 
> >> Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
> >> (CC'ing linux-scsi)
> >> 
> >> Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain
> >> conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment
> >> session->cmdsn, without ever checking to see if the CmdSN window may
> >> have already closed..
> >> 
> >> The only CmdSN window check that I see in the I/O dispatch codepath is
> >> in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once
> >> iscsi_conn_queue_work() is invoked and process context in
> >> iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is
> >> AFAICT no further CmdSN window open checks to ensure the initiator does
> >> not overflow MaxCmdSN..
> >> 
> >> This would very much line up with the type of bug that is being
> >> reported.  Before trying to determine what a possible fix might look
> >> like, can you try the following libiscsi patch to see if your able to
> >> hit either of the BUG's below..?
> >> 
> >> Thanks,
> >> 
> >> --nab
> >> 
> >> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> >> index e399561..9106f63 100644
> >> --- a/drivers/scsi/libiscsi.c
> >> +++ b/drivers/scsi/libiscsi.c
> >> @@ -1483,6 +1483,12 @@ check_mgmt:
> >>                       fail_scsi_task(conn->task, DID_IMM_RETRY);
> >>                       continue;
> >>               }
> >> +
> >> +               if (iscsi_check_cmdsn_window_closed(conn)) {
> >> +                       printk("CmdSN window already closed in iscsi_data_xmit #1\n");
> >> +                       BUG();
> >> +               }
> >> +
> >>               rc = iscsi_prep_scsi_cmd_pdu(conn->task);
> >>               if (rc) {
> >>                       if (rc == -ENOMEM || rc == -EACCES) {
> >> @@ -1518,6 +1524,11 @@ check_mgmt:
> >>               if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
> >>                       break;
> >> 
> >> +               if (iscsi_check_cmdsn_window_closed(conn)) {
> >> +                       printk("CmdSN window already closed in iscsi_data_xmit #2\n");
> >> +                       BUG();
> >> +               }
> >> +
> >>               conn->task = task;
> >>               list_del_init(&conn->task->running);
> >>               conn->task->state = ISCSI_TASK_RUNNING;
> > 
> > If you hit a bug there then it is probably the target or if you are
> using the new libiscsi back/frwd lock patches it could be related to
> them. 

FYI, this is with v3.11 and RHEL 6.x open-iscsi code..

> We should not need to check above because we never queue more cmds
> from the scsi layer than the window will allow at the time. If the
> window is 10, the queued_cmdsn check should make sure we have only
> accepted 10 commands into the iscsi layer so we should not need to
> recheck above.
> > 
> > You should just enable libiscsi debug printk in
> iscsi_check_cmdsn_window so we can see the sn related values at the
> time of queuing. You should enable that printk whithout your patch.
> 
> Actually that will not help because we always have the queued cmdsn
> lower than the max in that path. I would do your patch but then also
> print the queued cmdsn, cmdsn and also the maxcmdsn in the check
> function.
> 
> 

Yes, notice the second test patch is doing the explicit check of
task->cmdsn vs. sess->max_cmdsn, instead of the one that checks
->queued_cmdsn vs. ->max_cmdsn in iscsi_check_cmdsn_window_closed().

Btw, I'm surmising the bug in question does not happen from the
iscsi_queuecommand() submission path, but rather from the
iscsi_update_cmdsn() -> __iscsi_update_cmdsn() completion path, where
iscsi_conn_queue_work() gets called when the CmdSN window was
previously closed, eg:

        if (max_cmdsn != session->max_cmdsn &&
            !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) {
                session->max_cmdsn = max_cmdsn;
                /*
                 * if the window closed with IO queued, then kick the
                 * xmit thread
                 */
                if (!list_empty(&session->leadconn->cmdqueue) ||
                    !list_empty(&session->leadconn->mgmtqueue))
                        iscsi_conn_queue_work(session->leadconn);
        }
}

Once iscsi_conn_queue_work() is invoked here to start process context
execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there
is no logic in place within iscsi_data_xmit() to honor the last received
MaxCmdSN.

Or to put it another way: what is preventing iscsi_data_xmit() from
completely draining both conn->cmdqueue + conn->requeue, even when the
CmdSN window has potentially been closed again..?

--nab

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12  4:34             ` Nicholas A. Bellinger
@ 2013-11-12  7:27               ` Or Gerlitz
  2013-11-12  9:15                 ` Nicholas A. Bellinger
  2013-11-12 15:37               ` Mike Christie
  1 sibling, 1 reply; 18+ messages in thread
From: Or Gerlitz @ 2013-11-12  7:27 UTC (permalink / raw)
  To: Nicholas A. Bellinger, Michael Christie
  Cc: Moussa Ba (moussaba), target-devel, Nicholas Bellinger, linux-scsi

On 12/11/2013 06:34, Nicholas A. Bellinger wrote:
> Once iscsi_conn_queue_work() is invoked here to start process context
> execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there
> is no logic in place within iscsi_data_xmit() to honor the last received
> MaxCmdSN.
>
> Or to put it another way: what is preventing iscsi_data_xmit() from
> completely draining both conn->cmdqueue + conn->requeue, even when the
> CmdSN window has potentially been closed again..?

Guys,

Note that the iser initiator transport uses the pass-through command 
submission mode of libiscsi, that is
iscsi_conn_queue_work isn't called from queuecommand at all.

This is b/c we call iscsi_host_allocwith xmit_can_sleep = 0. Hence no 
workqueue is used for the command processing/submission over the wire, 
just a call toiscsi_prep_scsi_cmd_pdu and following that to iser's 
xmit_task callbackwhich isiscsi_iser_task_xmit that calls 
iser_send_command, etc.

Mike, Nic is not using the new locking framework patches for libiscsi, 
as you know they are not upstream
yet...

Or.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12  7:27               ` Or Gerlitz
@ 2013-11-12  9:15                 ` Nicholas A. Bellinger
  0 siblings, 0 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-12  9:15 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Michael Christie, Moussa Ba (moussaba),
	target-devel, Nicholas Bellinger, linux-scsi

On Tue, 2013-11-12 at 09:27 +0200, Or Gerlitz wrote:
> On 12/11/2013 06:34, Nicholas A. Bellinger wrote:
> > Once iscsi_conn_queue_work() is invoked here to start process context
> > execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there
> > is no logic in place within iscsi_data_xmit() to honor the last received
> > MaxCmdSN.
> >
> > Or to put it another way: what is preventing iscsi_data_xmit() from
> > completely draining both conn->cmdqueue + conn->requeue, even when the
> > CmdSN window has potentially been closed again..?
> 
> Guys,
> 
> Note that the iser initiator transport uses the pass-through command 
> submission mode of libiscsi, that is
> iscsi_conn_queue_work isn't called from queuecommand at all.
> 
> This is b/c we call iscsi_host_allocwith xmit_can_sleep = 0. Hence no 
> workqueue is used for the command processing/submission over the wire, 
> just a call toiscsi_prep_scsi_cmd_pdu and following that to iser's 
> xmit_task callbackwhich isiscsi_iser_task_xmit that calls 
> iser_send_command, etc.

Grrrr, managed to miss the fact that iser is using xmit_can_sleep=0, and
avoids iscsi_data_xmit() processing all together..

> Mike, Nic is not using the new locking framework patches for libiscsi, 
> as you know they are not upstream
> yet...
> 

In any event, back to debugging then..

--nab

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12  4:34             ` Nicholas A. Bellinger
  2013-11-12  7:27               ` Or Gerlitz
@ 2013-11-12 15:37               ` Mike Christie
  2013-11-12 15:43                 ` Mike Christie
  2013-11-12 18:18                 ` Moussa Ba (moussaba)
  1 sibling, 2 replies; 18+ messages in thread
From: Mike Christie @ 2013-11-12 15:37 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Moussa Ba (moussaba),
	target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

On 11/11/13 10:34 PM, Nicholas A. Bellinger wrote:
> On Mon, 2013-11-11 at 20:39 -0600, Michael Christie wrote:
>>
>>> On Nov 11, 2013, at 8:32 PM, Michael Christie <michaelc@cs.wisc.edu> wrote:
>>>
>
> <SNIP>
>
>>>>>>> Mike, any idea as to how this might be happening on the initiator
>>>>>>> side..?
>>>>
>>>> <SNIP>
>>>>
>>>> Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
>>>> (CC'ing linux-scsi)
>>>>
>>>> Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to drain
>>>> conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() + increment
>>>> session->cmdsn, without ever checking to see if the CmdSN window may
>>>> have already closed..
>>>>
>>>> The only CmdSN window check that I see in the I/O dispatch codepath is
>>>> in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but once
>>>> iscsi_conn_queue_work() is invoked and process context in
>>>> iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there is
>>>> AFAICT no further CmdSN window open checks to ensure the initiator does
>>>> not overflow MaxCmdSN..
>>>>
>>>> This would very much line up with the type of bug that is being
>>>> reported.  Before trying to determine what a possible fix might look
>>>> like, can you try the following libiscsi patch to see if your able to
>>>> hit either of the BUG's below..?
>>>>
>>>> Thanks,
>>>>
>>>> --nab
>>>>
>>>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
>>>> index e399561..9106f63 100644
>>>> --- a/drivers/scsi/libiscsi.c
>>>> +++ b/drivers/scsi/libiscsi.c
>>>> @@ -1483,6 +1483,12 @@ check_mgmt:
>>>>                        fail_scsi_task(conn->task, DID_IMM_RETRY);
>>>>                        continue;
>>>>                }
>>>> +
>>>> +               if (iscsi_check_cmdsn_window_closed(conn)) {
>>>> +                       printk("CmdSN window already closed in iscsi_data_xmit #1\n");
>>>> +                       BUG();
>>>> +               }
>>>> +
>>>>                rc = iscsi_prep_scsi_cmd_pdu(conn->task);
>>>>                if (rc) {
>>>>                        if (rc == -ENOMEM || rc == -EACCES) {
>>>> @@ -1518,6 +1524,11 @@ check_mgmt:
>>>>                if (iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_DATA_OUT))
>>>>                        break;
>>>>
>>>> +               if (iscsi_check_cmdsn_window_closed(conn)) {
>>>> +                       printk("CmdSN window already closed in iscsi_data_xmit #2\n");
>>>> +                       BUG();
>>>> +               }
>>>> +
>>>>                conn->task = task;
>>>>                list_del_init(&conn->task->running);
>>>>                conn->task->state = ISCSI_TASK_RUNNING;
>>>
>>> If you hit a bug there then it is probably the target or if you are
>> using the new libiscsi back/frwd lock patches it could be related to
>> them.
>
> FYI, this is with v3.11 and RHEL 6.x open-iscsi code..
>
>> We should not need to check above because we never queue more cmds
>> from the scsi layer than the window will allow at the time. If the
>> window is 10, the queued_cmdsn check should make sure we have only
>> accepted 10 commands into the iscsi layer so we should not need to
>> recheck above.
>>>
>>> You should just enable libiscsi debug printk in
>> iscsi_check_cmdsn_window so we can see the sn related values at the
>> time of queuing. You should enable that printk whithout your patch.
>>
>> Actually that will not help because we always have the queued cmdsn
>> lower than the max in that path. I would do your patch but then also
>> print the queued cmdsn, cmdsn and also the maxcmdsn in the check
>> function.
>>
>>
>
> Yes, notice the second test patch is doing the explicit check of
> task->cmdsn vs. sess->max_cmdsn, instead of the one that checks
> ->queued_cmdsn vs. ->max_cmdsn in iscsi_check_cmdsn_window_closed().
>
> Btw, I'm surmising the bug in question does not happen from the
> iscsi_queuecommand() submission path, but rather from the
> iscsi_update_cmdsn() -> __iscsi_update_cmdsn() completion path, where
> iscsi_conn_queue_work() gets called when the CmdSN window was
> previously closed, eg:
>
>          if (max_cmdsn != session->max_cmdsn &&
>              !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) {
>                  session->max_cmdsn = max_cmdsn;
>                  /*
>                   * if the window closed with IO queued, then kick the
>                   * xmit thread
>                   */
>                  if (!list_empty(&session->leadconn->cmdqueue) ||
>                      !list_empty(&session->leadconn->mgmtqueue))
>                          iscsi_conn_queue_work(session->leadconn);
>          }
> }
>
> Once iscsi_conn_queue_work() is invoked here to start process context
> execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there
> is no logic in place within iscsi_data_xmit() to honor the last received
> MaxCmdSN.
>
> Or to put it another way: what is preventing iscsi_data_xmit() from
> completely draining both conn->cmdqueue + conn->requeue, even when the
> CmdSN window has potentially been closed again..?
>

If the window was not big enough we would have not accepted the cmd from 
scsi-ml in iscsi_queuecommand. We would have never put it on the list 
above then.

In the initiator we have:

session->queued_cmdsn - It will always be less than or equal to the 
max_cmdsn, but greater than or equal to session->cmdsn. Think of it like 
preallocating the sn we are going to give the cmd. Depending on ordering 
it might be different (3 instead of 4), but the final value it gets will 
be less than the max_cmdsn.

session->cmdsn - the sn we give the cmd when we actually put it on the 
wire. It is always less than or equal to queued_cmdsn.

So we have cmdsn <= queued_cmdsn <= max_cmdsn.

The iscsi_queuecommand window check makes sure we will only have put a 
cmd on the cmdqueue list if queued_cmdsn was less than or equal to the 
max_cmdsn. And cmdsn is less than or equal to queued_cmdsn, so we can 
run iscsi_data_xmit without checking the window values again, because it 
is not possible for something like max_cmdsn to decrease on us.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 15:37               ` Mike Christie
@ 2013-11-12 15:43                 ` Mike Christie
  2013-11-12 18:18                 ` Moussa Ba (moussaba)
  1 sibling, 0 replies; 18+ messages in thread
From: Mike Christie @ 2013-11-12 15:43 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Moussa Ba (moussaba),
	target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

On 11/12/13 9:37 AM, Mike Christie wrote:
>>
>>          if (max_cmdsn != session->max_cmdsn &&
>>              !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) {
>>                  session->max_cmdsn = max_cmdsn;
>>                  /*
>>                   * if the window closed with IO queued, then kick the
>>                   * xmit thread
>>                   */
>>                  if (!list_empty(&session->leadconn->cmdqueue) ||
>>                      !list_empty(&session->leadconn->mgmtqueue))
>>                          iscsi_conn_queue_work(session->leadconn);
>>          }
>> }
>>
>> Once iscsi_conn_queue_work() is invoked here to start process context
>> execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT there
>> is no logic in place within iscsi_data_xmit() to honor the last received
>> MaxCmdSN.
>>
>> Or to put it another way: what is preventing iscsi_data_xmit() from
>> completely draining both conn->cmdqueue + conn->requeue, even when the
>> CmdSN window has potentially been closed again..?
>>
>
> If the window was not big enough we would have not accepted the cmd from
> scsi-ml in iscsi_queuecommand. We would have never put it on the list
> above then.

Oh yeah, if you are asking yourself why that code exists then, it is 
because we used to internally queue cmds when the window closed. In 
iscsi_data_xmit we then checked the window like in your patch. We 
switched to having the scsi layer queue for us, but that code above got 
left there.

Or sent a patch the other day to remove it:
http://marc.info/?l=linux-scsi&m=138288022619303&w=2

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 15:37               ` Mike Christie
  2013-11-12 15:43                 ` Mike Christie
@ 2013-11-12 18:18                 ` Moussa Ba (moussaba)
  2013-11-12 19:46                   ` Michael Christie
  2013-11-12 21:15                   ` Nicholas A. Bellinger
  1 sibling, 2 replies; 18+ messages in thread
From: Moussa Ba (moussaba) @ 2013-11-12 18:18 UTC (permalink / raw)
  To: Mike Christie, Nicholas A. Bellinger
  Cc: target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi



> -----Original Message-----
> From: Mike Christie [mailto:michaelc@cs.wisc.edu]
> Sent: Tuesday, November 12, 2013 7:38 AM
> To: Nicholas A. Bellinger
> Cc: Moussa Ba (moussaba); target-devel@vger.kernel.org; Nicholas
> Bellinger; Or Gerlitz; linux-scsi
> Subject: Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
> 
> On 11/11/13 10:34 PM, Nicholas A. Bellinger wrote:
> > On Mon, 2013-11-11 at 20:39 -0600, Michael Christie wrote:
> >>
> >>> On Nov 11, 2013, at 8:32 PM, Michael Christie
> <michaelc@cs.wisc.edu> wrote:
> >>>
> >
> > <SNIP>
> >
> >>>>>>> Mike, any idea as to how this might be happening on the
> initiator
> >>>>>>> side..?
> >>>>
> >>>> <SNIP>
> >>>>
> >>>> Mmmmm, just noticed a bit of very suspicious logic in open-iscsi.
> >>>> (CC'ing linux-scsi)
> >>>>
> >>>> Namely, how drivers/scsi/libiscsi.c:iscsi_data_xmit() attempts to
> drain
> >>>> conn->cmdqueue and hence call iscsi_prep_scsi_cmd_pdu() +
> increment
> >>>> session->cmdsn, without ever checking to see if the CmdSN window
> may
> >>>> have already closed..
> >>>>
> >>>> The only CmdSN window check that I see in the I/O dispatch
> codepath is
> >>>> in iscsi_queuecommand() -> iscsi_check_cmdsn_window_closed(), but
> once
> >>>> iscsi_conn_queue_work() is invoked and process context in
> >>>> iscsi_xmitworker() -> iscsi_data_xmit() starts to execute, there
> is
> >>>> AFAICT no further CmdSN window open checks to ensure the initiator
> does
> >>>> not overflow MaxCmdSN..
> >>>>
> >>>> This would very much line up with the type of bug that is being
> >>>> reported.  Before trying to determine what a possible fix might
> look
> >>>> like, can you try the following libiscsi patch to see if your able
> to
> >>>> hit either of the BUG's below..?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> --nab
> >>>>
> >>>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> >>>> index e399561..9106f63 100644
> >>>> --- a/drivers/scsi/libiscsi.c
> >>>> +++ b/drivers/scsi/libiscsi.c
> >>>> @@ -1483,6 +1483,12 @@ check_mgmt:
> >>>>                        fail_scsi_task(conn->task, DID_IMM_RETRY);
> >>>>                        continue;
> >>>>                }
> >>>> +
> >>>> +               if (iscsi_check_cmdsn_window_closed(conn)) {
> >>>> +                       printk("CmdSN window already closed in
> iscsi_data_xmit #1\n");
> >>>> +                       BUG();
> >>>> +               }
> >>>> +
> >>>>                rc = iscsi_prep_scsi_cmd_pdu(conn->task);
> >>>>                if (rc) {
> >>>>                        if (rc == -ENOMEM || rc == -EACCES) {
> >>>> @@ -1518,6 +1524,11 @@ check_mgmt:
> >>>>                if (iscsi_check_tmf_restrictions(task,
> ISCSI_OP_SCSI_DATA_OUT))
> >>>>                        break;
> >>>>
> >>>> +               if (iscsi_check_cmdsn_window_closed(conn)) {
> >>>> +                       printk("CmdSN window already closed in
> iscsi_data_xmit #2\n");
> >>>> +                       BUG();
> >>>> +               }
> >>>> +
> >>>>                conn->task = task;
> >>>>                list_del_init(&conn->task->running);
> >>>>                conn->task->state = ISCSI_TASK_RUNNING;
> >>>
> >>> If you hit a bug there then it is probably the target or if you are
> >> using the new libiscsi back/frwd lock patches it could be related to
> >> them.
> >
> > FYI, this is with v3.11 and RHEL 6.x open-iscsi code..
> >
> >> We should not need to check above because we never queue more cmds
> >> from the scsi layer than the window will allow at the time. If the
> >> window is 10, the queued_cmdsn check should make sure we have only
> >> accepted 10 commands into the iscsi layer so we should not need to
> >> recheck above.
> >>>
> >>> You should just enable libiscsi debug printk in
> >> iscsi_check_cmdsn_window so we can see the sn related values at the
> >> time of queuing. You should enable that printk whithout your patch.
> >>
> >> Actually that will not help because we always have the queued cmdsn
> >> lower than the max in that path. I would do your patch but then also
> >> print the queued cmdsn, cmdsn and also the maxcmdsn in the check
> >> function.
> >>
> >>
> >
> > Yes, notice the second test patch is doing the explicit check of
> > task->cmdsn vs. sess->max_cmdsn, instead of the one that checks
> > ->queued_cmdsn vs. ->max_cmdsn in iscsi_check_cmdsn_window_closed().
> >
> > Btw, I'm surmising the bug in question does not happen from the
> > iscsi_queuecommand() submission path, but rather from the
> > iscsi_update_cmdsn() -> __iscsi_update_cmdsn() completion path, where
> > iscsi_conn_queue_work() gets called when the CmdSN window was
> > previously closed, eg:
> >
> >          if (max_cmdsn != session->max_cmdsn &&
> >              !iscsi_sna_lt(max_cmdsn, session->max_cmdsn)) {
> >                  session->max_cmdsn = max_cmdsn;
> >                  /*
> >                   * if the window closed with IO queued, then kick
> the
> >                   * xmit thread
> >                   */
> >                  if (!list_empty(&session->leadconn->cmdqueue) ||
> >                      !list_empty(&session->leadconn->mgmtqueue))
> >                          iscsi_conn_queue_work(session->leadconn);
> >          }
> > }
> >
> > Once iscsi_conn_queue_work() is invoked here to start process context
> > execution of iscsi_xmitworker() -> iscsi_data_xmit() code, AFAICT
> there
> > is no logic in place within iscsi_data_xmit() to honor the last
> received
> > MaxCmdSN.
> >
> > Or to put it another way: what is preventing iscsi_data_xmit() from
> > completely draining both conn->cmdqueue + conn->requeue, even when
> the
> > CmdSN window has potentially been closed again..?
> >
> 
> If the window was not big enough we would have not accepted the cmd
> from
> scsi-ml in iscsi_queuecommand. We would have never put it on the list
> above then.
> 
> In the initiator we have:
> 
> session->queued_cmdsn - It will always be less than or equal to the
> max_cmdsn, but greater than or equal to session->cmdsn. Think of it
> like
> preallocating the sn we are going to give the cmd. Depending on
> ordering
> it might be different (3 instead of 4), but the final value it gets
> will
> be less than the max_cmdsn.
> 
> session->cmdsn - the sn we give the cmd when we actually put it on the
> wire. It is always less than or equal to queued_cmdsn.
> 
> So we have cmdsn <= queued_cmdsn <= max_cmdsn.
> 
> The iscsi_queuecommand window check makes sure we will only have put a
> cmd on the cmdqueue list if queued_cmdsn was less than or equal to the
> max_cmdsn. And cmdsn is less than or equal to queued_cmdsn, so we can
> run iscsi_data_xmit without checking the window values again, because
> it
> is not possible for something like max_cmdsn to decrease on us.



I am attaching error codes on the inititator that seem to coincide with the timeout errors.

[  261.855254]  connection6:0: pdu (op 0x65 itt 0x1) rejected. Reason code 0x4
[  261.855266]  connection6:0: pdu (op 0xa itt 0x1) rejected. Reason code 0x4
[  261.855268]  connection6:0: pdu (op 0x51 itt 0x1) rejected. Reason code 0x4




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 18:18                 ` Moussa Ba (moussaba)
@ 2013-11-12 19:46                   ` Michael Christie
  2013-11-12 19:48                     ` Moussa Ba (moussaba)
  2013-11-12 21:15                   ` Nicholas A. Bellinger
  1 sibling, 1 reply; 18+ messages in thread
From: Michael Christie @ 2013-11-12 19:46 UTC (permalink / raw)
  To: Moussa Ba (moussaba)
  Cc: Nicholas A. Bellinger, target-devel, Nicholas Bellinger,
	Or Gerlitz, linux-scsi


On Nov 12, 2013, at 12:18 PM, Moussa Ba (moussaba) <moussaba@micron.com> wrote:
> 
> I am attaching error codes on the inititator that seem to coincide with the timeout errors.
> 
> [  261.855254]  connection6:0: pdu (op 0x65 itt 0x1) rejected. Reason code 0x4
> [  261.855266]  connection6:0: pdu (op 0xa itt 0x1) rejected. Reason code 0x4
> [  261.855268]  connection6:0: pdu (op 0x51 itt 0x1) rejected. Reason code 0x4
> 


The initiator does not handle rejects. This is actually the 2nd time I have
seen one in 10 years. It logs them just lets the command timeout.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 19:46                   ` Michael Christie
@ 2013-11-12 19:48                     ` Moussa Ba (moussaba)
  0 siblings, 0 replies; 18+ messages in thread
From: Moussa Ba (moussaba) @ 2013-11-12 19:48 UTC (permalink / raw)
  To: Michael Christie
  Cc: Nicholas A. Bellinger, target-devel, Nicholas Bellinger,
	Or Gerlitz, linux-scsi

> -----Original Message-----
> From: Michael Christie [mailto:michaelc@cs.wisc.edu]
> Sent: Tuesday, November 12, 2013 11:46 AM
> To: Moussa Ba (moussaba)
> Cc: Nicholas A. Bellinger; target-devel@vger.kernel.org; Nicholas
> Bellinger; Or Gerlitz; linux-scsi
> Subject: Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
> 
> 
> On Nov 12, 2013, at 12:18 PM, Moussa Ba (moussaba)
> <moussaba@micron.com> wrote:
> >
> > I am attaching error codes on the inititator that seem to coincide
> with the timeout errors.
> >
> > [  261.855254]  connection6:0: pdu (op 0x65 itt 0x1) rejected. Reason
> code 0x4
> > [  261.855266]  connection6:0: pdu (op 0xa itt 0x1) rejected. Reason
> code 0x4
> > [  261.855268]  connection6:0: pdu (op 0x51 itt 0x1) rejected. Reason
> code 0x4
> >
> 
> 
> The initiator does not handle rejects. This is actually the 2nd time I
> have
> seen one in 10 years. It logs them just lets the command timeout.

Well, I saw 8 already this morning :-). Looking at the reason code it is a protocol error, which is obviously related to the CmdSN/MaxCmdSN mismatch.

Moussa

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 18:18                 ` Moussa Ba (moussaba)
  2013-11-12 19:46                   ` Michael Christie
@ 2013-11-12 21:15                   ` Nicholas A. Bellinger
  2013-11-12 22:11                     ` Moussa Ba (moussaba)
  1 sibling, 1 reply; 18+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-12 21:15 UTC (permalink / raw)
  To: Moussa Ba (moussaba)
  Cc: Mike Christie, target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

On Tue, 2013-11-12 at 18:18 +0000, Moussa Ba (moussaba) wrote:

<SNIP>

> > >
> > > Or to put it another way: what is preventing iscsi_data_xmit() from
> > > completely draining both conn->cmdqueue + conn->requeue, even when
> > the
> > > CmdSN window has potentially been closed again..?
> > >
> > 
> > If the window was not big enough we would have not accepted the cmd
> > from
> > scsi-ml in iscsi_queuecommand. We would have never put it on the list
> > above then.
> > 
> > In the initiator we have:
> > 
> > session->queued_cmdsn - It will always be less than or equal to the
> > max_cmdsn, but greater than or equal to session->cmdsn. Think of it
> > like
> > preallocating the sn we are going to give the cmd. Depending on
> > ordering
> > it might be different (3 instead of 4), but the final value it gets
> > will
> > be less than the max_cmdsn.
> > 
> > session->cmdsn - the sn we give the cmd when we actually put it on the
> > wire. It is always less than or equal to queued_cmdsn.
> > 
> > So we have cmdsn <= queued_cmdsn <= max_cmdsn.
> > 
> > The iscsi_queuecommand window check makes sure we will only have put a
> > cmd on the cmdqueue list if queued_cmdsn was less than or equal to the
> > max_cmdsn. And cmdsn is less than or equal to queued_cmdsn, so we can
> > run iscsi_data_xmit without checking the window values again, because
> > it
> > is not possible for something like max_cmdsn to decrease on us.
> 
> 
> 
> I am attaching error codes on the inititator that seem to coincide with the timeout errors.
> 
> [  261.855254]  connection6:0: pdu (op 0x65 itt 0x1) rejected. Reason code 0x4
> [  261.855266]  connection6:0: pdu (op 0xa itt 0x1) rejected. Reason code 0x4
> [  261.855268]  connection6:0: pdu (op 0x51 itt 0x1) rejected. Reason code 0x4
> 
> 

Hi Moussa,

After a bit more review, there may be a scenario introduced with v3.10
code that could cause something like the reported CmdSN > MaxCmdSN error
to occur..

Care to attempt to reproduce with the following target patch..?

Thanks,

--nab

diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index bbd86e8..5661075 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -2029,8 +2029,6 @@ isert_map_rdma(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
        if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
                data_left = se_cmd->data_length;
-               iscsit_increment_maxcmdsn(cmd, conn->sess);
-               cmd->stat_sn = conn->stat_sn++;
        } else {
                sg_off = cmd->write_data_done / PAGE_SIZE;
                data_left = se_cmd->data_length - cmd->write_data_done;
@@ -2242,8 +2240,6 @@ isert_reg_rdma_frwr(struct iscsi_conn *conn, struct iscsi_cmd *cmd,
 
        if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
                data_left = se_cmd->data_length;
-               iscsit_increment_maxcmdsn(cmd, conn->sess);
-               cmd->stat_sn = conn->stat_sn++;
        } else {
                sg_off = cmd->write_data_done / PAGE_SIZE;
                data_left = se_cmd->data_length - cmd->write_data_done;
@@ -2352,7 +2348,7 @@ isert_put_datain(struct iscsi_conn *conn, struct iscsi_cmd *cmd)
         * Build isert_conn->tx_desc for iSCSI response PDU and attach
         */
        isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd->tx_desc);
-       iscsit_build_rsp_pdu(cmd, conn, false, (struct iscsi_scsi_rsp *)
+       iscsit_build_rsp_pdu(cmd, conn, true, (struct iscsi_scsi_rsp *)
                             &isert_cmd->tx_desc.iscsi_header);
        isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
        isert_init_send_wr(isert_conn, isert_cmd,
diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_config
index 1eec37c..fde3624 100644
--- a/drivers/target/iscsi/iscsi_target_configfs.c
+++ b/drivers/target/iscsi/iscsi_target_configfs.c
@@ -1790,6 +1790,11 @@ static int lio_queue_status(struct se_cmd *se_cmd)
        struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd, se_cmd);
 
        cmd->i_state = ISTATE_SEND_STATUS;
+
+       if (cmd->se_cmd.scsi_status || cmd->sense_reason) {
+               iscsit_add_cmd_to_response_queue(cmd, cmd->conn, cmd->i_state);
+               return 0;
+       }
        cmd->conn->conn_transport->iscsit_queue_status(cmd->conn, cmd);
 
        return 0;
diff --git a/drivers/target/iscsi/iscsi_target_device.c b/drivers/target/iscsi/iscsi_target_device.c
index 6c7a510..7087c73 100644
--- a/drivers/target/iscsi/iscsi_target_device.c
+++ b/drivers/target/iscsi/iscsi_target_device.c
@@ -58,11 +58,7 @@ void iscsit_increment_maxcmdsn(struct iscsi_cmd *cmd, struct iscsi_session *sess
 
        cmd->maxcmdsn_inc = 1;
 
-       if (!mutex_trylock(&sess->cmdsn_mutex)) {
-               sess->max_cmd_sn += 1;
-               pr_debug("Updated MaxCmdSN to 0x%08x\n", sess->max_cmd_sn);
-               return;
-       }
+       mutex_lock(&sess->cmdsn_mutex);
        sess->max_cmd_sn += 1;
        pr_debug("Updated MaxCmdSN to 0x%08x\n", sess->max_cmd_sn);
        mutex_unlock(&sess->cmdsn_mutex);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* RE: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 21:15                   ` Nicholas A. Bellinger
@ 2013-11-12 22:11                     ` Moussa Ba (moussaba)
  2013-11-12 22:14                       ` Nicholas A. Bellinger
  0 siblings, 1 reply; 18+ messages in thread
From: Moussa Ba (moussaba) @ 2013-11-12 22:11 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Mike Christie, target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

The patch does not apply to 3.12, are you patching against 3.11? The MaxCmdSN error shows up on 3.12, the 3.11 error is a different one I am still chasing.

Moussa

> -----Original Message-----
> From: target-devel-owner@vger.kernel.org [mailto:target-devel-
> owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> Sent: Tuesday, November 12, 2013 1:16 PM
> To: Moussa Ba (moussaba)
> Cc: Mike Christie; target-devel@vger.kernel.org; Nicholas Bellinger; Or
> Gerlitz; linux-scsi
> Subject: Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
> 
> On Tue, 2013-11-12 at 18:18 +0000, Moussa Ba (moussaba) wrote:
> 
> <SNIP>
> 
> > > >
> > > > Or to put it another way: what is preventing iscsi_data_xmit()
> from
> > > > completely draining both conn->cmdqueue + conn->requeue, even
> when
> > > the
> > > > CmdSN window has potentially been closed again..?
> > > >
> > >
> > > If the window was not big enough we would have not accepted the cmd
> > > from
> > > scsi-ml in iscsi_queuecommand. We would have never put it on the
> list
> > > above then.
> > >
> > > In the initiator we have:
> > >
> > > session->queued_cmdsn - It will always be less than or equal to the
> > > max_cmdsn, but greater than or equal to session->cmdsn. Think of it
> > > like
> > > preallocating the sn we are going to give the cmd. Depending on
> > > ordering
> > > it might be different (3 instead of 4), but the final value it gets
> > > will
> > > be less than the max_cmdsn.
> > >
> > > session->cmdsn - the sn we give the cmd when we actually put it on
> the
> > > wire. It is always less than or equal to queued_cmdsn.
> > >
> > > So we have cmdsn <= queued_cmdsn <= max_cmdsn.
> > >
> > > The iscsi_queuecommand window check makes sure we will only have
> put a
> > > cmd on the cmdqueue list if queued_cmdsn was less than or equal to
> the
> > > max_cmdsn. And cmdsn is less than or equal to queued_cmdsn, so we
> can
> > > run iscsi_data_xmit without checking the window values again,
> because
> > > it
> > > is not possible for something like max_cmdsn to decrease on us.
> >
> >
> >
> > I am attaching error codes on the inititator that seem to coincide
> with the timeout errors.
> >
> > [  261.855254]  connection6:0: pdu (op 0x65 itt 0x1) rejected. Reason
> code 0x4
> > [  261.855266]  connection6:0: pdu (op 0xa itt 0x1) rejected. Reason
> code 0x4
> > [  261.855268]  connection6:0: pdu (op 0x51 itt 0x1) rejected. Reason
> code 0x4
> >
> >
> 
> Hi Moussa,
> 
> After a bit more review, there may be a scenario introduced with v3.10
> code that could cause something like the reported CmdSN > MaxCmdSN
> error
> to occur..
> 
> Care to attempt to reproduce with the following target patch..?
> 
> Thanks,
> 
> --nab
> 
> diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
> b/drivers/infiniband/ulp/isert/ib_isert.c
> index bbd86e8..5661075 100644
> --- a/drivers/infiniband/ulp/isert/ib_isert.c
> +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> @@ -2029,8 +2029,6 @@ isert_map_rdma(struct iscsi_conn *conn, struct
> iscsi_cmd *cmd,
> 
>         if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
>                 data_left = se_cmd->data_length;
> -               iscsit_increment_maxcmdsn(cmd, conn->sess);
> -               cmd->stat_sn = conn->stat_sn++;
>         } else {
>                 sg_off = cmd->write_data_done / PAGE_SIZE;
>                 data_left = se_cmd->data_length - cmd->write_data_done;
> @@ -2242,8 +2240,6 @@ isert_reg_rdma_frwr(struct iscsi_conn *conn,
> struct iscsi_cmd *cmd,
> 
>         if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
>                 data_left = se_cmd->data_length;
> -               iscsit_increment_maxcmdsn(cmd, conn->sess);
> -               cmd->stat_sn = conn->stat_sn++;
>         } else {
>                 sg_off = cmd->write_data_done / PAGE_SIZE;
>                 data_left = se_cmd->data_length - cmd->write_data_done;
> @@ -2352,7 +2348,7 @@ isert_put_datain(struct iscsi_conn *conn, struct
> iscsi_cmd *cmd)
>          * Build isert_conn->tx_desc for iSCSI response PDU and attach
>          */
>         isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd-
> >tx_desc);
> -       iscsit_build_rsp_pdu(cmd, conn, false, (struct iscsi_scsi_rsp
> *)
> +       iscsit_build_rsp_pdu(cmd, conn, true, (struct iscsi_scsi_rsp *)
>                              &isert_cmd->tx_desc.iscsi_header);
>         isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
>         isert_init_send_wr(isert_conn, isert_cmd,
> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> b/drivers/target/iscsi/iscsi_target_config
> index 1eec37c..fde3624 100644
> --- a/drivers/target/iscsi/iscsi_target_configfs.c
> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> @@ -1790,6 +1790,11 @@ static int lio_queue_status(struct se_cmd
> *se_cmd)
>         struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd,
> se_cmd);
> 
>         cmd->i_state = ISTATE_SEND_STATUS;
> +
> +       if (cmd->se_cmd.scsi_status || cmd->sense_reason) {
> +               iscsit_add_cmd_to_response_queue(cmd, cmd->conn, cmd-
> >i_state);
> +               return 0;
> +       }
>         cmd->conn->conn_transport->iscsit_queue_status(cmd->conn, cmd);
> 
>         return 0;
> diff --git a/drivers/target/iscsi/iscsi_target_device.c
> b/drivers/target/iscsi/iscsi_target_device.c
> index 6c7a510..7087c73 100644
> --- a/drivers/target/iscsi/iscsi_target_device.c
> +++ b/drivers/target/iscsi/iscsi_target_device.c
> @@ -58,11 +58,7 @@ void iscsit_increment_maxcmdsn(struct iscsi_cmd
> *cmd, struct iscsi_session *sess
> 
>         cmd->maxcmdsn_inc = 1;
> 
> -       if (!mutex_trylock(&sess->cmdsn_mutex)) {
> -               sess->max_cmd_sn += 1;
> -               pr_debug("Updated MaxCmdSN to 0x%08x\n", sess-
> >max_cmd_sn);
> -               return;
> -       }
> +       mutex_lock(&sess->cmdsn_mutex);
>         sess->max_cmd_sn += 1;
>         pr_debug("Updated MaxCmdSN to 0x%08x\n", sess->max_cmd_sn);
>         mutex_unlock(&sess->cmdsn_mutex);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 22:11                     ` Moussa Ba (moussaba)
@ 2013-11-12 22:14                       ` Nicholas A. Bellinger
  2013-11-12 22:15                         ` Moussa Ba (moussaba)
  2013-11-12 23:03                         ` Moussa Ba (moussaba)
  0 siblings, 2 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-12 22:14 UTC (permalink / raw)
  To: Moussa Ba (moussaba)
  Cc: Mike Christie, target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

On Tue, 2013-11-12 at 22:11 +0000, Moussa Ba (moussaba) wrote:
> The patch does not apply to 3.12, are you patching against 3.11? The
> MaxCmdSN error shows up on 3.12, the 3.11 error is a different one I
> am still chasing.

It's against the target-pending/for-next tree at v3.12-rc5..

Care to apply it manually..?  The effort involved should be minimal.

--nab

> 
> Moussa
> 
> > -----Original Message-----
> > From: target-devel-owner@vger.kernel.org [mailto:target-devel-
> > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > Sent: Tuesday, November 12, 2013 1:16 PM
> > To: Moussa Ba (moussaba)
> > Cc: Mike Christie; target-devel@vger.kernel.org; Nicholas Bellinger; Or
> > Gerlitz; linux-scsi
> > Subject: Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
> > 
> > On Tue, 2013-11-12 at 18:18 +0000, Moussa Ba (moussaba) wrote:
> > 
> > <SNIP>
> > 
> > > > >
> > > > > Or to put it another way: what is preventing iscsi_data_xmit()
> > from
> > > > > completely draining both conn->cmdqueue + conn->requeue, even
> > when
> > > > the
> > > > > CmdSN window has potentially been closed again..?
> > > > >
> > > >
> > > > If the window was not big enough we would have not accepted the cmd
> > > > from
> > > > scsi-ml in iscsi_queuecommand. We would have never put it on the
> > list
> > > > above then.
> > > >
> > > > In the initiator we have:
> > > >
> > > > session->queued_cmdsn - It will always be less than or equal to the
> > > > max_cmdsn, but greater than or equal to session->cmdsn. Think of it
> > > > like
> > > > preallocating the sn we are going to give the cmd. Depending on
> > > > ordering
> > > > it might be different (3 instead of 4), but the final value it gets
> > > > will
> > > > be less than the max_cmdsn.
> > > >
> > > > session->cmdsn - the sn we give the cmd when we actually put it on
> > the
> > > > wire. It is always less than or equal to queued_cmdsn.
> > > >
> > > > So we have cmdsn <= queued_cmdsn <= max_cmdsn.
> > > >
> > > > The iscsi_queuecommand window check makes sure we will only have
> > put a
> > > > cmd on the cmdqueue list if queued_cmdsn was less than or equal to
> > the
> > > > max_cmdsn. And cmdsn is less than or equal to queued_cmdsn, so we
> > can
> > > > run iscsi_data_xmit without checking the window values again,
> > because
> > > > it
> > > > is not possible for something like max_cmdsn to decrease on us.
> > >
> > >
> > >
> > > I am attaching error codes on the inititator that seem to coincide
> > with the timeout errors.
> > >
> > > [  261.855254]  connection6:0: pdu (op 0x65 itt 0x1) rejected. Reason
> > code 0x4
> > > [  261.855266]  connection6:0: pdu (op 0xa itt 0x1) rejected. Reason
> > code 0x4
> > > [  261.855268]  connection6:0: pdu (op 0x51 itt 0x1) rejected. Reason
> > code 0x4
> > >
> > >
> > 
> > Hi Moussa,
> > 
> > After a bit more review, there may be a scenario introduced with v3.10
> > code that could cause something like the reported CmdSN > MaxCmdSN
> > error
> > to occur..
> > 
> > Care to attempt to reproduce with the following target patch..?
> > 
> > Thanks,
> > 
> > --nab
> > 
> > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
> > b/drivers/infiniband/ulp/isert/ib_isert.c
> > index bbd86e8..5661075 100644
> > --- a/drivers/infiniband/ulp/isert/ib_isert.c
> > +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> > @@ -2029,8 +2029,6 @@ isert_map_rdma(struct iscsi_conn *conn, struct
> > iscsi_cmd *cmd,
> > 
> >         if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
> >                 data_left = se_cmd->data_length;
> > -               iscsit_increment_maxcmdsn(cmd, conn->sess);
> > -               cmd->stat_sn = conn->stat_sn++;
> >         } else {
> >                 sg_off = cmd->write_data_done / PAGE_SIZE;
> >                 data_left = se_cmd->data_length - cmd->write_data_done;
> > @@ -2242,8 +2240,6 @@ isert_reg_rdma_frwr(struct iscsi_conn *conn,
> > struct iscsi_cmd *cmd,
> > 
> >         if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
> >                 data_left = se_cmd->data_length;
> > -               iscsit_increment_maxcmdsn(cmd, conn->sess);
> > -               cmd->stat_sn = conn->stat_sn++;
> >         } else {
> >                 sg_off = cmd->write_data_done / PAGE_SIZE;
> >                 data_left = se_cmd->data_length - cmd->write_data_done;
> > @@ -2352,7 +2348,7 @@ isert_put_datain(struct iscsi_conn *conn, struct
> > iscsi_cmd *cmd)
> >          * Build isert_conn->tx_desc for iSCSI response PDU and attach
> >          */
> >         isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd-
> > >tx_desc);
> > -       iscsit_build_rsp_pdu(cmd, conn, false, (struct iscsi_scsi_rsp
> > *)
> > +       iscsit_build_rsp_pdu(cmd, conn, true, (struct iscsi_scsi_rsp *)
> >                              &isert_cmd->tx_desc.iscsi_header);
> >         isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
> >         isert_init_send_wr(isert_conn, isert_cmd,
> > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> > b/drivers/target/iscsi/iscsi_target_config
> > index 1eec37c..fde3624 100644
> > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > @@ -1790,6 +1790,11 @@ static int lio_queue_status(struct se_cmd
> > *se_cmd)
> >         struct iscsi_cmd *cmd = container_of(se_cmd, struct iscsi_cmd,
> > se_cmd);
> > 
> >         cmd->i_state = ISTATE_SEND_STATUS;
> > +
> > +       if (cmd->se_cmd.scsi_status || cmd->sense_reason) {
> > +               iscsit_add_cmd_to_response_queue(cmd, cmd->conn, cmd-
> > >i_state);
> > +               return 0;
> > +       }
> >         cmd->conn->conn_transport->iscsit_queue_status(cmd->conn, cmd);
> > 
> >         return 0;
> > diff --git a/drivers/target/iscsi/iscsi_target_device.c
> > b/drivers/target/iscsi/iscsi_target_device.c
> > index 6c7a510..7087c73 100644
> > --- a/drivers/target/iscsi/iscsi_target_device.c
> > +++ b/drivers/target/iscsi/iscsi_target_device.c
> > @@ -58,11 +58,7 @@ void iscsit_increment_maxcmdsn(struct iscsi_cmd
> > *cmd, struct iscsi_session *sess
> > 
> >         cmd->maxcmdsn_inc = 1;
> > 
> > -       if (!mutex_trylock(&sess->cmdsn_mutex)) {
> > -               sess->max_cmd_sn += 1;
> > -               pr_debug("Updated MaxCmdSN to 0x%08x\n", sess-
> > >max_cmd_sn);
> > -               return;
> > -       }
> > +       mutex_lock(&sess->cmdsn_mutex);
> >         sess->max_cmd_sn += 1;
> >         pr_debug("Updated MaxCmdSN to 0x%08x\n", sess->max_cmd_sn);
> >         mutex_unlock(&sess->cmdsn_mutex);
> > 
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe target-devel"
> > in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 22:14                       ` Nicholas A. Bellinger
@ 2013-11-12 22:15                         ` Moussa Ba (moussaba)
  2013-11-12 23:03                         ` Moussa Ba (moussaba)
  1 sibling, 0 replies; 18+ messages in thread
From: Moussa Ba (moussaba) @ 2013-11-12 22:15 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Mike Christie, target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

That is what I am currently doing...Thanks. Will update shortly.

Moussa

> -----Original Message-----
> From: Nicholas A. Bellinger [mailto:nab@linux-iscsi.org]
> Sent: Tuesday, November 12, 2013 2:15 PM
> To: Moussa Ba (moussaba)
> Cc: Mike Christie; target-devel@vger.kernel.org; Nicholas Bellinger; Or
> Gerlitz; linux-scsi
> Subject: Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
> 
> On Tue, 2013-11-12 at 22:11 +0000, Moussa Ba (moussaba) wrote:
> > The patch does not apply to 3.12, are you patching against 3.11? The
> > MaxCmdSN error shows up on 3.12, the 3.11 error is a different one I
> > am still chasing.
> 
> It's against the target-pending/for-next tree at v3.12-rc5..
> 
> Care to apply it manually..?  The effort involved should be minimal.
> 
> --nab
> 
> >
> > Moussa
> >
> > > -----Original Message-----
> > > From: target-devel-owner@vger.kernel.org [mailto:target-devel-
> > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > Sent: Tuesday, November 12, 2013 1:16 PM
> > > To: Moussa Ba (moussaba)
> > > Cc: Mike Christie; target-devel@vger.kernel.org; Nicholas
> Bellinger; Or
> > > Gerlitz; linux-scsi
> > > Subject: Re: CmdSN greather than MaxCmdSN protocol error in LIO
> Iser
> > >
> > > On Tue, 2013-11-12 at 18:18 +0000, Moussa Ba (moussaba) wrote:
> > >
> > > <SNIP>
> > >
> > > > > >
> > > > > > Or to put it another way: what is preventing
> iscsi_data_xmit()
> > > from
> > > > > > completely draining both conn->cmdqueue + conn->requeue, even
> > > when
> > > > > the
> > > > > > CmdSN window has potentially been closed again..?
> > > > > >
> > > > >
> > > > > If the window was not big enough we would have not accepted the
> cmd
> > > > > from
> > > > > scsi-ml in iscsi_queuecommand. We would have never put it on
> the
> > > list
> > > > > above then.
> > > > >
> > > > > In the initiator we have:
> > > > >
> > > > > session->queued_cmdsn - It will always be less than or equal to
> the
> > > > > max_cmdsn, but greater than or equal to session->cmdsn. Think
> of it
> > > > > like
> > > > > preallocating the sn we are going to give the cmd. Depending on
> > > > > ordering
> > > > > it might be different (3 instead of 4), but the final value it
> gets
> > > > > will
> > > > > be less than the max_cmdsn.
> > > > >
> > > > > session->cmdsn - the sn we give the cmd when we actually put it
> on
> > > the
> > > > > wire. It is always less than or equal to queued_cmdsn.
> > > > >
> > > > > So we have cmdsn <= queued_cmdsn <= max_cmdsn.
> > > > >
> > > > > The iscsi_queuecommand window check makes sure we will only
> have
> > > put a
> > > > > cmd on the cmdqueue list if queued_cmdsn was less than or equal
> to
> > > the
> > > > > max_cmdsn. And cmdsn is less than or equal to queued_cmdsn, so
> we
> > > can
> > > > > run iscsi_data_xmit without checking the window values again,
> > > because
> > > > > it
> > > > > is not possible for something like max_cmdsn to decrease on us.
> > > >
> > > >
> > > >
> > > > I am attaching error codes on the inititator that seem to
> coincide
> > > with the timeout errors.
> > > >
> > > > [  261.855254]  connection6:0: pdu (op 0x65 itt 0x1) rejected.
> Reason
> > > code 0x4
> > > > [  261.855266]  connection6:0: pdu (op 0xa itt 0x1) rejected.
> Reason
> > > code 0x4
> > > > [  261.855268]  connection6:0: pdu (op 0x51 itt 0x1) rejected.
> Reason
> > > code 0x4
> > > >
> > > >
> > >
> > > Hi Moussa,
> > >
> > > After a bit more review, there may be a scenario introduced with
> v3.10
> > > code that could cause something like the reported CmdSN > MaxCmdSN
> > > error
> > > to occur..
> > >
> > > Care to attempt to reproduce with the following target patch..?
> > >
> > > Thanks,
> > >
> > > --nab
> > >
> > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
> > > b/drivers/infiniband/ulp/isert/ib_isert.c
> > > index bbd86e8..5661075 100644
> > > --- a/drivers/infiniband/ulp/isert/ib_isert.c
> > > +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> > > @@ -2029,8 +2029,6 @@ isert_map_rdma(struct iscsi_conn *conn,
> struct
> > > iscsi_cmd *cmd,
> > >
> > >         if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
> > >                 data_left = se_cmd->data_length;
> > > -               iscsit_increment_maxcmdsn(cmd, conn->sess);
> > > -               cmd->stat_sn = conn->stat_sn++;
> > >         } else {
> > >                 sg_off = cmd->write_data_done / PAGE_SIZE;
> > >                 data_left = se_cmd->data_length - cmd-
> >write_data_done;
> > > @@ -2242,8 +2240,6 @@ isert_reg_rdma_frwr(struct iscsi_conn *conn,
> > > struct iscsi_cmd *cmd,
> > >
> > >         if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
> > >                 data_left = se_cmd->data_length;
> > > -               iscsit_increment_maxcmdsn(cmd, conn->sess);
> > > -               cmd->stat_sn = conn->stat_sn++;
> > >         } else {
> > >                 sg_off = cmd->write_data_done / PAGE_SIZE;
> > >                 data_left = se_cmd->data_length - cmd-
> >write_data_done;
> > > @@ -2352,7 +2348,7 @@ isert_put_datain(struct iscsi_conn *conn,
> struct
> > > iscsi_cmd *cmd)
> > >          * Build isert_conn->tx_desc for iSCSI response PDU and
> attach
> > >          */
> > >         isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd-
> > > >tx_desc);
> > > -       iscsit_build_rsp_pdu(cmd, conn, false, (struct
> iscsi_scsi_rsp
> > > *)
> > > +       iscsit_build_rsp_pdu(cmd, conn, true, (struct
> iscsi_scsi_rsp *)
> > >                              &isert_cmd->tx_desc.iscsi_header);
> > >         isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
> > >         isert_init_send_wr(isert_conn, isert_cmd,
> > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> > > b/drivers/target/iscsi/iscsi_target_config
> > > index 1eec37c..fde3624 100644
> > > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > > @@ -1790,6 +1790,11 @@ static int lio_queue_status(struct se_cmd
> > > *se_cmd)
> > >         struct iscsi_cmd *cmd = container_of(se_cmd, struct
> iscsi_cmd,
> > > se_cmd);
> > >
> > >         cmd->i_state = ISTATE_SEND_STATUS;
> > > +
> > > +       if (cmd->se_cmd.scsi_status || cmd->sense_reason) {
> > > +               iscsit_add_cmd_to_response_queue(cmd, cmd->conn,
> cmd-
> > > >i_state);
> > > +               return 0;
> > > +       }
> > >         cmd->conn->conn_transport->iscsit_queue_status(cmd->conn,
> cmd);
> > >
> > >         return 0;
> > > diff --git a/drivers/target/iscsi/iscsi_target_device.c
> > > b/drivers/target/iscsi/iscsi_target_device.c
> > > index 6c7a510..7087c73 100644
> > > --- a/drivers/target/iscsi/iscsi_target_device.c
> > > +++ b/drivers/target/iscsi/iscsi_target_device.c
> > > @@ -58,11 +58,7 @@ void iscsit_increment_maxcmdsn(struct iscsi_cmd
> > > *cmd, struct iscsi_session *sess
> > >
> > >         cmd->maxcmdsn_inc = 1;
> > >
> > > -       if (!mutex_trylock(&sess->cmdsn_mutex)) {
> > > -               sess->max_cmd_sn += 1;
> > > -               pr_debug("Updated MaxCmdSN to 0x%08x\n", sess-
> > > >max_cmd_sn);
> > > -               return;
> > > -       }
> > > +       mutex_lock(&sess->cmdsn_mutex);
> > >         sess->max_cmd_sn += 1;
> > >         pr_debug("Updated MaxCmdSN to 0x%08x\n", sess->max_cmd_sn);
> > >         mutex_unlock(&sess->cmdsn_mutex);
> > >
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe target-
> devel"
> > > in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 22:14                       ` Nicholas A. Bellinger
  2013-11-12 22:15                         ` Moussa Ba (moussaba)
@ 2013-11-12 23:03                         ` Moussa Ba (moussaba)
  2013-11-13  1:51                           ` Nicholas A. Bellinger
  1 sibling, 1 reply; 18+ messages in thread
From: Moussa Ba (moussaba) @ 2013-11-12 23:03 UTC (permalink / raw)
  To: Moussa Ba (moussaba), Nicholas A. Bellinger
  Cc: Mike Christie, target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

Your latest patch seems to have addressed the problem, I have not been able to reproduce it post patch and can readily reproduce it with the mainline 3.12 kernel. I will run a longer test tonight, but this seems to have done it.

Moussa

> -----Original Message-----
> From: Moussa Ba (moussaba)
> Sent: Tuesday, November 12, 2013 2:15 PM
> To: 'Nicholas A. Bellinger'
> Cc: Mike Christie; target-devel@vger.kernel.org; Nicholas Bellinger; Or
> Gerlitz; linux-scsi
> Subject: RE: CmdSN greather than MaxCmdSN protocol error in LIO Iser
> 
> That is what I am currently doing...Thanks. Will update shortly.
> 
> Moussa
> 
> > -----Original Message-----
> > From: Nicholas A. Bellinger [mailto:nab@linux-iscsi.org]
> > Sent: Tuesday, November 12, 2013 2:15 PM
> > To: Moussa Ba (moussaba)
> > Cc: Mike Christie; target-devel@vger.kernel.org; Nicholas Bellinger;
> Or
> > Gerlitz; linux-scsi
> > Subject: Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
> >
> > On Tue, 2013-11-12 at 22:11 +0000, Moussa Ba (moussaba) wrote:
> > > The patch does not apply to 3.12, are you patching against 3.11?
> The
> > > MaxCmdSN error shows up on 3.12, the 3.11 error is a different one
> I
> > > am still chasing.
> >
> > It's against the target-pending/for-next tree at v3.12-rc5..
> >
> > Care to apply it manually..?  The effort involved should be minimal.
> >
> > --nab
> >
> > >
> > > Moussa
> > >
> > > > -----Original Message-----
> > > > From: target-devel-owner@vger.kernel.org [mailto:target-devel-
> > > > owner@vger.kernel.org] On Behalf Of Nicholas A. Bellinger
> > > > Sent: Tuesday, November 12, 2013 1:16 PM
> > > > To: Moussa Ba (moussaba)
> > > > Cc: Mike Christie; target-devel@vger.kernel.org; Nicholas
> > Bellinger; Or
> > > > Gerlitz; linux-scsi
> > > > Subject: Re: CmdSN greather than MaxCmdSN protocol error in LIO
> > Iser
> > > >
> > > > On Tue, 2013-11-12 at 18:18 +0000, Moussa Ba (moussaba) wrote:
> > > >
> > > > <SNIP>
> > > >
> > > > > > >
> > > > > > > Or to put it another way: what is preventing
> > iscsi_data_xmit()
> > > > from
> > > > > > > completely draining both conn->cmdqueue + conn->requeue,
> even
> > > > when
> > > > > > the
> > > > > > > CmdSN window has potentially been closed again..?
> > > > > > >
> > > > > >
> > > > > > If the window was not big enough we would have not accepted
> the
> > cmd
> > > > > > from
> > > > > > scsi-ml in iscsi_queuecommand. We would have never put it on
> > the
> > > > list
> > > > > > above then.
> > > > > >
> > > > > > In the initiator we have:
> > > > > >
> > > > > > session->queued_cmdsn - It will always be less than or equal
> to
> > the
> > > > > > max_cmdsn, but greater than or equal to session->cmdsn. Think
> > of it
> > > > > > like
> > > > > > preallocating the sn we are going to give the cmd. Depending
> on
> > > > > > ordering
> > > > > > it might be different (3 instead of 4), but the final value
> it
> > gets
> > > > > > will
> > > > > > be less than the max_cmdsn.
> > > > > >
> > > > > > session->cmdsn - the sn we give the cmd when we actually put
> it
> > on
> > > > the
> > > > > > wire. It is always less than or equal to queued_cmdsn.
> > > > > >
> > > > > > So we have cmdsn <= queued_cmdsn <= max_cmdsn.
> > > > > >
> > > > > > The iscsi_queuecommand window check makes sure we will only
> > have
> > > > put a
> > > > > > cmd on the cmdqueue list if queued_cmdsn was less than or
> equal
> > to
> > > > the
> > > > > > max_cmdsn. And cmdsn is less than or equal to queued_cmdsn,
> so
> > we
> > > > can
> > > > > > run iscsi_data_xmit without checking the window values again,
> > > > because
> > > > > > it
> > > > > > is not possible for something like max_cmdsn to decrease on
> us.
> > > > >
> > > > >
> > > > >
> > > > > I am attaching error codes on the inititator that seem to
> > coincide
> > > > with the timeout errors.
> > > > >
> > > > > [  261.855254]  connection6:0: pdu (op 0x65 itt 0x1) rejected.
> > Reason
> > > > code 0x4
> > > > > [  261.855266]  connection6:0: pdu (op 0xa itt 0x1) rejected.
> > Reason
> > > > code 0x4
> > > > > [  261.855268]  connection6:0: pdu (op 0x51 itt 0x1) rejected.
> > Reason
> > > > code 0x4
> > > > >
> > > > >
> > > >
> > > > Hi Moussa,
> > > >
> > > > After a bit more review, there may be a scenario introduced with
> > v3.10
> > > > code that could cause something like the reported CmdSN >
> MaxCmdSN
> > > > error
> > > > to occur..
> > > >
> > > > Care to attempt to reproduce with the following target patch..?
> > > >
> > > > Thanks,
> > > >
> > > > --nab
> > > >
> > > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c
> > > > b/drivers/infiniband/ulp/isert/ib_isert.c
> > > > index bbd86e8..5661075 100644
> > > > --- a/drivers/infiniband/ulp/isert/ib_isert.c
> > > > +++ b/drivers/infiniband/ulp/isert/ib_isert.c
> > > > @@ -2029,8 +2029,6 @@ isert_map_rdma(struct iscsi_conn *conn,
> > struct
> > > > iscsi_cmd *cmd,
> > > >
> > > >         if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
> > > >                 data_left = se_cmd->data_length;
> > > > -               iscsit_increment_maxcmdsn(cmd, conn->sess);
> > > > -               cmd->stat_sn = conn->stat_sn++;
> > > >         } else {
> > > >                 sg_off = cmd->write_data_done / PAGE_SIZE;
> > > >                 data_left = se_cmd->data_length - cmd-
> > >write_data_done;
> > > > @@ -2242,8 +2240,6 @@ isert_reg_rdma_frwr(struct iscsi_conn
> *conn,
> > > > struct iscsi_cmd *cmd,
> > > >
> > > >         if (wr->iser_ib_op == ISER_IB_RDMA_WRITE) {
> > > >                 data_left = se_cmd->data_length;
> > > > -               iscsit_increment_maxcmdsn(cmd, conn->sess);
> > > > -               cmd->stat_sn = conn->stat_sn++;
> > > >         } else {
> > > >                 sg_off = cmd->write_data_done / PAGE_SIZE;
> > > >                 data_left = se_cmd->data_length - cmd-
> > >write_data_done;
> > > > @@ -2352,7 +2348,7 @@ isert_put_datain(struct iscsi_conn *conn,
> > struct
> > > > iscsi_cmd *cmd)
> > > >          * Build isert_conn->tx_desc for iSCSI response PDU and
> > attach
> > > >          */
> > > >         isert_create_send_desc(isert_conn, isert_cmd, &isert_cmd-
> > > > >tx_desc);
> > > > -       iscsit_build_rsp_pdu(cmd, conn, false, (struct
> > iscsi_scsi_rsp
> > > > *)
> > > > +       iscsit_build_rsp_pdu(cmd, conn, true, (struct
> > iscsi_scsi_rsp *)
> > > >                              &isert_cmd->tx_desc.iscsi_header);
> > > >         isert_init_tx_hdrs(isert_conn, &isert_cmd->tx_desc);
> > > >         isert_init_send_wr(isert_conn, isert_cmd,
> > > > diff --git a/drivers/target/iscsi/iscsi_target_configfs.c
> > > > b/drivers/target/iscsi/iscsi_target_config
> > > > index 1eec37c..fde3624 100644
> > > > --- a/drivers/target/iscsi/iscsi_target_configfs.c
> > > > +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> > > > @@ -1790,6 +1790,11 @@ static int lio_queue_status(struct se_cmd
> > > > *se_cmd)
> > > >         struct iscsi_cmd *cmd = container_of(se_cmd, struct
> > iscsi_cmd,
> > > > se_cmd);
> > > >
> > > >         cmd->i_state = ISTATE_SEND_STATUS;
> > > > +
> > > > +       if (cmd->se_cmd.scsi_status || cmd->sense_reason) {
> > > > +               iscsit_add_cmd_to_response_queue(cmd, cmd->conn,
> > cmd-
> > > > >i_state);
> > > > +               return 0;
> > > > +       }
> > > >         cmd->conn->conn_transport->iscsit_queue_status(cmd->conn,
> > cmd);
> > > >
> > > >         return 0;
> > > > diff --git a/drivers/target/iscsi/iscsi_target_device.c
> > > > b/drivers/target/iscsi/iscsi_target_device.c
> > > > index 6c7a510..7087c73 100644
> > > > --- a/drivers/target/iscsi/iscsi_target_device.c
> > > > +++ b/drivers/target/iscsi/iscsi_target_device.c
> > > > @@ -58,11 +58,7 @@ void iscsit_increment_maxcmdsn(struct
> iscsi_cmd
> > > > *cmd, struct iscsi_session *sess
> > > >
> > > >         cmd->maxcmdsn_inc = 1;
> > > >
> > > > -       if (!mutex_trylock(&sess->cmdsn_mutex)) {
> > > > -               sess->max_cmd_sn += 1;
> > > > -               pr_debug("Updated MaxCmdSN to 0x%08x\n", sess-
> > > > >max_cmd_sn);
> > > > -               return;
> > > > -       }
> > > > +       mutex_lock(&sess->cmdsn_mutex);
> > > >         sess->max_cmd_sn += 1;
> > > >         pr_debug("Updated MaxCmdSN to 0x%08x\n", sess-
> >max_cmd_sn);
> > > >         mutex_unlock(&sess->cmdsn_mutex);
> > > >
> > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe target-
> > devel"
> > > > in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-
> info.html
> >


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: CmdSN greather than MaxCmdSN protocol error in LIO Iser
  2013-11-12 23:03                         ` Moussa Ba (moussaba)
@ 2013-11-13  1:51                           ` Nicholas A. Bellinger
  0 siblings, 0 replies; 18+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-13  1:51 UTC (permalink / raw)
  To: Moussa Ba (moussaba)
  Cc: Mike Christie, target-devel, Nicholas Bellinger, Or Gerlitz, linux-scsi

On Tue, 2013-11-12 at 23:03 +0000, Moussa Ba (moussaba) wrote:
> Your latest patch seems to have addressed the problem, I have not been
> able to reproduce it post patch and can readily reproduce it with the
> mainline 3.12 kernel. I will run a longer test tonight, but this seems
> to have done it.
> 

Excellent.  

Ok, so in the end this was bug due to a misinterpretation of how
mutex_trylock() works..

I was originally under the assumption that mutex_trylock() was returning
zero upon contention when cmdsn_mutex was already held by the same
process (as can happen in the iscsi_execute_cmd() exception response
path), and not from a different process (as expected between normal
session RX/TX process contexts)..

Simply avoiding the direct ib_isert callback in lio_queue_status() for
the special iscsi_execute_cmd() exception cases allows the problematic
mutex_trylock() usage in iscsit_increment_maxcmdsn() to go away
entirely.

I'll put these into proper patch series shortly.

Thanks again,

--nab


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-11-13  1:51 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <697D97E420EE024FB7451B574CD4991BCA064554@NTXBOIMBX05.micron.com>
     [not found] ` <1384204653.12281.7.camel@haakon3.risingtidesystems.com>
     [not found]   ` <1384216960.12281.24.camel@haakon3.risingtidesystems.com>
     [not found]     ` <1384217317.12281.26.camel@haakon3.risingtidesystems.com>
2013-11-12  1:31       ` CmdSN greather than MaxCmdSN protocol error in LIO Iser Nicholas A. Bellinger
2013-11-12  2:16         ` Nicholas A. Bellinger
2013-11-12  2:32         ` Michael Christie
2013-11-12  2:39           ` Michael Christie
2013-11-12  4:34             ` Nicholas A. Bellinger
2013-11-12  7:27               ` Or Gerlitz
2013-11-12  9:15                 ` Nicholas A. Bellinger
2013-11-12 15:37               ` Mike Christie
2013-11-12 15:43                 ` Mike Christie
2013-11-12 18:18                 ` Moussa Ba (moussaba)
2013-11-12 19:46                   ` Michael Christie
2013-11-12 19:48                     ` Moussa Ba (moussaba)
2013-11-12 21:15                   ` Nicholas A. Bellinger
2013-11-12 22:11                     ` Moussa Ba (moussaba)
2013-11-12 22:14                       ` Nicholas A. Bellinger
2013-11-12 22:15                         ` Moussa Ba (moussaba)
2013-11-12 23:03                         ` Moussa Ba (moussaba)
2013-11-13  1:51                           ` Nicholas A. Bellinger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.