* [PATCH v2 0/1] scsi: libiscsi: fix NOP race condition @ 2020-09-25 18:41 lduncan 2020-09-25 18:41 ` [PATCH v2 1/1] " lduncan 0 siblings, 1 reply; 6+ messages in thread From: lduncan @ 2020-09-25 18:41 UTC (permalink / raw) To: linux-scsi Cc: linux-kernel, open-iscsi, martin.petersen, mchristi, hare, Lee Duncan From: Lee Duncan <lduncan@suse.com> A customer that uses iSCSI NOPs extensively found a race condition caused in part by the two-lock system used in iscsi (a forward and a back lock), since sending an iSCSI NOP uses the forward lock, and receiving one uses the back lock. Because of this, processing of the "send" can still be in progress when the "receive" occurs, on a sufficiently fast multicore system. To handle this case, we add a new state to the "ping_task" pointer besides unassigned and assigned, called "invalid", which means the "not yet completed sending". Tests show this closes this race condition hole. Changes since V1: - Removed two redundant lines in iscsi_send_nopout() - Updated commit text to be more clear - Added this cover letter with even more info Lee Duncan (1): scsi: libiscsi: fix NOP race condition drivers/scsi/libiscsi.c | 13 ++++++++++--- include/scsi/libiscsi.h | 3 +++ 2 files changed, 13 insertions(+), 3 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 1/1] scsi: libiscsi: fix NOP race condition 2020-09-25 18:41 [PATCH v2 0/1] scsi: libiscsi: fix NOP race condition lduncan @ 2020-09-25 18:41 ` lduncan 2020-10-02 16:13 ` Lee Duncan 2020-10-08 17:11 ` Mike Christie 0 siblings, 2 replies; 6+ messages in thread From: lduncan @ 2020-09-25 18:41 UTC (permalink / raw) To: linux-scsi Cc: linux-kernel, open-iscsi, martin.petersen, mchristi, hare, Lee Duncan From: Lee Duncan <lduncan@suse.com> iSCSI NOPs are sometimes "lost", mistakenly sent to the user-land iscsid daemon instead of handled in the kernel, as they should be, resulting in a message from the daemon like: > iscsid: Got nop in, but kernel supports nop handling. This can occur because of the forward- and back-locks in the kernel iSCSI code, and the fact that an iSCSI NOP response can be processed before processing of the NOP send is complete. This can result in "conn->ping_task" being NULL in iscsi_nop_out_rsp(), when the pointer is actually in the process of being set. To work around this, we add a new state to the "ping_task" pointer. In addition to NULL (not assigned) and a pointer (assigned), we add the state "being set", which is signaled with an INVALID pointer (using "-1"). Signed-off-by: Lee Duncan <lduncan@suse.com> --- drivers/scsi/libiscsi.c | 13 ++++++++++--- include/scsi/libiscsi.h | 3 +++ 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 1e9c3171fa9f..cade108c33b6 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -738,6 +738,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, task->conn->session->age); } + if (unlikely(READ_ONCE(conn->ping_task) == INVALID_SCSI_TASK)) + WRITE_ONCE(conn->ping_task, task); + if (!ihost->workq) { if (iscsi_prep_mgmt_task(conn, task)) goto free_task; @@ -941,8 +944,11 @@ static int iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr) struct iscsi_nopout hdr; struct iscsi_task *task; - if (!rhdr && conn->ping_task) - return -EINVAL; + if (!rhdr) { + if (READ_ONCE(conn->ping_task)) + return -EINVAL; + WRITE_ONCE(conn->ping_task, INVALID_SCSI_TASK); + } memset(&hdr, 0, sizeof(struct iscsi_nopout)); hdr.opcode = ISCSI_OP_NOOP_OUT | ISCSI_OP_IMMEDIATE; @@ -957,11 +963,12 @@ static int iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr) task = __iscsi_conn_send_pdu(conn, (struct iscsi_hdr *)&hdr, NULL, 0); if (!task) { + if (!rhdr) + WRITE_ONCE(conn->ping_task, NULL); iscsi_conn_printk(KERN_ERR, conn, "Could not send nopout\n"); return -EIO; } else if (!rhdr) { /* only track our nops */ - conn->ping_task = task; conn->last_ping = jiffies; } diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h index c25fb86ffae9..b3bbd10eb3f0 100644 --- a/include/scsi/libiscsi.h +++ b/include/scsi/libiscsi.h @@ -132,6 +132,9 @@ struct iscsi_task { void *dd_data; /* driver/transport data */ }; +/* invalid scsi_task pointer */ +#define INVALID_SCSI_TASK (struct iscsi_task *)-1l + static inline int iscsi_task_has_unsol_data(struct iscsi_task *task) { return task->unsol_r2t.data_length > task->unsol_r2t.sent; -- 2.26.2 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/1] scsi: libiscsi: fix NOP race condition 2020-09-25 18:41 ` [PATCH v2 1/1] " lduncan @ 2020-10-02 16:13 ` Lee Duncan 2020-10-08 17:11 ` Mike Christie 1 sibling, 0 replies; 6+ messages in thread From: Lee Duncan @ 2020-10-02 16:13 UTC (permalink / raw) To: linux-scsi; +Cc: linux-kernel, open-iscsi, martin.petersen, mchristi, hare On 9/25/20 11:41 AM, lduncan@suse.com wrote: > From: Lee Duncan <lduncan@suse.com> > > iSCSI NOPs are sometimes "lost", mistakenly sent to the > user-land iscsid daemon instead of handled in the kernel, > as they should be, resulting in a message from the daemon like: > >> iscsid: Got nop in, but kernel supports nop handling. > > This can occur because of the forward- and back-locks > in the kernel iSCSI code, and the fact that an iSCSI NOP > response can be processed before processing of the NOP send > is complete. This can result in "conn->ping_task" being NULL > in iscsi_nop_out_rsp(), when the pointer is actually in > the process of being set. > > To work around this, we add a new state to the "ping_task" > pointer. In addition to NULL (not assigned) and a pointer > (assigned), we add the state "being set", which is signaled > with an INVALID pointer (using "-1"). > > Signed-off-by: Lee Duncan <lduncan@suse.com> > --- > drivers/scsi/libiscsi.c | 13 ++++++++++--- > include/scsi/libiscsi.h | 3 +++ > 2 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 1e9c3171fa9f..cade108c33b6 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -738,6 +738,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, > task->conn->session->age); > } > > + if (unlikely(READ_ONCE(conn->ping_task) == INVALID_SCSI_TASK)) > + WRITE_ONCE(conn->ping_task, task); > + > if (!ihost->workq) { > if (iscsi_prep_mgmt_task(conn, task)) > goto free_task; > @@ -941,8 +944,11 @@ static int iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr) > struct iscsi_nopout hdr; > struct iscsi_task *task; > > - if (!rhdr && conn->ping_task) > - return -EINVAL; > + if (!rhdr) { > + if (READ_ONCE(conn->ping_task)) > + return -EINVAL; > + WRITE_ONCE(conn->ping_task, INVALID_SCSI_TASK); > + } > > memset(&hdr, 0, sizeof(struct iscsi_nopout)); > hdr.opcode = ISCSI_OP_NOOP_OUT | ISCSI_OP_IMMEDIATE; > @@ -957,11 +963,12 @@ static int iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr) > > task = __iscsi_conn_send_pdu(conn, (struct iscsi_hdr *)&hdr, NULL, 0); > if (!task) { > + if (!rhdr) > + WRITE_ONCE(conn->ping_task, NULL); > iscsi_conn_printk(KERN_ERR, conn, "Could not send nopout\n"); > return -EIO; > } else if (!rhdr) { > /* only track our nops */ > - conn->ping_task = task; > conn->last_ping = jiffies; > } > > diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h > index c25fb86ffae9..b3bbd10eb3f0 100644 > --- a/include/scsi/libiscsi.h > +++ b/include/scsi/libiscsi.h > @@ -132,6 +132,9 @@ struct iscsi_task { > void *dd_data; /* driver/transport data */ > }; > > +/* invalid scsi_task pointer */ > +#define INVALID_SCSI_TASK (struct iscsi_task *)-1l > + > static inline int iscsi_task_has_unsol_data(struct iscsi_task *task) > { > return task->unsol_r2t.data_length > task->unsol_r2t.sent; > Ping? -- Lee Duncan ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/1] scsi: libiscsi: fix NOP race condition 2020-09-25 18:41 ` [PATCH v2 1/1] " lduncan 2020-10-02 16:13 ` Lee Duncan @ 2020-10-08 17:11 ` Mike Christie 2020-10-08 20:54 ` Mike Christie 1 sibling, 1 reply; 6+ messages in thread From: Mike Christie @ 2020-10-08 17:11 UTC (permalink / raw) To: lduncan, linux-scsi Cc: linux-kernel, open-iscsi, martin.petersen, mchristi, hare On 9/25/20 1:41 PM, lduncan@suse.com wrote: > From: Lee Duncan <lduncan@suse.com> > > iSCSI NOPs are sometimes "lost", mistakenly sent to the > user-land iscsid daemon instead of handled in the kernel, > as they should be, resulting in a message from the daemon like: > >> iscsid: Got nop in, but kernel supports nop handling. > > This can occur because of the forward- and back-locks > in the kernel iSCSI code, and the fact that an iSCSI NOP > response can be processed before processing of the NOP send > is complete. This can result in "conn->ping_task" being NULL > in iscsi_nop_out_rsp(), when the pointer is actually in > the process of being set. > > To work around this, we add a new state to the "ping_task" > pointer. In addition to NULL (not assigned) and a pointer > (assigned), we add the state "being set", which is signaled > with an INVALID pointer (using "-1"). > > Signed-off-by: Lee Duncan <lduncan@suse.com> > --- > drivers/scsi/libiscsi.c | 13 ++++++++++--- > include/scsi/libiscsi.h | 3 +++ > 2 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c > index 1e9c3171fa9f..cade108c33b6 100644 > --- a/drivers/scsi/libiscsi.c > +++ b/drivers/scsi/libiscsi.c > @@ -738,6 +738,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, > task->conn->session->age); > } > > + if (unlikely(READ_ONCE(conn->ping_task) == INVALID_SCSI_TASK)) > + WRITE_ONCE(conn->ping_task, task); > + > if (!ihost->workq) { > if (iscsi_prep_mgmt_task(conn, task)) > goto free_task; I think the API gets a little weird now where in some cases __iscsi_conn_send_pdu checks the opcode to see what type of request it is but above we the caller sets the ping_task. For login, tmfs and passthrough, we assume the __iscsi_conn_send_pdu has sent or cleaned up everything. I think it might be nicer to just have __iscsi_conn_send_pdu set the ping_task field before doing the xmit/queue call. It would then work similar to the conn->login_task case where that function knows about that special task too. So in __iscsi_conn_send_pdu add a "if (opcode == ISCSI_OP_NOOP_OUT)", and check if it's a nop we need to track. If so set conn->ping_task. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/1] scsi: libiscsi: fix NOP race condition 2020-10-08 17:11 ` Mike Christie @ 2020-10-08 20:54 ` Mike Christie 2020-10-20 16:55 ` Lee Duncan 0 siblings, 1 reply; 6+ messages in thread From: Mike Christie @ 2020-10-08 20:54 UTC (permalink / raw) To: lduncan, linux-scsi Cc: linux-kernel, open-iscsi, martin.petersen, mchristi, hare On 10/8/20 12:11 PM, Mike Christie wrote: > On 9/25/20 1:41 PM, lduncan@suse.com wrote: >> From: Lee Duncan <lduncan@suse.com> >> >> iSCSI NOPs are sometimes "lost", mistakenly sent to the >> user-land iscsid daemon instead of handled in the kernel, >> as they should be, resulting in a message from the daemon like: >> >>> iscsid: Got nop in, but kernel supports nop handling. >> >> This can occur because of the forward- and back-locks >> in the kernel iSCSI code, and the fact that an iSCSI NOP >> response can be processed before processing of the NOP send >> is complete. This can result in "conn->ping_task" being NULL >> in iscsi_nop_out_rsp(), when the pointer is actually in >> the process of being set. >> >> To work around this, we add a new state to the "ping_task" >> pointer. In addition to NULL (not assigned) and a pointer >> (assigned), we add the state "being set", which is signaled >> with an INVALID pointer (using "-1"). >> >> Signed-off-by: Lee Duncan <lduncan@suse.com> >> --- >> drivers/scsi/libiscsi.c | 13 ++++++++++--- >> include/scsi/libiscsi.h | 3 +++ >> 2 files changed, 13 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c >> index 1e9c3171fa9f..cade108c33b6 100644 >> --- a/drivers/scsi/libiscsi.c >> +++ b/drivers/scsi/libiscsi.c >> @@ -738,6 +738,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, >> task->conn->session->age); >> } >> >> + if (unlikely(READ_ONCE(conn->ping_task) == INVALID_SCSI_TASK)) >> + WRITE_ONCE(conn->ping_task, task); >> + >> if (!ihost->workq) { >> if (iscsi_prep_mgmt_task(conn, task)) >> goto free_task; > > I think the API gets a little weird now where in some cases > __iscsi_conn_send_pdu checks the opcode to see what type of request > it is but above we the caller sets the ping_task. > > For login, tmfs and passthrough, we assume the __iscsi_conn_send_pdu > has sent or cleaned up everything. I think it might be nicer to just > have __iscsi_conn_send_pdu set the ping_task field before doing the > xmit/queue call. It would then work similar to the conn->login_task > case where that function knows about that special task too. > > So in __iscsi_conn_send_pdu add a "if (opcode == ISCSI_OP_NOOP_OUT)", > and check if it's a nop we need to track. If so set conn->ping_task. > Ignore this. It won't work nicely either. To figure out if the nop is our internal transport test ping vs a userspace ping that also needs a reply, we would need to do something like you did above so there is no point. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/1] scsi: libiscsi: fix NOP race condition 2020-10-08 20:54 ` Mike Christie @ 2020-10-20 16:55 ` Lee Duncan 0 siblings, 0 replies; 6+ messages in thread From: Lee Duncan @ 2020-10-20 16:55 UTC (permalink / raw) To: Mike Christie, linux-scsi Cc: linux-kernel, open-iscsi, martin.petersen, mchristi, hare On 10/8/20 1:54 PM, Mike Christie wrote: > On 10/8/20 12:11 PM, Mike Christie wrote: >> On 9/25/20 1:41 PM, lduncan@suse.com wrote: >>> From: Lee Duncan <lduncan@suse.com> >>> >>> iSCSI NOPs are sometimes "lost", mistakenly sent to the >>> user-land iscsid daemon instead of handled in the kernel, >>> as they should be, resulting in a message from the daemon like: >>> >>>> iscsid: Got nop in, but kernel supports nop handling. >>> >>> This can occur because of the forward- and back-locks >>> in the kernel iSCSI code, and the fact that an iSCSI NOP >>> response can be processed before processing of the NOP send >>> is complete. This can result in "conn->ping_task" being NULL >>> in iscsi_nop_out_rsp(), when the pointer is actually in >>> the process of being set. >>> >>> To work around this, we add a new state to the "ping_task" >>> pointer. In addition to NULL (not assigned) and a pointer >>> (assigned), we add the state "being set", which is signaled >>> with an INVALID pointer (using "-1"). >>> >>> Signed-off-by: Lee Duncan <lduncan@suse.com> >>> --- >>> drivers/scsi/libiscsi.c | 13 ++++++++++--- >>> include/scsi/libiscsi.h | 3 +++ >>> 2 files changed, 13 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c >>> index 1e9c3171fa9f..cade108c33b6 100644 >>> --- a/drivers/scsi/libiscsi.c >>> +++ b/drivers/scsi/libiscsi.c >>> @@ -738,6 +738,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, >>> task->conn->session->age); >>> } >>> >>> + if (unlikely(READ_ONCE(conn->ping_task) == INVALID_SCSI_TASK)) >>> + WRITE_ONCE(conn->ping_task, task); >>> + >>> if (!ihost->workq) { >>> if (iscsi_prep_mgmt_task(conn, task)) >>> goto free_task; >> >> I think the API gets a little weird now where in some cases >> __iscsi_conn_send_pdu checks the opcode to see what type of request >> it is but above we the caller sets the ping_task. >> >> For login, tmfs and passthrough, we assume the __iscsi_conn_send_pdu >> has sent or cleaned up everything. I think it might be nicer to just >> have __iscsi_conn_send_pdu set the ping_task field before doing the >> xmit/queue call. It would then work similar to the conn->login_task >> case where that function knows about that special task too. >> >> So in __iscsi_conn_send_pdu add a "if (opcode == ISCSI_OP_NOOP_OUT)", >> and check if it's a nop we need to track. If so set conn->ping_task. >> > Ignore this. It won't work nicely either. To figure out if the nop is > our internal transport test ping vs a userspace ping that also needs > a reply, we would need to do something like you did above so there is > no point. > Hi Mike: I've read this a few times, and I'm still no sure I'm parsing it correctly. Are you saying that my original patch submission is ok, or are you saying there's nothing we can do and we're up the proverbial creek? -- Lee Duncan ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-10-20 16:55 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-25 18:41 [PATCH v2 0/1] scsi: libiscsi: fix NOP race condition lduncan 2020-09-25 18:41 ` [PATCH v2 1/1] " lduncan 2020-10-02 16:13 ` Lee Duncan 2020-10-08 17:11 ` Mike Christie 2020-10-08 20:54 ` Mike Christie 2020-10-20 16:55 ` Lee Duncan
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.