All of lore.kernel.org
* [PATCH target-pending] iscsi-target: make sure to wake up sleeping login worker
@ 2018-01-19 13:36 ` Florian Westphal
From: Florian Westphal @ 2018-01-19 13:36 UTC (permalink / raw)
  To: target-devel; +Cc: mchristi, nab, netdev, linux-scsi, Florian Westphal

Mike Christie reports:
  Starting in 4.14 iscsi logins will fail around 50% of the time.

Problem appears to be that iscsi_target_sk_data_ready() callback may
return without doing anything in case it finds the login work queue
is still blocked in sock_recvmsg().

Nicholas Bellinger says:
  It would indicate users providing their own ->sk_data_ready() callback
  must be responsible for waking up a kthread context blocked on
  sock_recvmsg(..., MSG_WAITALL), when a second ->sk_data_ready() is
  received before the first sock_recvmsg(..., MSG_WAITALL) completes.

So, do this and invoke the original data_ready() callback -- in
case of tcp sockets this takes care of waking the thread.

Disclaimer: I do not understand why this problem did not show up before
tcp prequeue removal.

Reported-by: Mike Christie <mchristi@redhat.com>
Bisected-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Diagnosed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Fixes: e7942d0633c4 ("tcp: remove prequeue support")
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 drivers/target/iscsi/iscsi_target_nego.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
index b686e2ce9c0e..3723f8f419aa 100644
--- a/drivers/target/iscsi/iscsi_target_nego.c
+++ b/drivers/target/iscsi/iscsi_target_nego.c
@@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk)
 	if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
 		write_unlock_bh(&sk->sk_callback_lock);
 		pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn);
+		if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready))
+			return;
+		conn->orig_data_ready(sk);
 		return;
 	}
 
-- 
2.13.6
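The lost wakeup described in the commit message can be modelled in a few lines of userspace C. This is a hypothetical, single-threaded sketch, not kernel code. The names model_sock, model_conn and the model_* functions are invented for illustration. atomic_exchange() stands in for test_and_set_bit(). model_default_data_ready() assumes that the stock TCP callback wakes a MSG_WAITALL reader once enough bytes are queued. Setting chain_to_orig to false approximates the pre-patch behaviour, and setting it to true approximates the patched callback.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct sock; not a kernel type. */
struct model_sock {
	size_t queued;      /* bytes sitting in the receive queue */
	size_t waitall_len; /* bytes a MSG_WAITALL reader needs (0 = nobody waiting) */
	bool reader_woken;  /* set once the sleeping reader has been woken */
	void (*data_ready)(struct model_sock *sk);
	void *user_data;
};

/* Hypothetical stand-in for the iSCSI connection state used by the callback. */
struct model_conn {
	atomic_bool read_active; /* plays the role of LOGIN_FLAGS_READ_ACTIVE */
	bool chain_to_orig;      /* false = behaviour before the patch, true = after */
	int work_queued;         /* how often the login worker was scheduled */
	void (*orig_data_ready)(struct model_sock *sk);
};

/* Assumed behaviour of the stock TCP callback: wake the reader once enough data is queued. */
static void model_default_data_ready(struct model_sock *sk)
{
	if (sk->waitall_len && sk->queued >= sk->waitall_len)
		sk->reader_woken = true;
}

/* Models iscsi_target_sk_data_ready(): schedule the login worker at most once. */
static void model_target_data_ready(struct model_sock *sk)
{
	struct model_conn *conn = sk->user_data;

	/* atomic_exchange() returns the previous value, like test_and_set_bit(). */
	if (atomic_exchange(&conn->read_active, true)) {
		/*
		 * The worker is already inside recvmsg(). Only the original
		 * callback can wake it; the self-comparison mirrors the patch's
		 * guard against calling ourselves recursively.
		 */
		if (conn->chain_to_orig &&
		    conn->orig_data_ready != model_target_data_ready)
			conn->orig_data_ready(sk);
		return;
	}
	conn->work_queued++;
}

/* Save the original callback and install ours (the kernel does this under sk_callback_lock). */
static void model_install(struct model_sock *sk, struct model_conn *conn)
{
	conn->orig_data_ready = sk->data_ready;
	sk->data_ready = model_target_data_ready;
	sk->user_data = conn;
}

/* Data arrives: queue it and fire whichever callback is installed. */
static void model_rx(struct model_sock *sk, size_t bytes)
{
	sk->queued += bytes;
	sk->data_ready(sk);
}
```

The first model_rx() call schedules the worker. If the worker then "blocks" by setting waitall_len to the full PDU length, a second model_rx() sets reader_woken only when chain_to_orig is true. Otherwise the notification is swallowed and the reader stays asleep.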

* Re: [PATCH target-pending] iscsi-target: make sure to wake up sleeping login worker
  2018-01-19 13:36 ` Florian Westphal
@ 2018-01-19 15:46   ` Eric Dumazet
From: Eric Dumazet @ 2018-01-19 15:46 UTC (permalink / raw)
  To: Florian Westphal, target-devel; +Cc: mchristi, nab, netdev, linux-scsi

On Fri, 2018-01-19 at 14:36 +0100, Florian Westphal wrote:
> Mike Christie reports:
>   Starting in 4.14 iscsi logins will fail around 50% of the time.
> 
> Problem appears to be that iscsi_target_sk_data_ready() callback may
> return without doing anything in case it finds the login work queue
> is still blocked in sock_recvmsg().
> 
> Nicholas Bellinger says:
>   It would indicate users providing their own ->sk_data_ready() callback
>   must be responsible for waking up a kthread context blocked on
>   sock_recvmsg(..., MSG_WAITALL), when a second ->sk_data_ready() is
>   received before the first sock_recvmsg(..., MSG_WAITALL) completes.
> 
> So, do this and invoke the original data_ready() callback -- in
> case of tcp sockets this takes care of waking the thread.
> 
> Disclaimer: I do not understand why this problem did not show up before
> tcp prequeue removal.
> 
> Reported-by: Mike Christie <mchristi@redhat.com>
> Bisected-by: Mike Christie <mchristi@redhat.com>
> Tested-by: Mike Christie <mchristi@redhat.com>
> Diagnosed-by: Nicholas Bellinger <nab@linux-iscsi.org>
> Fixes: e7942d0633c4 ("tcp: remove prequeue support")
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  drivers/target/iscsi/iscsi_target_nego.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
> index b686e2ce9c0e..3723f8f419aa 100644
> --- a/drivers/target/iscsi/iscsi_target_nego.c
> +++ b/drivers/target/iscsi/iscsi_target_nego.c
> @@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk)
>  	if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
>  		write_unlock_bh(&sk->sk_callback_lock);
>  		pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn);
> +		if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready))
> +			return;

Does this WARN_ON() belong in this fix?
At least make it WARN_ON_ONCE() or pr_err_once().

> +		conn->orig_data_ready(sk);
>  		return;
>  	}
>  
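Eric's concern is that WARN_ON() prints a warning with a backtrace every time its condition is true. WARN_ON_ONCE() reports only the first hit from a given call site, and pr_err_once() prints a single error line once. The once-per-call-site behaviour can be sketched in userspace C as follows. SKETCH_WARN_ON_ONCE and sketch_is_negative are hypothetical names. The macro relies on the GCC/Clang statement-expression extension and only approximates the kernel macro; it is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdio.h>

static int sketch_warn_count; /* number of warnings actually reported */

/*
 * Hypothetical approximation of WARN_ON_ONCE(): evaluates to the condition
 * on every call, but each expansion has its own static flag, so a given
 * call site reports at most once.
 */
#define SKETCH_WARN_ON_ONCE(cond) ({					\
	static bool warned_;						\
	bool ret_ = !!(cond);						\
	if (ret_ && !warned_) {						\
		warned_ = true;						\
		sketch_warn_count++;					\
		fprintf(stderr, "WARNING: %s:%d: %s\n",			\
			__FILE__, __LINE__, #cond);			\
	}								\
	ret_;								\
})

/* Hypothetical call site: every negative value trips the check. */
static bool sketch_is_negative(int v)
{
	return SKETCH_WARN_ON_ONCE(v < 0);
}
```

Calling sketch_is_negative(-1) and then sketch_is_negative(-2) returns true both times, but only one warning line is printed.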

* Re: [PATCH target-pending] iscsi-target: make sure to wake up sleeping login worker
  2018-01-19 15:46   ` Eric Dumazet
@ 2018-01-19 17:26     ` Florian Westphal
From: Florian Westphal @ 2018-01-19 17:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, target-devel, mchristi, nab, netdev, linux-scsi

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2018-01-19 at 14:36 +0100, Florian Westphal wrote:
> > diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
> > index b686e2ce9c0e..3723f8f419aa 100644
> > --- a/drivers/target/iscsi/iscsi_target_nego.c
> > +++ b/drivers/target/iscsi/iscsi_target_nego.c
> > @@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk)
> >  	if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
> >  		write_unlock_bh(&sk->sk_callback_lock);
> >  		pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn);
> > +		if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready))
> > +			return;
> 
> Is this WARN_ON() belonging to this fix ?
> At least make it WARN_ON_ONCE() or pr_err_once()

Nicholas, I don't know this code at all, so it would be good if you could
give advice here (omit altogether, WARN_ON_ONCE, ...).

Thanks!

* Re: [PATCH target-pending] iscsi-target: make sure to wake up sleeping login worker
  2018-01-19 17:26     ` Florian Westphal
@ 2018-01-24  7:01       ` Nicholas A. Bellinger
From: Nicholas A. Bellinger @ 2018-01-24  7:01 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Eric Dumazet, target-devel, mchristi, netdev, linux-scsi

Hey Florian & Co,

On Fri, 2018-01-19 at 18:26 +0100, Florian Westphal wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Fri, 2018-01-19 at 14:36 +0100, Florian Westphal wrote:
> > > diff --git a/drivers/target/iscsi/iscsi_target_nego.c b/drivers/target/iscsi/iscsi_target_nego.c
> > > index b686e2ce9c0e..3723f8f419aa 100644
> > > --- a/drivers/target/iscsi/iscsi_target_nego.c
> > > +++ b/drivers/target/iscsi/iscsi_target_nego.c
> > > @@ -432,6 +432,9 @@ static void iscsi_target_sk_data_ready(struct sock *sk)
> > >  	if (test_and_set_bit(LOGIN_FLAGS_READ_ACTIVE, &conn->login_flags)) {
> > >  		write_unlock_bh(&sk->sk_callback_lock);
> > >  		pr_debug("Got LOGIN_FLAGS_READ_ACTIVE=1, conn: %p >>>>\n", conn);
> > > +		if (WARN_ON(iscsi_target_sk_data_ready == conn->orig_data_ready))
> > > +			return;
> > 
> > Is this WARN_ON() belonging to this fix ?
> > At least make it WARN_ON_ONCE() or pr_err_once()
> 
> Nicholas, I don't know this code at all so it would be good if you could
> give advice here (omit all together, WARN_ON_ONCE, ...).
> 

This is regular behavior during multi-PDU login sequences, and should
not include a WARN_ON.

So with MNC's Tested-by in place, I'm applying this to target-pending/for-next
minus the WARN_ON, with an extra 4.14.y stable tag.

Thanks again for taking a look at this.

To your earlier point wrt net.ipv4.tcp_low_latency=1 on 4.13 code not
triggering the prequeue logic: from grokking the original patch to drop
prequeue, I agree this should really be the case, but I am still at a loss
as to how MNC is triggering this on 4.14+ unless something else has changed
to uncover this iscsi-target bug.

Still curious to verify the root cause, but I haven't been able to
reproduce this in VMs on small scale, and haven't had cycles to
reproduce on HW yet.

That said, since the bug appears to be masked on <= 4.13.y +
tcp_low_latency=1, unless someone can reproduce this on earlier code
with tcp_low_latency=0, I'll leave off the older stable tag for now.

Thread overview:
2018-01-19 13:36 [PATCH target-pending] iscsi-target: make sure to wake up sleeping login worker Florian Westphal
2018-01-19 15:46 ` Eric Dumazet
2018-01-19 17:26   ` Florian Westphal
2018-01-24  7:01     ` Nicholas A. Bellinger