Netdev Archive on lore.kernel.org
* [PATCH net v3] ibmvnic: Continue with reset if set link down failed
@ 2021-05-04 19:11 Dany Madden
  2021-05-04 19:27 ` Lijun Pan
  0 siblings, 1 reply; 5+ messages in thread
From: Dany Madden @ 2021-05-04 19:11 UTC
  To: davem, kuba
  Cc: drt, sukadev, tlfalcon, mpe, benh, paulus, netdev, linuxppc-dev

When ibmvnic gets a FATAL error message from the vnicserver, it marks
the Command/Response Queue (CRQ) inactive and resets the adapter. If this
FATAL reset fails and a transmission timeout reset follows, the CRQ is
still inactive, so ibmvnic's attempt to set the link down also fails. If
ibmvnic abandons the reset because of this failed link-down request and
this is the last reset in the workqueue, the adapter is left in an
inoperable state.

Instead, make the driver ignore this link-down failure and continue to
free and re-register the CRQ so that the adapter has an opportunity to
recover.

Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
Signed-off-by: Dany Madden <drt@linux.ibm.com>
Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
---
Changes in V2:
- Update description to clarify background for the patch
- Include Reviewed-by tags
Changes in V3:
- Add comment above the code change
---
 drivers/net/ethernet/ibm/ibmvnic.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5788bb956d73..9e005a08d43b 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2017,8 +2017,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 			rtnl_unlock();
 			rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
 			rtnl_lock();
-			if (rc)
-				goto out;
+
+			/* Attempted to set the link down. It could fail if the
+			 * vnicserver has already torn down the CRQ. We will
+			 * note it and continue with reset to reinit the CRQ.
+			 */
+			if (rc) {
+				netdev_dbg(netdev,
+					   "Setting link down failed rc=%d. Continue anyway\n", rc);
+			}
 
 			if (adapter->state == VNIC_OPEN) {
 				/* When we dropped rtnl, ibmvnic_open() got
-- 
2.18.2


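The reset decision described in the patch can be modeled outside the kernel. This is a minimal user-space sketch, assuming a heavily simplified adapter: struct model_adapter, model_set_link_down(), model_reregister_crq() and model_do_reset() are hypothetical names, not ibmvnic code, and the real do_reset() does much more (rtnl locking, state checks, sub-CRQ handling). It only contrasts the old "if (rc) goto out;" behavior with the patched log-and-continue behavior when the CRQ is inactive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified adapter state; not the real
 * struct ibmvnic_adapter. */
struct model_adapter {
	bool crq_active;	/* false after a FATAL error marks the CRQ inactive */
	bool operable;		/* true once the CRQ has been re-registered */
};

/* Stand-in for set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN): the
 * request travels over the CRQ, so it fails while the CRQ is inactive. */
static int model_set_link_down(const struct model_adapter *a)
{
	return a->crq_active ? 0 : -1;
}

/* Stand-in for freeing and re-registering the CRQ. */
static void model_reregister_crq(struct model_adapter *a)
{
	a->crq_active = true;
	a->operable = true;
}

/* abort_on_failure == true models the old "if (rc) goto out;" path;
 * abort_on_failure == false models the patched "log and continue" path. */
static int model_do_reset(struct model_adapter *a, bool abort_on_failure)
{
	int rc;

	assert(a != NULL);
	rc = model_set_link_down(a);
	if (rc) {
		if (abort_on_failure)
			return rc;	/* reset abandoned; CRQ stays inactive */
		fprintf(stderr, "Setting link down failed rc=%d. Continue anyway\n", rc);
	}
	model_reregister_crq(a);
	return 0;
}
```

Starting from an adapter whose CRQ is inactive, model_do_reset(a, true) returns -1 and leaves a->operable false, while model_do_reset(a, false) prints the debug-style message to stderr, re-registers the CRQ and returns 0.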

* Re: [PATCH net v3] ibmvnic: Continue with reset if set link down failed
  2021-05-04 19:11 [PATCH net v3] ibmvnic: Continue with reset if set link down failed Dany Madden
@ 2021-05-04 19:27 ` Lijun Pan
  2021-05-04 19:31   ` Lijun Pan
  0 siblings, 1 reply; 5+ messages in thread
From: Lijun Pan @ 2021-05-04 19:27 UTC
  To: Dany Madden
  Cc: David S. Miller, Jakub Kicinski, Sukadev Bhattiprolu,
	Thomas Falcon, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, netdev, linuxppc-dev

On Tue, May 4, 2021 at 2:14 PM Dany Madden <drt@linux.ibm.com> wrote:
>
> When ibmvnic gets a FATAL error message from the vnicserver, it marks
> the Command/Response Queue (CRQ) inactive and resets the adapter. If this
> FATAL reset fails and a transmission timeout reset follows, the CRQ is
> still inactive, so ibmvnic's attempt to set the link down also fails. If
> ibmvnic abandons the reset because of this failed link-down request and
> this is the last reset in the workqueue, the adapter is left in an
> inoperable state.
>
> Instead, make the driver ignore this link-down failure and continue to
> free and re-register the CRQ so that the adapter has an opportunity to
> recover.
>
> Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> Signed-off-by: Dany Madden <drt@linux.ibm.com>
> Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
> Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> ---
> Changes in V2:
> - Update description to clarify background for the patch
> - Include Reviewed-by tags
> Changes in V3:
> - Add comment above the code change
> ---
>  drivers/net/ethernet/ibm/ibmvnic.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 5788bb956d73..9e005a08d43b 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -2017,8 +2017,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>                         rtnl_unlock();
>                         rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
>                         rtnl_lock();
> -                       if (rc)
> -                               goto out;
> +
> +                       /* Attempted to set the link down. It could fail if the
> +                        * vnicserver has already torn down the CRQ. We will
> +                        * note it and continue with reset to reinit the CRQ.
> +                        */
> +                       if (rc) {
> +                               netdev_dbg(netdev,
> +                                          "Setting link down failed rc=%d. Continue anyway\n", rc);
> +                       }

There are other places which check and rely on the return value of
this function. Your change makes that inconsistent. Can you stop
posting new versions and soliciting the maintainer to accept it before
there is a material change? There are many ways to make the reset
succeed. I think this is the worst approach of all.


>
>                         if (adapter->state == VNIC_OPEN) {
>                                 /* When we dropped rtnl, ibmvnic_open() got
> --
> 2.18.2
>


* Re: [PATCH net v3] ibmvnic: Continue with reset if set link down failed
  2021-05-04 19:27 ` Lijun Pan
@ 2021-05-04 19:31   ` Lijun Pan
  2021-05-04 20:24     ` Dany Madden
  0 siblings, 1 reply; 5+ messages in thread
From: Lijun Pan @ 2021-05-04 19:31 UTC
  To: Dany Madden
  Cc: David S. Miller, Jakub Kicinski, Sukadev Bhattiprolu,
	Thomas Falcon, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, netdev, linuxppc-dev

On Tue, May 4, 2021 at 2:27 PM Lijun Pan <lijunp213@gmail.com> wrote:
>
> On Tue, May 4, 2021 at 2:14 PM Dany Madden <drt@linux.ibm.com> wrote:
> >
> > When ibmvnic gets a FATAL error message from the vnicserver, it marks
> > the Command/Response Queue (CRQ) inactive and resets the adapter. If this
> > FATAL reset fails and a transmission timeout reset follows, the CRQ is
> > still inactive, so ibmvnic's attempt to set the link down also fails. If
> > ibmvnic abandons the reset because of this failed link-down request and
> > this is the last reset in the workqueue, the adapter is left in an
> > inoperable state.
> >
> > Instead, make the driver ignore this link-down failure and continue to
> > free and re-register the CRQ so that the adapter has an opportunity to
> > recover.
> >
> > Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> > Signed-off-by: Dany Madden <drt@linux.ibm.com>
> > Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
> > Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> > ---
> > Changes in V2:
> > - Update description to clarify background for the patch
> > - Include Reviewed-by tags
> > Changes in V3:
> > - Add comment above the code change
> > ---
> >  drivers/net/ethernet/ibm/ibmvnic.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> > index 5788bb956d73..9e005a08d43b 100644
> > --- a/drivers/net/ethernet/ibm/ibmvnic.c
> > +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> > @@ -2017,8 +2017,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
> >                         rtnl_unlock();
> >                         rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
> >                         rtnl_lock();
> > -                       if (rc)
> > -                               goto out;
> > +
> > +                       /* Attempted to set the link down. It could fail if the
> > +                        * vnicserver has already torn down the CRQ. We will
> > +                        * note it and continue with reset to reinit the CRQ.
> > +                        */
> > +                       if (rc) {
> > +                               netdev_dbg(netdev,
> > +                                          "Setting link down failed rc=%d. Continue anyway\n", rc);
> > +                       }
>
> There are other places which check and rely on the return value of
> this function. Your change makes that inconsistent. Can you stop

To be more specific, __ibmvnic_close() and __ibmvnic_open() both call
set_link_state().

> posting new versions and soliciting the maintainer to accept it before
> there is a material change? There are many ways to make the reset
> succeed. I think this is the worst approach of all.
>
>
> >
> >                         if (adapter->state == VNIC_OPEN) {
> >                                 /* When we dropped rtnl, ibmvnic_open() got
> > --
> > 2.18.2
> >

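Lijun's objection is that the same set_link_state() failure would now mean different things to different callers. The sketch below is a hypothetical user-space model, not the real bodies of __ibmvnic_close() or do_reset(): model_set_link_state(), model_close(), model_reset() and the MODEL_* status codes are illustrative names only. It contrasts a close-style caller that returns the link-down result with a reset-style caller that checks the result but proceeds anyway.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes for this sketch only. */
enum { MODEL_OK = 0, MODEL_ELINK = -5 };

/* Stand-in for set_link_state(): fails whenever the CRQ is inactive. */
static int model_set_link_state(bool crq_active)
{
	return crq_active ? MODEL_OK : MODEL_ELINK;
}

/* Close-style caller: the link-down result becomes the caller's result. */
static int model_close(bool crq_active)
{
	return model_set_link_state(crq_active);
}

/* Reset-style caller after the patch: the result is checked (and, in the
 * real patch, logged with netdev_dbg()) but does not stop the reset. */
static int model_reset(bool crq_active)
{
	int rc = model_set_link_state(crq_active);

	if (rc) {
		/* Only the expected link-down failure is tolerated here. */
		assert(rc == MODEL_ELINK);
		rc = MODEL_OK;	/* the reset goes on to re-register the CRQ */
	}
	return rc;
}
```

With the CRQ inactive, model_close(false) returns MODEL_ELINK while model_reset(false) returns MODEL_OK; with the CRQ active, both return MODEL_OK. Whether that divergence is a justified special case or an inconsistency is the point the thread disputes.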

* Re: [PATCH net v3] ibmvnic: Continue with reset if set link down failed
  2021-05-04 19:31   ` Lijun Pan
@ 2021-05-04 20:24     ` Dany Madden
  2021-05-07  7:24       ` Lijun Pan
  0 siblings, 1 reply; 5+ messages in thread
From: Dany Madden @ 2021-05-04 20:24 UTC
  To: Lijun Pan
  Cc: David S. Miller, Jakub Kicinski, Sukadev Bhattiprolu,
	Thomas Falcon, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, netdev, linuxppc-dev

On 2021-05-04 12:31, Lijun Pan wrote:
> On Tue, May 4, 2021 at 2:27 PM Lijun Pan <lijunp213@gmail.com> wrote:
>> 
>> On Tue, May 4, 2021 at 2:14 PM Dany Madden <drt@linux.ibm.com> wrote:
>> >
>> > When ibmvnic gets a FATAL error message from the vnicserver, it marks
>> > the Command/Response Queue (CRQ) inactive and resets the adapter. If this
>> > FATAL reset fails and a transmission timeout reset follows, the CRQ is
>> > still inactive, so ibmvnic's attempt to set the link down also fails. If
>> > ibmvnic abandons the reset because of this failed link-down request and
>> > this is the last reset in the workqueue, the adapter is left in an
>> > inoperable state.
>> >
>> > Instead, make the driver ignore this link-down failure and continue to
>> > free and re-register the CRQ so that the adapter has an opportunity to
>> > recover.
>> >
>> > Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
>> > Signed-off-by: Dany Madden <drt@linux.ibm.com>
>> > Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
>> > Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
>> > ---
>> > Changes in V2:
>> > - Update description to clarify background for the patch
>> > - Include Reviewed-by tags
>> > Changes in V3:
>> > - Add comment above the code change
>> > ---
>> >  drivers/net/ethernet/ibm/ibmvnic.c | 11 +++++++++--
>> >  1 file changed, 9 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>> > index 5788bb956d73..9e005a08d43b 100644
>> > --- a/drivers/net/ethernet/ibm/ibmvnic.c
>> > +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>> > @@ -2017,8 +2017,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
>> >                         rtnl_unlock();
>> >                         rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
>> >                         rtnl_lock();
>> > -                       if (rc)
>> > -                               goto out;
>> > +
>> > +                       /* Attempted to set the link down. It could fail if the
>> > +                        * vnicserver has already torn down the CRQ. We will
>> > +                        * note it and continue with reset to reinit the CRQ.
>> > +                        */
>> > +                       if (rc) {
>> > +                               netdev_dbg(netdev,
>> > +                                          "Setting link down failed rc=%d. Continue anyway\n", rc);
>> > +                       }
>> 
>> There are other places which check and rely on the return value of
>> this function. Your change makes that inconsistent. Can you stop
> 
> To be more specific, __ibmvnic_close() and __ibmvnic_open() both call
> set_link_state().
Inconsistent would have been not checking the rc at all. Here we
check it and note that there are times when it's OK to continue.

> 
>> posting new versions and soliciting the maintainer to accept it before
>> there is a material change? There are many ways to make the reset
>> succeed. I think this is the worst approach of all.

Can you show me a patch that is better than this one and that has gone
through 30+ hours of testing?

>> 
>> 
>> >
>> >                         if (adapter->state == VNIC_OPEN) {
>> >                                 /* When we dropped rtnl, ibmvnic_open() got
>> > --
>> > 2.18.2
>> >


* Re: [PATCH net v3] ibmvnic: Continue with reset if set link down failed
  2021-05-04 20:24     ` Dany Madden
@ 2021-05-07  7:24       ` Lijun Pan
  0 siblings, 0 replies; 5+ messages in thread
From: Lijun Pan @ 2021-05-07  7:24 UTC
  To: Dany Madden
  Cc: David S. Miller, Jakub Kicinski, Sukadev Bhattiprolu,
	Thomas Falcon, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, netdev, linuxppc-dev

On Tue, May 4, 2021 at 3:24 PM Dany Madden <drt@linux.ibm.com> wrote:
>
> On 2021-05-04 12:31, Lijun Pan wrote:
> > On Tue, May 4, 2021 at 2:27 PM Lijun Pan <lijunp213@gmail.com> wrote:
> >>
> >> On Tue, May 4, 2021 at 2:14 PM Dany Madden <drt@linux.ibm.com> wrote:
> >> >
> >> > When ibmvnic gets a FATAL error message from the vnicserver, it marks
> >> > the Command/Response Queue (CRQ) inactive and resets the adapter. If this
> >> > FATAL reset fails and a transmission timeout reset follows, the CRQ is
> >> > still inactive, so ibmvnic's attempt to set the link down also fails. If
> >> > ibmvnic abandons the reset because of this failed link-down request and
> >> > this is the last reset in the workqueue, the adapter is left in an
> >> > inoperable state.
> >> >
> >> > Instead, make the driver ignore this link-down failure and continue to
> >> > free and re-register the CRQ so that the adapter has an opportunity to
> >> > recover.
> >> >
> >> > Fixes: ed651a10875f ("ibmvnic: Updated reset handling")
> >> > Signed-off-by: Dany Madden <drt@linux.ibm.com>
> >> > Reviewed-by: Rick Lindsley <ricklind@linux.ibm.com>
> >> > Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
> >> > ---
> >> > Changes in V2:
> >> > - Update description to clarify background for the patch
> >> > - Include Reviewed-by tags
> >> > Changes in V3:
> >> > - Add comment above the code change
> >> > ---
> >> >  drivers/net/ethernet/ibm/ibmvnic.c | 11 +++++++++--
> >> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> >> > index 5788bb956d73..9e005a08d43b 100644
> >> > --- a/drivers/net/ethernet/ibm/ibmvnic.c
> >> > +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> >> > @@ -2017,8 +2017,15 @@ static int do_reset(struct ibmvnic_adapter *adapter,
> >> >                         rtnl_unlock();
> >> >                         rc = set_link_state(adapter, IBMVNIC_LOGICAL_LNK_DN);
> >> >                         rtnl_lock();
> >> > -                       if (rc)
> >> > -                               goto out;
> >> > +
> >> > +                       /* Attempted to set the link down. It could fail if the
> >> > +                        * vnicserver has already torn down the CRQ. We will
> >> > +                        * note it and continue with reset to reinit the CRQ.
> >> > +                        */
> >> > +                       if (rc) {
> >> > +                               netdev_dbg(netdev,
> >> > +                                          "Setting link down failed rc=%d. Continue anyway\n", rc);
> >> > +                       }
> >>
> >> There are other places which check and rely on the return value of
> >> this function. Your change makes that inconsistent. Can you stop
> >
> > To be more specific, __ibmvnic_close() and __ibmvnic_open() both call
> > set_link_state().
> Inconsistent would have been not checking the rc at all. Here we
> check it and note that there are times when it's OK to continue.
>
> >
> >> posting new versions and soliciting the maintainer to accept it before
> >> there is a material change? There are many ways to make the reset
> >> succeed. I think this is the worst approach of all.
>
> Can you show me a patch that is better than this one and that has gone
> through 30+ hours of testing?

The patch review convention is: the community reviews the patch, and
the patch author modifies it and resends. We are talking about the
patch itself, but you brought up testing. Instead of taking the
reviewer's opinions, you ask the reviewer to write a patch, which is a
little bit odd.

