All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michal Suchánek" <msuchanek@suse.de>
To: Lijun Pan <lijunp213@gmail.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>,
	netdev@vger.kernel.org, Lijun Pan <ljp@linux.vnet.ibm.com>,
	Tom Falcon <tlfalcon@linux.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	Dany Madden <drt@linux.ibm.com>, Jakub Kicinski <kuba@kernel.org>,
	linuxppc-dev@lists.ozlabs.org, David Miller <davem@davemloft.net>
Subject: Re: [PATCH V2 net] ibmvnic: Continue with reset if set link down failed
Date: Thu, 22 Apr 2021 19:21:36 +0200	[thread overview]
Message-ID: <20210422172135.GY6564@kitsune.suse.cz> (raw)
In-Reply-To: <CAOhMmr4ckVFTZtSeHFHNgGPUA12xYO8WcUoakx7WdwQfSKBJhA@mail.gmail.com>

Hello,

On Thu, Apr 22, 2021 at 12:06:45AM -0500, Lijun Pan wrote:
> On Wed, Apr 21, 2021 at 2:25 AM Sukadev Bhattiprolu
> <sukadev@linux.ibm.com> wrote:
> >
> > Lijun Pan [ljp@linux.vnet.ibm.com] wrote:
> > >
> > >
> > > > On Apr 20, 2021, at 4:35 PM, Dany Madden <drt@linux.ibm.com> wrote:
> > > >
> > > > When ibmvnic gets a FATAL error message from the vnicserver, it marks
> > > > the Command Respond Queue (CRQ) inactive and resets the adapter. If this
> > > > FATAL reset fails and a transmission timeout reset follows, the CRQ is
> > > > still inactive, ibmvnic's attempt to set link down will also fail. If
> > > > ibmvnic abandons the reset because of this failed set link down and this
> > > > is the last reset in the workqueue, then this adapter will be left in an
> > > > inoperable state.
> > > >
> > > > Instead, make the driver ignore this link down failure and continue to
> > > > free and re-register CRQ so that the adapter has an opportunity to
> > > > recover.
> > >
> > > This v2 does not adddress the concerns mentioned in v1.
> > > And I think it is better to exit with error from do_reset, and schedule a thorough
> > > do_hard_reset if the the adapter is already in unstable state.
> >
> > We had a FATAL error and when handling it, we failed to send a
> > link-down message to the VIOS. So what we need to try next is to
> > reset the connection with the VIOS. For this we must talk to the
> > firmware using the H_FREE_CRQ and H_REG_CRQ hcalls. do_reset()
> > does just that in ibmvnic_reset_crq().
> >
> > Now, sure we can attempt a "thorough hard reset" which also does
> > the same hcalls to reestablish the connection. Is there any
> > other magic in do_hard_reset()? But in addition, it also frees lot
> > more Linux kernel buffers and reallocates them for instance.
> 
> Working around everything in do_reset will make the code very difficult
> to manage. Ultimately do_reset can do anything I am afraid, and do_hard_reset
> can be removed completely or merged into do_reset.

This debate is not very constructive.

In the context of driver that has separate do_reset and do_hard_reset
this fix picks the correct one unless you can refute the arguments
provided.

Merging do_reset and do_hard_reset might be a good code cleanup which is
out of the scope of this fix.



Given that vast majority of fixes to the vnic driver are related to the
reset handling it would improve stability and testability if every
reset took the same code path.

In the context of merging do_hard_reset and do_reset the question is
what is the intended distinction and performance gain by having
'lightweight' reset.

I don't have a vnic protocol manual at hand and I suspect I would not
get one even if I searched for one.

From reading through the fixes in the past my understanding is that the
full reset is required when the backend changes which then potentially
requires different size/number of buffers.

What is the expected situation when reset is required without changing
the backend?

Is this so common that it warrants a separate 'lightweight' optimized
function?

Thanks

Michal

WARNING: multiple messages have this Message-ID (diff)
From: "Michal Suchánek" <msuchanek@suse.de>
To: Lijun Pan <lijunp213@gmail.com>
Cc: netdev@vger.kernel.org, Lijun Pan <ljp@linux.vnet.ibm.com>,
	Tom Falcon <tlfalcon@linux.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	Dany Madden <drt@linux.ibm.com>, Jakub Kicinski <kuba@kernel.org>,
	Sukadev Bhattiprolu <sukadev@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org, David Miller <davem@davemloft.net>
Subject: Re: [PATCH V2 net] ibmvnic: Continue with reset if set link down failed
Date: Thu, 22 Apr 2021 19:21:36 +0200	[thread overview]
Message-ID: <20210422172135.GY6564@kitsune.suse.cz> (raw)
In-Reply-To: <CAOhMmr4ckVFTZtSeHFHNgGPUA12xYO8WcUoakx7WdwQfSKBJhA@mail.gmail.com>

Hello,

On Thu, Apr 22, 2021 at 12:06:45AM -0500, Lijun Pan wrote:
> On Wed, Apr 21, 2021 at 2:25 AM Sukadev Bhattiprolu
> <sukadev@linux.ibm.com> wrote:
> >
> > Lijun Pan [ljp@linux.vnet.ibm.com] wrote:
> > >
> > >
> > > > On Apr 20, 2021, at 4:35 PM, Dany Madden <drt@linux.ibm.com> wrote:
> > > >
> > > > When ibmvnic gets a FATAL error message from the vnicserver, it marks
> > > > the Command Respond Queue (CRQ) inactive and resets the adapter. If this
> > > > FATAL reset fails and a transmission timeout reset follows, the CRQ is
> > > > still inactive, ibmvnic's attempt to set link down will also fail. If
> > > > ibmvnic abandons the reset because of this failed set link down and this
> > > > is the last reset in the workqueue, then this adapter will be left in an
> > > > inoperable state.
> > > >
> > > > Instead, make the driver ignore this link down failure and continue to
> > > > free and re-register CRQ so that the adapter has an opportunity to
> > > > recover.
> > >
> > > This v2 does not adddress the concerns mentioned in v1.
> > > And I think it is better to exit with error from do_reset, and schedule a thorough
> > > do_hard_reset if the the adapter is already in unstable state.
> >
> > We had a FATAL error and when handling it, we failed to send a
> > link-down message to the VIOS. So what we need to try next is to
> > reset the connection with the VIOS. For this we must talk to the
> > firmware using the H_FREE_CRQ and H_REG_CRQ hcalls. do_reset()
> > does just that in ibmvnic_reset_crq().
> >
> > Now, sure we can attempt a "thorough hard reset" which also does
> > the same hcalls to reestablish the connection. Is there any
> > other magic in do_hard_reset()? But in addition, it also frees lot
> > more Linux kernel buffers and reallocates them for instance.
> 
> Working around everything in do_reset will make the code very difficult
> to manage. Ultimately do_reset can do anything I am afraid, and do_hard_reset
> can be removed completely or merged into do_reset.

This debate is not very constructive.

In the context of driver that has separate do_reset and do_hard_reset
this fix picks the correct one unless you can refute the arguments
provided.

Merging do_reset and do_hard_reset might be a good code cleanup which is
out of the scope of this fix.



Given that vast majority of fixes to the vnic driver are related to the
reset handling it would improve stability and testability if every
reset took the same code path.

In the context of merging do_hard_reset and do_reset the question is
what is the intended distinction and performance gain by having
'lightweight' reset.

I don't have a vnic protocol manual at hand and I suspect I would not
get one even if I searched for one.

From reading through the fixes in the past my understanding is that the
full reset is required when the backend changes which then potentially
requires different size/number of buffers.

What is the expected situation when reset is required without changing
the backend?

Is this so common that it warrants a separate 'lightweight' optimized
function?

Thanks

Michal

  parent reply	other threads:[~2021-04-22 17:21 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-20 21:35 [PATCH V2 net] ibmvnic: Continue with reset if set link down failed Dany Madden
2021-04-20 21:35 ` Dany Madden
2021-04-20 21:42 ` Lijun Pan
2021-04-20 21:42   ` Lijun Pan
2021-04-21  6:45   ` Sukadev Bhattiprolu
2021-04-21  6:45     ` Sukadev Bhattiprolu
2021-04-22  5:06     ` Lijun Pan
2021-04-22  5:06       ` Lijun Pan
2021-04-22  6:58       ` Sukadev Bhattiprolu
2021-04-22  6:58         ` Sukadev Bhattiprolu
2021-04-22  7:05       ` Rick Lindsley
2021-04-22  7:05         ` Rick Lindsley
2021-04-22 17:21       ` Michal Suchánek [this message]
2021-04-22 17:21         ` Michal Suchánek
2021-04-22 17:38         ` Lijun Pan
2021-04-22 17:38           ` Lijun Pan
2021-04-23  2:26         ` Rick Lindsley
2021-04-23  2:26           ` Rick Lindsley
2021-05-04 19:05           ` Dany Madden
2021-05-04 19:05             ` Dany Madden
2021-04-21  7:54   ` Rick Lindsley
2021-04-21  7:54     ` Rick Lindsley
2021-04-22  5:12     ` Lijun Pan
2021-04-22  5:12       ` Lijun Pan
2021-04-22  7:05       ` Rick Lindsley
2021-04-22  7:05         ` Rick Lindsley
2021-04-22  5:30 ` Lijun Pan
2021-04-22  5:30   ` Lijun Pan
2021-04-22  7:07   ` Rick Lindsley
2021-04-22  7:07     ` Rick Lindsley
2021-04-22 17:01     ` Lijun Pan
2021-04-22 17:01       ` Lijun Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210422172135.GY6564@kitsune.suse.cz \
    --to=msuchanek@suse.de \
    --cc=davem@davemloft.net \
    --cc=drt@linux.ibm.com \
    --cc=kuba@kernel.org \
    --cc=lijunp213@gmail.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=ljp@linux.vnet.ibm.com \
    --cc=netdev@vger.kernel.org \
    --cc=paulus@samba.org \
    --cc=sukadev@linux.ibm.com \
    --cc=tlfalcon@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.