linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Elder <elder@linaro.org>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	elder@kernel.org, evgreen@chromium.org,
	bjorn.andersson@linaro.org, cpratapa@codeaurora.org,
	Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>,
	Network Development <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next 9/9] net: ipa: don't disable NAPI in suspend
Date: Sat, 30 Jan 2021 22:29:19 -0600	[thread overview]
Message-ID: <e27f5c10-7b77-1f12-fe36-e9261f01bca1@linaro.org> (raw)
In-Reply-To: <CAF=yD-L1SKzu+gsma7KN4VjGnma-_w+amXx=Y_0e78rQiUCu7Q@mail.gmail.com>

On 1/30/21 9:25 AM, Willem de Bruijn wrote:
> On Fri, Jan 29, 2021 at 3:29 PM Alex Elder <elder@linaro.org> wrote:
>>
>> The channel stop and suspend paths both call __gsi_channel_stop(),
>> which quiesces channel activity, disables NAPI, and (on other than
>> SDM845) stops the channel.  Similarly, the start and resume paths
>> share __gsi_channel_start(), which starts the channel and re-enables
>> NAPI again.
>>
>> Disabling NAPI should be done when stopping a channel, but this
>> should *not* be done when suspending.  It's not necessary in the
>> suspend path anyway, because the stopped channel (or suspended
>> endpoint on SDM845) will not cause interrupts to schedule NAPI,
>> and gsi_channel_trans_quiesce() won't return until there are no
>> more transactions to process in the NAPI polling loop.
> 
> But why is it incorrect to do so?

Maybe it's not; I also thought it was fine before, but...

Someone at Qualcomm asked me why I thought NAPI needed
to be disabled on suspend.  My response was basically
that it was a lightweight operation, and it shouldn't
really be a problem to do so.

Then, when I posted two patches last month, Jakub's
response told me he didn't understand why I was doing
what I was doing, and I stepped back to reconsider
the details of what was happening at suspend time.
 
https://lore.kernel.org/netdev/20210107183803.47308e23@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/

Four things were happening to suspend a channel:
quiesce activity; disable interrupt; disable NAPI;
and stop the channel.  It occurred to me that a
stopped channel would not generate interrupts, so if
the channel was stopped earlier there would be no need
to disable the interrupt.  Similarly there would be
(essentially) no need to disable NAPI once a channel
was stopped.

Underlying all of this is that I started chasing a
hang that was occurring on suspend over a month ago.
It was hard to reproduce (hundreds or thousands of
suspend/resume cycles without hitting it), and one
of the few times I actually hit the problem it was
stuck in napi_disable(), apparently waiting for
NAPI_STATE_SCHED to get cleared by napi_complete().

My best guess about how this could occur was if there
were a race of some kind between the interrupt handler
(scheduling NAPI) and the poll function (completing
it).  I found a number of problems while looking
at this, and in the past few weeks I've posted some
fixes to improve things.  Still, even with some of
these fixes in place we have seen a hang (but now
even more rarely).

So this grand rework of suspending/stopping channels
is an attempt to resolve this hang on suspend.

The channel is now stopped early, and once stopped,
everything that completed prior to the channel being
stopped is polled before considering the suspend
function done.  A stopped channel won't interrupt,
so we don't bother disabling the completion interrupt,
with no interrupts, NAPI won't be scheduled, so there's
no need to disable NAPI either.

The net result is simpler, and seems logical, and
should preclude any possible race between the interrupt
handler and poll function.  I'm trying to solve the
hang problem analytically, because it takes *so* long
to reproduce.

I'm open to other suggestions.

					-Alex

>  From a quick look, virtio-net disables on both remove and freeze, for instance.
> 
>> Instead, enable NAPI in gsi_channel_start(), when the completion
>> interrupt is first enabled.  Disable it again in gsi_channel_stop(),
>> when finally disabling the interrupt.
>>
>> Add a call to napi_synchronize() to __gsi_channel_stop(), to ensure
>> NAPI polling is done before moving on.
>>
>> Signed-off-by: Alex Elder <elder@linaro.org>
>> ---
> =
>> @@ -894,12 +894,16 @@ int gsi_channel_start(struct gsi *gsi, u32 channel_id)
>>          struct gsi_channel *channel = &gsi->channel[channel_id];
>>          int ret;
>>
>> -       /* Enable the completion interrupt */
>> +       /* Enable NAPI and the completion interrupt */
>> +       napi_enable(&channel->napi);
>>          gsi_irq_ieob_enable_one(gsi, channel->evt_ring_id);
>>
>>          ret = __gsi_channel_start(channel, true);
>> -       if (ret)
>> -               gsi_irq_ieob_disable_one(gsi, channel->evt_ring_id);
>> +       if (!ret)
>> +               return 0;
>> +
>> +       gsi_irq_ieob_disable_one(gsi, channel->evt_ring_id);
>> +       napi_disable(&channel->napi);
>>
>>          return ret;
>>   }
> 
> subjective, but easier to parse when the normal control flow is linear
> and the error path takes a branch (or goto, if reused).
> 


  parent reply	other threads:[~2021-01-31  4:30 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-29 20:20 [PATCH net-next 0/9] net: ipa: don't disable NAPI in suspend Alex Elder
2021-01-29 20:20 ` [PATCH net-next 1/9] net: ipa: don't thaw channel if error starting Alex Elder
2021-01-29 20:20 ` [PATCH net-next 2/9] net: ipa: introduce gsi_channel_stop_retry() Alex Elder
2021-01-29 20:20 ` [PATCH net-next 3/9] net: ipa: introduce __gsi_channel_start() Alex Elder
2021-01-29 20:20 ` [PATCH net-next 4/9] net: ipa: kill gsi_channel_freeze() and gsi_channel_thaw() Alex Elder
2021-01-29 20:20 ` [PATCH net-next 5/9] net: ipa: disable IEOB interrupt after channel stop Alex Elder
2021-01-29 20:20 ` [PATCH net-next 6/9] net: ipa: move completion interrupt enable/disable Alex Elder
2021-01-29 20:20 ` [PATCH net-next 7/9] net: ipa: don't disable IEOB interrupt during suspend Alex Elder
2021-01-29 20:20 ` [PATCH net-next 8/9] net: ipa: expand last transaction check Alex Elder
2021-01-29 20:20 ` [PATCH net-next 9/9] net: ipa: don't disable NAPI in suspend Alex Elder
2021-01-30 15:25   ` Willem de Bruijn
2021-01-30 19:22     ` Jakub Kicinski
2021-01-31  4:30       ` Alex Elder
2021-01-31  4:29     ` Alex Elder [this message]
2021-01-31 14:52       ` Willem de Bruijn
2021-01-31 15:32         ` Alex Elder
2021-02-01  1:36           ` Willem de Bruijn
2021-02-01 14:35             ` Alex Elder
2021-02-01 14:47               ` Willem de Bruijn
2021-02-01 15:48                 ` Alex Elder
2021-02-01 18:38                 ` Alex Elder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e27f5c10-7b77-1f12-fe36-e9261f01bca1@linaro.org \
    --to=elder@linaro.org \
    --cc=bjorn.andersson@linaro.org \
    --cc=cpratapa@codeaurora.org \
    --cc=davem@davemloft.net \
    --cc=elder@kernel.org \
    --cc=evgreen@chromium.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=subashab@codeaurora.org \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).