All of lore.kernel.org
 help / color / mirror / Atom feed
From: Doug Anderson <dianders@chromium.org>
To: Stephen Boyd <swboyd@chromium.org>
Cc: Roja Rani Yarubandi <rojay@codeaurora.org>,
	Mark Brown <broonie@kernel.org>, Andy Gross <agross@kernel.org>,
	Bjorn Andersson <bjorn.andersson@linaro.org>,
	linux-arm-msm <linux-arm-msm@vger.kernel.org>,
	linux-spi <linux-spi@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Akash Asthana <akashast@codeaurora.org>,
	msavaliy@qti.qualcomm.com
Subject: Re: [PATCH] spi: spi-geni-qcom: Fix NULL pointer access in geni_spi_isr
Date: Thu, 10 Dec 2020 09:14:15 -0800	[thread overview]
Message-ID: <CAD=FV=WtU3cnRe6pDKFMA9_0cnQFtSOyohY_bJwZObK+KrbhVQ@mail.gmail.com> (raw)
In-Reply-To: <160757022002.1580929.8656750350166301192@swboyd.mtv.corp.google.com>

Hi,

On Wed, Dec 9, 2020 at 7:17 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Doug Anderson (2020-12-03 08:40:46)
>
> > I would guess that if "mas->cur_xfer" is NULL then
> > geni_spi_handle_rx() should read all data in the FIFO and throw it
> > away and geni_spi_handle_tx() should set SE_GENI_TX_WATERMARK_REG to
> > 0.  NOTE: I _think_ that with the synchronize_irq() I'm suggesting
> > above we'll avoid this case, but it never hurts to be defensive.
> >
> >
> > Does that all make sense?  So the summary is that instead of your patch:
>
> Can we get a CPU diagram describing the race and scenario where this
> happens? Something like:
>
>   CPU0                                CPU1
>   ----                                ----
>   setup_fifo_xfer()
>    spin_lock_irq(&mas->lock);
>    spin_unlock_irq(&mas->lock);
>    mas->cur_xfer = xfer
>    ...
>    <IRQ>
>                                       geni_spi_isr()
>                                        geni_spi_handle_rx()
>                                         <NULL deref boom explosion!>
>
> But obviously this example diagram is incorrect and some timeout happens
> instead? Sorry, I'm super lazy and don't want to read many paragraphs of
> text. :) I'd rather have a diagram like above that clearly points out
> the steps taken to the NULL pointer deref.

This is my untested belief of what's happening

 CPU0                                CPU1
 ----                                ----
                                     setup_fifo_xfer()
                                      ...
                                      geni_se_setup_m_cmd()
                                      <hardware starts transfer>
 <unrelated interrupt storm>          spin_unlock_irq()
 <continued interrupt storm>         <time passes>
 <continued interrupt storm>         <transfer complets in hardware>
 <continued interrupt storm>         <hardware sets M_RX_FIFO_WATERMARK_EN>
 <continued interrupt storm>         <time passes>
 <continued interrupt storm>         handle_fifo_timeout()
 <continued interrupt storm>          spin_lock_irq()
 <continued interrupt storm>          mas->cur_xfer = NULL
 <continued interrupt storm>          geni_se_cancel_m_cmd()
 <continued interrupt storm>          spin_unlock_irq()
 <continued interrupt storm>          wait_for_completion_timeout() => timeout
 <continued interrupt storm>          spin_lock_irq()
 <continued interrupt storm>          geni_se_abort_m_cmd()
 <continued interrupt storm>          spin_unlock_irq()
 <continued interrupt storm>          wait_for_completion_timeout() => timeout
 <interrupt storm ends>
 geni_spi_isr()
  spin_lock()
  if (m_irq & M_RX_FIFO_WATERMARK_EN)
   geni_spi_handle_rx()
    mas->cur_xfer NULL derefrence

With my proposed fix, I believe that would transform into:

 CPU0                                CPU1
 ----                                ----
                                     setup_fifo_xfer()
                                      ...
                                      geni_se_setup_m_cmd()
                                      <hardware starts transfer>
 <unrelated interrupt storm>          spin_unlock_irq()
 <continued interrupt storm>         <time passes>
 <continued interrupt storm>         <transfer complets in hardware>
 <continued interrupt storm>         <hardware sets M_RX_FIFO_WATERMARK_EN>
 <continued interrupt storm>         <time passes>
 <continued interrupt storm>         handle_fifo_timeout()
 <continued interrupt storm>          synchronize_irq()
 <continued interrupt storm>           <time passes>
 <interrupt storm ends>
 geni_spi_isr()
  ...
                                       <synchronize_irq() finishes>
                                      spin_lock_irq()
                                      mas->cur_xfer = NULL
                                      geni_se_cancel_m_cmd()
                                      spin_unlock_irq()
 geni_spi_isr()
   ...
                                      wait_for_completion_timeout() => success

The extra synchronize_irq() I was suggesting at the end of the
function would be an extra bit of paranoia.  Maybe a new storm showed
up while we were processing the timeout?


> > 1. Add synchronize_irq() at the start and end of
> > handle_fifo_timeout().  Not under lock.
> >
> > 2. In geni_spi_handle_rx(), check for NULL "mas->cur_xfer".  Read all
> > data in the FIFO (don't cap at rx_rem_bytes), but throw it away.
> >
> > 3. In geni_spi_handle_tx(), check for NULL "mas->cur_xfer".  Don't
> > write any data.  Just write 0 to SE_GENI_TX_WATERMARK_REG.
> >
> > I think #1 is the real fix, but #2 and #3 will avoid crashes in case
> > there's another bug somewhere.
>
> Aren't 2 and 3 papering over some weird problem though where irqs are
> coming in unexpectedly?

I think that's what I said but in different words?  #1 is the real fix
but #2 and #3 will keep us from crashing (AKA paper over) if we have
some other (unexpected) bug.  We'll already have an error in the log
in this case "Failed to cancel/abort m_cmd" so it doesn't feel
necessary to crash with a NULL dereference...

-Doug

  reply	other threads:[~2020-12-10 17:15 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03  7:44 [PATCH] spi: spi-geni-qcom: Fix NULL pointer access in geni_spi_isr Roja Rani Yarubandi
2020-12-03 16:40 ` Doug Anderson
2020-12-10  3:17   ` Stephen Boyd
2020-12-10 17:14     ` Doug Anderson [this message]
2020-12-10 22:57       ` Stephen Boyd
2020-12-10 23:07         ` Doug Anderson
2020-12-10 23:32           ` Stephen Boyd
2020-12-10 23:50             ` Doug Anderson
2020-12-11  0:50               ` Stephen Boyd
2020-12-11  1:04                 ` Doug Anderson
2020-12-11  1:21                   ` Stephen Boyd
2020-12-11  1:30                     ` Doug Anderson
2020-12-11  1:39                       ` Stephen Boyd
2020-12-11  1:51                         ` Doug Anderson
2020-12-12  1:32                           ` Stephen Boyd
2020-12-15  0:31                             ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAD=FV=WtU3cnRe6pDKFMA9_0cnQFtSOyohY_bJwZObK+KrbhVQ@mail.gmail.com' \
    --to=dianders@chromium.org \
    --cc=agross@kernel.org \
    --cc=akashast@codeaurora.org \
    --cc=bjorn.andersson@linaro.org \
    --cc=broonie@kernel.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-spi@vger.kernel.org \
    --cc=msavaliy@qti.qualcomm.com \
    --cc=rojay@codeaurora.org \
    --cc=swboyd@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.