From: wi nk <wink@technolu.st>
To: Kalle Valo <kvalo@codeaurora.org>
Cc: "ath11k@lists.infradead.org" <ath11k@lists.infradead.org>,
	Mitchell Nordine <mail@mitchellnordine.com>
Subject: Re: ath11k: QCA6390 on Dell XPS 13 and kernel crashes
Date: Wed, 9 Dec 2020 22:46:29 +0100
Message-ID: <CAHUdJJXiDUc7ovAo8KbWxo_FEdniN4hXo67M9w6979QdNXeMfA@mail.gmail.com>
In-Reply-To: <CAHUdJJWNSKw9aAHQBc8Ftne1J+s5KdVfdMLwgWu+g-ZfeDnitA@mail.gmail.com>

On Wed, Dec 9, 2020 at 4:55 PM wi nk <wink@technolu.st> wrote:
>
> On Wed, Dec 9, 2020 at 4:50 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> >
> > wi nk <wink@technolu.st> writes:
> >
> > > On Wed, Dec 9, 2020 at 4:35 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> > >>
> > >> wi nk <wink@technolu.st> writes:
> > >>
> > >> > So I've managed to stabilise my system now, so either the race is
> > >> > gone, or I've done something to win it all the time.  So one of the
> > >> > avenues of racing I was chasing at first was in the ath11k driver
> > >> > itself.  There are a couple areas where the single/shared IRQ is being
> > >> > forcibly toggled in ways that the documentation says are not great
> > >> > (and the original patch was trying to avoid).  Fixing those didn't
> > >> > seem to have much impact on the stability of things (I've included
> > >> > those changes in my patch though).  After the last email I was
> > >> > thinking about the MHI side of things a bit more and found a number of
> > >> > call sites that my naive grepping had missed that do the same thing,
> > >> > but via acquiring a lock at the same time.  I modified all the calls
> > >> > to *_lock_irq and *_unlock_irq to the lock/unlock - save/restore
> > >> > variants that accept the flags parameter to capture state.  I've now
> > >> > booted and loaded the driver 10+ times without a single freeze or
> > >> > crash.  I'm not sure all of those modifications are necessary (ie:
> > >> > which things are re-entrant in this single interrupt operating mode vs
> > >> > which ones can use the simpler lock/unlock mechanisms), so I could use
> > >> > some advice/guidance there.
> > >> >
> > >> > Mitchell - if you want to grab this patch and try it, let me know how
> > >> > it goes and I can clean it up for the mailing list:
> > >> > https://github.com/w1nk/ath11k-debug/blob/master/one-irq-manage.patch
> > >> > (apply to ath11k-qca6390-bringup-202011301608)
> > >>
> > > >> Wink, I want to ask more about the very interesting
> > >> one-irq-manage.patch you wrote. Have you seen the "sched: RT throttling
> > >> activated" crash with that patch? If yes, how many times, for example 5
> > >> out of 10 times or something like that?
> > >>
> > >> Or is it so with one-irq-manage.patch the kernel doesn't crash at all? I
> > >> didn't quite understand the situation.
> > >>
> > >> --
> > >> https://patchwork.kernel.org/project/linux-wireless/list/
> > >>
> > >> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> > >
> > > Kalle,
> > >
> > >    Sorry for moving the thread :).
> >
> > No problem, I'll just ask extra questions to make sure that I'm
> > understanding things correctly :)
> >
> > > So I've attempted 2 patches that seem to produce varying degrees of
> > > success. The single IRQ patch took the crashing behaviour from hard
> > > locking immediately, to that stuttering / RT throttling message
> > > consistently. So instead of hard locking 9/10 times and stuttering
> > > 1/10, it was inverted.
> >
> > Ok, got it now.
> >
> > > The second patch disabling the m2 transition (even without the single
> > > IRQ patch) seems to have resolved the issues altogether, but at the
> > > expense of disabling this m2 state, the consequences of which I
> > > don't really understand.
> >
> > Sorry, I have missed that. What second patch are you talking about?
> >
> > Also can you share your /proc/interrupts in full?
> >
> > --
> > https://patchwork.kernel.org/project/linux-wireless/list/
> >
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> >
> > --
> > ath11k mailing list
> > ath11k@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/ath11k
>
> Here's /proc/interrupts in full, with the short patch after:
>
>             CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
>    0:          7          0          0          0          0          0          0          0   IO-APIC    2-edge      timer
>    1:          0          0          0          0          0          0          0       2923   IO-APIC    1-edge      i8042
>    8:          0          0          0          0          0          0          0          0   IO-APIC    8-edge      rtc0
>    9:          0       9290          0          0          0          0          0          0   IO-APIC    9-fasteoi   acpi
>   12:          0          0          0          0          0          0         53          0   IO-APIC   12-edge      i8042
>   14:          0      29816          0          0          0          0          0          0   IO-APIC   14-fasteoi   INT34C5:00
>   16:          0          0          0          0          0      10376          0          0   IO-APIC   16-fasteoi   intel_ish_ipc, i801_smbus, idma64.4
>   27:          0          0          0          0          0          0          0          0   IO-APIC   27-fasteoi   idma64.0, i2c_designware.0
>   31:          0          0          0          0          0          0          0          0   IO-APIC   31-fasteoi   idma64.2, i2c_designware.2
>   32:          0          0          0          0          0          0          0          0   IO-APIC   32-fasteoi   idma64.3, i2c_designware.3
>   40:       9681     777197      27906          0          0          0          0          0   IO-APIC   40-fasteoi   idma64.1, i2c_designware.1
>  120:          0          0          0          0          0          0          0          0   PCI-MSI 114688-edge      PCIe PME, pciehp
>  121:          0          0          0          0          0          0          0          0   PCI-MSI 118784-edge      PCIe PME, pciehp
>  122:          0          0          0          0          0          0          0          0   PCI-MSI 458752-edge      PCIe PME
>  123:          0          0          0          0          0          0          0          0   PCI-MSI 475136-edge      PCIe PME
>  124:          0          0          1          0          0          0          0          0   PCI-MSI 229376-edge      vmd
>  125:          0          0          0         27          0          0          0          0   PCI-MSI 229377-edge      vmd
>  126:          0          0          0          0       4303          0          0          0   PCI-MSI 229378-edge      vmd
>  127:          0          0          0          0          0       2992          0        434   PCI-MSI 229379-edge      vmd
>  128:          0          0          0          0          0        593       2504          0   PCI-MSI 229380-edge      vmd
>  129:          0          0          0          0        699          0       1061       1873   PCI-MSI 229381-edge      vmd
>  130:       2382        394          0        603          0          0          0          0   PCI-MSI 229382-edge      vmd
>  131:          0       1670          0        406        646          0          0          0   PCI-MSI 229383-edge      vmd
>  132:        692          0       2903          0          0          0          0          0   PCI-MSI 229384-edge      vmd
>  133:          0        518        913       2198          0          0          0          0   PCI-MSI 229385-edge      vmd
>  134:          0          0          0          0          0          0          0          0   PCI-MSI 229386-edge      vmd
>  135:          0          0          0          0          0          0          0          0   PCI-MSI 229387-edge      vmd
>  136:          0          0          0          0          0          0          0          0   PCI-MSI 229388-edge      vmd
>  137:          0          0          0          0          0          0          0          0   PCI-MSI 229389-edge      vmd
>  138:          0          0          0          0          0          0          0          0   PCI-MSI 229390-edge      vmd
>  139:          0          0          0          0          0          0          0          0   PCI-MSI 229391-edge      vmd
>  140:          0          0          0          0          0          0          0          0   PCI-MSI 229392-edge      vmd
>  141:          0          0          0          0          0          0          0          0   PCI-MSI 229393-edge      vmd
>  142:          0          0          0          0          0          0          0          0   PCI-MSI 229394-edge      vmd
>  143:          0          0          0          0          0          0          0          0   VMD-MSI  124  PCIe PME, aerdrv, pcie-dpc
>  144:          0          0          0          0          0          0          1          0   PCI-MSI 212992-edge      xhci_hcd
>  145:          0          0          0          0          0          0          0         72   PCI-MSI 327680-edge      xhci_hcd
>  146:          6          0          0          0          0          0          0          0   PCI-MSI 45088768-edge      rtsx_pci
>  147:          0          0          0          0          0          0          0          0   VMD-MSI  125  nvme0q0
>  148:          0          0          0       1859          0          0          0      38399   PCI-MSI 32768-edge      i915
>  149:          0          0          0          0          0          0          0          0   VMD-MSI  126  nvme0q1
>  150:          0          0          0          0          0          0          0          0   VMD-MSI  127  nvme0q2
>  151:          0          0          0          0          0          0          0          0   VMD-MSI  128  nvme0q3
>  152:          0          0          0          0          0          0          0          0   VMD-MSI  129  nvme0q4
>  153:          0          0          0          0          0          0          0          0   VMD-MSI  130  nvme0q5
>  154:          0          0          0          0          0          0          0          0   VMD-MSI  131  nvme0q6
>  155:          0          0          0          0          0          0          0          0   VMD-MSI  132  nvme0q7
>  156:          0          0          0          0          0          0          0          0   VMD-MSI  133  nvme0q8
>  157:          0      29816          0          0          0          0          0          0  INT34C5:00  327  DLL0945:00
>  158:          0          0          0          0          0          0         48          0   PCI-MSI 360448-edge      mei_me
>  159:          0          0          0          0          0          0          0       1134   PCI-MSI 514048-edge      AudioDSP
>  162:          0          0          0     108102          0          0          0          0   PCI-MSI 44564480-edge      ce0, ce1, ce2, ce3, ce5, ce7, ce8, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, bhi, mhi, mhi
>  NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
>  LOC:      64516      80387      54151      82574      64663     113373      58033      81555   Local timer interrupts
>  SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
>  PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
>  IWI:          5          2          1        760          1          1          0      16078   IRQ work interrupts
>  RTR:          6          0          0          0          0          0          0          0   APIC ICR read retries
>  RES:       1834       7304       1432       1807       3015       1552       1417       1498   Rescheduling interrupts
>  CAL:      21739      26798      28934      22211      22590      28622      22541      20023   Function call interrupts
>  TLB:      51267      49182      59392      48384      46755      56491      48103      46560   TLB shootdowns
>  TRM:          2          2          2          2          2          2          2          2   Thermal event interrupts
>  THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
>  DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
>  MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
>  MCP:          3          4          4          4          4          4          4          4   Machine check polls
>  ERR:         16
>  MIS:          0
>  PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
>  NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
>  PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event
>
> and the modification that disables m2 state:
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index 3de7b1639ec6..20f670c8b129 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -55,12 +55,12 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
>      },
>      {
>          MHI_PM_M0,
> -        MHI_PM_M0 | MHI_PM_M2 | MHI_PM_M3_ENTER |
> +        MHI_PM_M0 | MHI_PM_M3_ENTER |
>          MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
>          MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_FW_DL_ERR
>      },
>      {
> -        MHI_PM_M2,
> +        MHI_PM_M0,
>          MHI_PM_M0 | MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
>          MHI_PM_LD_ERR_FATAL_DETECT
>      },

Adding one more data point.  With the M2 state transition disabled,
the driver no longer crashes on initialization, and the system also
survives suspend and wake, with the adapter successfully
reassociating every time.  As expected with my patch, the MHI driver
shows everything staying in the M1 state and never attempting the
transition to M2.  It also doesn't return to M0 if I disconnect the
power and plug it back in.  I'm not sure what else is affected by me
hacking this state machine, but avoiding the M2 transition has
removed every obvious issue from my system.
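
To make the effect of that table edit concrete, here is a minimal
user-space sketch of a bitmask transition table.  It is not the kernel
code: the state bit values, the struct layout, the table names and the
transition_allowed() helper are all hypothetical, the linear scan for
the source row is an assumption made for simplicity, and the tables are
trimmed to the states the diff touches (the error and shutdown bits are
left out).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits for illustration; not the kernel's values. */
enum pm_state {
    PM_M0       = 1u << 0,
    PM_M2       = 1u << 1,
    PM_M3_ENTER = 1u << 2,
};

struct pm_transition {
    unsigned int from;      /* a single source-state bit */
    unsigned int to_states; /* OR of every permitted target-state bit */
};

/* Before the edit: M0 may enter M2, and M2 may return to M0. */
static const struct pm_transition stock_table[] = {
    { PM_M0, PM_M0 | PM_M2 | PM_M3_ENTER },
    { PM_M2, PM_M0 },
};

/* After the edit: M2 is dropped from M0's mask and the M2 row is
 * relabelled M0, so no row describes a device that is in M2. */
static const struct pm_transition hacked_table[] = {
    { PM_M0, PM_M0 | PM_M3_ENTER },
    { PM_M0, PM_M0 },
};

/* Return true if the first row matching `from` lists `to` as a target.
 * A state with no row at all can never transition anywhere. */
static bool transition_allowed(const struct pm_transition *table, size_t n,
                               unsigned int from, unsigned int to)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].from == from)
            return (table[i].to_states & to) != 0;
    }
    return false;
}
```

In this model, transition_allowed(stock_table, 2, PM_M0, PM_M2) is
true, while the same query against hacked_table is false, and a query
starting from PM_M2 in hacked_table finds no row at all.  That matches
the behaviour described above only at the level of the table; what the
device firmware does when the host refuses M2 is outside this sketch.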

