From: wi nk <wink@technolu.st>
To: Kalle Valo <kvalo@codeaurora.org>
Cc: "ath11k@lists.infradead.org" <ath11k@lists.infradead.org>,
	Mitchell Nordine <mail@mitchellnordine.com>
Subject: Re: ath11k: QCA6390 on Dell XPS 13 and kernel crashes
Date: Wed, 9 Dec 2020 16:55:50 +0100	[thread overview]
Message-ID: <CAHUdJJWNSKw9aAHQBc8Ftne1J+s5KdVfdMLwgWu+g-ZfeDnitA@mail.gmail.com> (raw)
In-Reply-To: <87mtyn9dx0.fsf@codeaurora.org>

On Wed, Dec 9, 2020 at 4:50 PM Kalle Valo <kvalo@codeaurora.org> wrote:
>
> wi nk <wink@technolu.st> writes:
>
> > On Wed, Dec 9, 2020 at 4:35 PM Kalle Valo <kvalo@codeaurora.org> wrote:
> >>
> >> wi nk <wink@technolu.st> writes:
> >>
> >> > So I've managed to stabilise my system now, so either the race is
> >> > gone, or I've done something to win it all the time.  So one of the
> >> > avenues of racing I was chasing at first was in the ath11k driver
> >> > itself.  There are a couple areas where the single/shared IRQ is being
> >> > forcibly toggled in ways that the documentation says are not great
> >> > (and the original patch was trying to avoid).  Fixing those didn't
> >> > seem to have much impact on the stability of things (I've included
> >> > those changes in my patch though).  After the last email I was
> >> > thinking about the MHI side of things a bit more and found a number of
> >> > call sites that my naive grepping had missed that do the same thing,
> >> > but via acquiring a lock at the same time.  I modified all the calls
> >> > to *_lock_irq and *_unlock_irq to the lock/unlock - save/restore
> >> > variants that accept the flags parameter to capture state.  I've now
> >> > booted and loaded the driver 10+ times without a single freeze or
> >> > crash.  I'm not sure all of those modifications are necessary (ie:
> >> > which things are re-entrant in this single interrupt operating mode vs
> >> > which ones can use the simpler lock/unlock mechanisms), so I could use
> >> > some advice/guidance there.
> >> >
> >> > Mitchell - if you want to grab this patch and try it, let me know how
> >> > it goes and I can clean it up for the mailing list:
> >> > https://github.com/w1nk/ath11k-debug/blob/master/one-irq-manage.patch
> >> > (apply to ath11k-qca6390-bringup-202011301608)
> >>
> >> Wink, I want to ask more about the very interesting
> >> one-irq-manage.patch you wrote. Have you seen the "sched: RT throttling
> >> activated" crash with that patch? If yes, how many times, for example 5
> >> out of 10 times or something like that?
> >>
> >> Or is it so with one-irq-manage.patch the kernel doesn't crash at all? I
> >> didn't quite understand the situation.
> >>
> >> --
> >> https://patchwork.kernel.org/project/linux-wireless/list/
> >>
> >> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
> >
> > Kalle,
> >
> >    Sorry for moving the thread :).
>
> No problem, I'll just ask extra questions to make sure that I'm
> understanding things correctly :)
>
> > So I've attempted 2 patches that seem to produce varying degrees of
> > success. The single IRQ patch took the crashing behaviour from hard
> > locking immediately, to that stuttering / RT throttling message
> > consistently. So instead of hard locking 9/10 times and stuttering
> > 1/10, it was inverted.
>
> Ok, got it now.
>
> > The second patch, which disables the m2 transition (even without the
> > single IRQ patch), seems to have resolved the issues altogether, but at
> > the expense of disabling this m2 state, and I don't have much idea of
> > the consequences of that.
>
> Sorry, I have missed that. What second patch are you talking about?
>
> Also can you share your /proc/interrupts in full?
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>
> --
> ath11k mailing list
> ath11k@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath11k

Here's /proc/interrupts in full, with the short patch after it:

            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
   0:          7          0          0          0          0          0          0          0   IO-APIC    2-edge      timer
   1:          0          0          0          0          0          0          0       2923   IO-APIC    1-edge      i8042
   8:          0          0          0          0          0          0          0          0   IO-APIC    8-edge      rtc0
   9:          0       9290          0          0          0          0          0          0   IO-APIC    9-fasteoi   acpi
  12:          0          0          0          0          0          0         53          0   IO-APIC   12-edge      i8042
  14:          0      29816          0          0          0          0          0          0   IO-APIC   14-fasteoi   INT34C5:00
  16:          0          0          0          0          0      10376          0          0   IO-APIC   16-fasteoi   intel_ish_ipc, i801_smbus, idma64.4
  27:          0          0          0          0          0          0          0          0   IO-APIC   27-fasteoi   idma64.0, i2c_designware.0
  31:          0          0          0          0          0          0          0          0   IO-APIC   31-fasteoi   idma64.2, i2c_designware.2
  32:          0          0          0          0          0          0          0          0   IO-APIC   32-fasteoi   idma64.3, i2c_designware.3
  40:       9681     777197      27906          0          0          0          0          0   IO-APIC   40-fasteoi   idma64.1, i2c_designware.1
 120:          0          0          0          0          0          0          0          0   PCI-MSI 114688-edge      PCIe PME, pciehp
 121:          0          0          0          0          0          0          0          0   PCI-MSI 118784-edge      PCIe PME, pciehp
 122:          0          0          0          0          0          0          0          0   PCI-MSI 458752-edge      PCIe PME
 123:          0          0          0          0          0          0          0          0   PCI-MSI 475136-edge      PCIe PME
 124:          0          0          1          0          0          0          0          0   PCI-MSI 229376-edge      vmd
 125:          0          0          0         27          0          0          0          0   PCI-MSI 229377-edge      vmd
 126:          0          0          0          0       4303          0          0          0   PCI-MSI 229378-edge      vmd
 127:          0          0          0          0          0       2992          0        434   PCI-MSI 229379-edge      vmd
 128:          0          0          0          0          0        593       2504          0   PCI-MSI 229380-edge      vmd
 129:          0          0          0          0        699          0       1061       1873   PCI-MSI 229381-edge      vmd
 130:       2382        394          0        603          0          0          0          0   PCI-MSI 229382-edge      vmd
 131:          0       1670          0        406        646          0          0          0   PCI-MSI 229383-edge      vmd
 132:        692          0       2903          0          0          0          0          0   PCI-MSI 229384-edge      vmd
 133:          0        518        913       2198          0          0          0          0   PCI-MSI 229385-edge      vmd
 134:          0          0          0          0          0          0          0          0   PCI-MSI 229386-edge      vmd
 135:          0          0          0          0          0          0          0          0   PCI-MSI 229387-edge      vmd
 136:          0          0          0          0          0          0          0          0   PCI-MSI 229388-edge      vmd
 137:          0          0          0          0          0          0          0          0   PCI-MSI 229389-edge      vmd
 138:          0          0          0          0          0          0          0          0   PCI-MSI 229390-edge      vmd
 139:          0          0          0          0          0          0          0          0   PCI-MSI 229391-edge      vmd
 140:          0          0          0          0          0          0          0          0   PCI-MSI 229392-edge      vmd
 141:          0          0          0          0          0          0          0          0   PCI-MSI 229393-edge      vmd
 142:          0          0          0          0          0          0          0          0   PCI-MSI 229394-edge      vmd
 143:          0          0          0          0          0          0          0          0   VMD-MSI  124  PCIe PME, aerdrv, pcie-dpc
 144:          0          0          0          0          0          0          1          0   PCI-MSI 212992-edge      xhci_hcd
 145:          0          0          0          0          0          0          0         72   PCI-MSI 327680-edge      xhci_hcd
 146:          6          0          0          0          0          0          0          0   PCI-MSI 45088768-edge      rtsx_pci
 147:          0          0          0          0          0          0          0          0   VMD-MSI  125  nvme0q0
 148:          0          0          0       1859          0          0          0      38399   PCI-MSI 32768-edge      i915
 149:          0          0          0          0          0          0          0          0   VMD-MSI  126  nvme0q1
 150:          0          0          0          0          0          0          0          0   VMD-MSI  127  nvme0q2
 151:          0          0          0          0          0          0          0          0   VMD-MSI  128  nvme0q3
 152:          0          0          0          0          0          0          0          0   VMD-MSI  129  nvme0q4
 153:          0          0          0          0          0          0          0          0   VMD-MSI  130  nvme0q5
 154:          0          0          0          0          0          0          0          0   VMD-MSI  131  nvme0q6
 155:          0          0          0          0          0          0          0          0   VMD-MSI  132  nvme0q7
 156:          0          0          0          0          0          0          0          0   VMD-MSI  133  nvme0q8
 157:          0      29816          0          0          0          0          0          0  INT34C5:00  327  DLL0945:00
 158:          0          0          0          0          0          0         48          0   PCI-MSI 360448-edge      mei_me
 159:          0          0          0          0          0          0          0       1134   PCI-MSI 514048-edge      AudioDSP
 162:          0          0          0     108102          0          0          0          0   PCI-MSI 44564480-edge      ce0, ce1, ce2, ce3, ce5, ce7, ce8, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, DP_EXT_IRQ, bhi, mhi, mhi
 NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
 LOC:      64516      80387      54151      82574      64663     113373      58033      81555   Local timer interrupts
 SPU:          0          0          0          0          0          0          0          0   Spurious interrupts
 PMI:          0          0          0          0          0          0          0          0   Performance monitoring interrupts
 IWI:          5          2          1        760          1          1          0      16078   IRQ work interrupts
 RTR:          6          0          0          0          0          0          0          0   APIC ICR read retries
 RES:       1834       7304       1432       1807       3015       1552       1417       1498   Rescheduling interrupts
 CAL:      21739      26798      28934      22211      22590      28622      22541      20023   Function call interrupts
 TLB:      51267      49182      59392      48384      46755      56491      48103      46560   TLB shootdowns
 TRM:          2          2          2          2          2          2          2          2   Thermal event interrupts
 THR:          0          0          0          0          0          0          0          0   Threshold APIC interrupts
 DFR:          0          0          0          0          0          0          0          0   Deferred Error APIC interrupts
 MCE:          0          0          0          0          0          0          0          0   Machine check exceptions
 MCP:          3          4          4          4          4          4          4          4   Machine check polls
 ERR:         16
 MIS:          0
 PIN:          0          0          0          0          0          0          0          0   Posted-interrupt notification event
 NPI:          0          0          0          0          0          0          0          0   Nested posted-interrupt event
 PIW:          0          0          0          0          0          0          0          0   Posted-interrupt wakeup event

and the modification that disables m2 state:

diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 3de7b1639ec6..20f670c8b129 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -55,12 +55,12 @@ static struct mhi_pm_transitions const dev_state_transitions[] = {
     },
     {
         MHI_PM_M0,
-        MHI_PM_M0 | MHI_PM_M2 | MHI_PM_M3_ENTER |
+        MHI_PM_M0 | MHI_PM_M3_ENTER |
         MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
         MHI_PM_LD_ERR_FATAL_DETECT | MHI_PM_FW_DL_ERR
     },
     {
-        MHI_PM_M2,
+        MHI_PM_M0,
         MHI_PM_M0 | MHI_PM_SYS_ERR_DETECT | MHI_PM_SHUTDOWN_PROCESS |
         MHI_PM_LD_ERR_FATAL_DETECT
     },
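
For anyone reading along, here is a rough user-space model of what the
diff above changes. It is only a sketch: the bit values and all names
(PM_M0, pm_transition, transition_allowed and so on) are made up for
illustration, and the lookup is a plain linear search that takes the
first row matching the current state, which is not necessarily how the
MHI core does it. The assumption is that a state change is refused
unless the target state is set in the to_states mask of the current
state's row.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit values; the real MHI_PM_* definitions live in the MHI core. */
enum pm_state {
    PM_M0             = 1u << 0,
    PM_M2             = 1u << 1,
    PM_M3_ENTER       = 1u << 2,
    PM_SYS_ERR_DETECT = 1u << 3,
};

struct pm_transition {
    unsigned int from_state;  /* state this row describes */
    unsigned int to_states;   /* bitmask of states reachable from it */
};

/* Trimmed-down copy of the two rows before the patch. */
static const struct pm_transition stock_table[] = {
    { PM_M0, PM_M0 | PM_M2 | PM_M3_ENTER | PM_SYS_ERR_DETECT },
    { PM_M2, PM_M0 | PM_SYS_ERR_DETECT },
};

/* Same rows after the patch: M2 dropped from the M0 mask, and the
 * second row relabelled from M2 to M0, exactly as in the diff. */
static const struct pm_transition patched_table[] = {
    { PM_M0, PM_M0 | PM_M3_ENTER | PM_SYS_ERR_DETECT },
    { PM_M0, PM_M0 | PM_SYS_ERR_DETECT },
};

/* Return true if moving from cur to next is permitted by the table.
 * The first row whose from_state equals cur decides the answer. */
static bool transition_allowed(const struct pm_transition *table, size_t n,
                               unsigned int cur, unsigned int next)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].from_state == cur)
            return (table[i].to_states & next) != 0;
    }
    return false;  /* no row for this state: nothing is allowed */
}
```

Under this model, transition_allowed(stock_table, 2, PM_M0, PM_M2) is
true while the same call against patched_table is false, so the device
can never be moved into M2. The patched table also has no row for M2
any more, which does not matter as long as M2 is never entered.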

