* Re: kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI [not found] ` <CA+QBN9D9vUA7Q=Sd=moi+bSAkQjGQ6nGa8wnb1=7qHudAY8L8g@mail.gmail.com> @ 2018-03-17 11:36 ` Helge Deller 2018-03-17 16:12 ` Grant Grundler [not found] ` <81f118a7-5bc5-c6b2-eb10-a9c683c72a26@bell.net> 0 siblings, 2 replies; 11+ messages in thread From: Helge Deller @ 2018-03-17 11:36 UTC (permalink / raw) To: Carlo Pisani; +Cc: debian-hppa, linux-parisc * Carlo Pisani <carlojpisani@gmail.com>: > I have created and applied the following patch > testing the kernel with heavy I/O seems now stable > > my C3600 is still under testing, moving chunks of 500Mbyte and compiling gcc-v6 > > > http://93.55.217.0//wonderland/chunk_of/user/ivelegacy/happa-dev/hppa2_0001_HPMC_fix_my_v1.patch --- drivers/parisc/lba_pci.c 2018-01-28 22:20:33.000000000 +0100 +++ drivers/parisc/lba_pci.c 2018-03-15 12:26:44.839894952 +0100 @@ -1405,7 +1405,7 @@ /* Set HF mode as the default (vs. -1 mode). */ stat = READ_REG32(d->hba.base_addr + LBA_STAT_CTL); - WRITE_REG32(stat | HF_ENABLE, d->hba.base_addr + LBA_STAT_CTL); + WRITE_REG32(stat & ~HF_ENABLE, d->hba.base_addr + LBA_STAT_CTL); /* ** Writing a zero to STAT_CTL.rf (bit 0) will clear reset signal That's the patch from Kyle: https://www.spinics.net/lists/linux-parisc/msg01027.html which comes out of this mail thread: https://www.spinics.net/lists/linux-parisc/msg01024.html specifically with those notes: https://www.spinics.net/lists/linux-parisc/msg01026.html Citing here: "bus timeout" usually means we tried to read an address that doesn't respond. that is, nothing on the bus accepted the transaction for it, so it timed out and HPMC'd the box. what you really need is the IIR, and the address it tried to access (both the kernel vaddr which will be in the register, and the "system requester address" from the hpmc dump which will be the physical address mapped. not sure why the hpmc handler is getting skipped, that's a little weird. 
you can try hacking elroy to set softfail mode on that bus, which will result in a timeout on the pci bus to return -1 (like what x86 and most other architectures do) rather than hang the box, but it really likely means a driver bug. So, you change LBA to return -1 instead of faulting via HPMC which is of course one work-around to avoid the HPMC. But could you try to check the driver instead? You run this SATA controller: 01:05.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE RAID Can you maybe try to localize where in the drivers/ata/sata_via.c driver triggers the HPMC ? --- arch/parisc/kernel/hpmc.S 2018-01-28 22:20:33.000000000 +0100 +++ arch/parisc/kernel/hpmc.S 2018-03-15 14:13:46.611969815 +0100 @@ -308,4 +290,5 @@ .align 4 .export os_hpmc_size os_hpmc_size: - .word .os_hpmc_end-.os_hpmc + /* .word .os_hpmc_end-.os_hpmc */ + .word (.os_hpmc_end - .os_hpmc) * 4 /* sizeof(u32) */ This one seems wrong. I think you just didn't hit a HPMC with your first patch, and as such this patch has no influence... Helge ^ permalink raw reply [flat|nested] 11+ messages in thread
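For readers following the lba_pci.c hunk above, the two variants differ only in whether the HF bit is set or cleared in the value written back to LBA_STAT_CTL. A minimal sketch, assuming an illustrative bit value for HF_ENABLE (check drivers/parisc/lba_pci.c for the real definition) and using hypothetical helper names that do not exist in the kernel:

```c
#include <stdint.h>

/* Assumption: illustrative bit value only; see drivers/parisc/lba_pci.c. */
#define HF_ENABLE 0x40u

/* Hypothetical helper: Hard Fail mode, a PCI bus timeout ends in an HPMC. */
static uint32_t stat_ctl_hard_fail(uint32_t stat)
{
    return stat | HF_ENABLE;
}

/* Hypothetical helper: soft-fail ("-1") mode, a timed-out read returns all ones. */
static uint32_t stat_ctl_soft_fail(uint32_t stat)
{
    return stat & ~HF_ENABLE;
}
```

All other bits of the status/control value pass through unchanged in both cases, which is why the workaround is a one-line change.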
* Re: kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI 2018-03-17 11:36 ` kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI Helge Deller @ 2018-03-17 16:12 ` Grant Grundler [not found] ` <CA+QBN9Bd69whK14_0jT0m=5F4i2FxNPyEbMv3mgUsox9pX3fKA@mail.gmail.com> 2018-03-17 17:47 ` John David Anglin [not found] ` <81f118a7-5bc5-c6b2-eb10-a9c683c72a26@bell.net> 1 sibling, 2 replies; 11+ messages in thread From: Grant Grundler @ 2018-03-17 16:12 UTC (permalink / raw) To: Helge Deller; +Cc: Carlo Pisani, debian-hppa, linux-parisc Hi Helge, just a nit on PCI terminology... On Sat, Mar 17, 2018 at 12:36 PM, Helge Deller <deller@gmx.de> wrote: > * Carlo Pisani <carlojpisani@gmail.com>: >> I have created and applied the following patch >> testing the kernel with heavy I/O seems now stable >> >> my C3600 is still under testing, moving chunks of 500Mbyte and compiling gcc-v6 >> >> >> http://93.55.217.0//wonderland/chunk_of/user/ivelegacy/happa-dev/hppa2_0001_HPMC_fix_my_v1.patch > > --- drivers/parisc/lba_pci.c 2018-01-28 22:20:33.000000000 +0100 > +++ drivers/parisc/lba_pci.c 2018-03-15 12:26:44.839894952 +0100 > @@ -1405,7 +1405,7 @@ > > /* Set HF mode as the default (vs. -1 mode). */ > stat = READ_REG32(d->hba.base_addr + LBA_STAT_CTL); > - WRITE_REG32(stat | HF_ENABLE, d->hba.base_addr + LBA_STAT_CTL); > + WRITE_REG32(stat & ~HF_ENABLE, d->hba.base_addr + LBA_STAT_CTL); > > /* > ** Writing a zero to STAT_CTL.rf (bit 0) will clear reset signal > > That's the patch from Kyle: > https://www.spinics.net/lists/linux-parisc/msg01027.html > > which comes out of this mail thread: > https://www.spinics.net/lists/linux-parisc/msg01024.html > specifically with those notes: > https://www.spinics.net/lists/linux-parisc/msg01026.html > > Citing here: > "bus timeout" usually means we tried to read an address that doesn't > respond. that is, nothing on the bus accepted the transaction for it, > so it timed out and HPMC'd the box. HF= Hard Fail on PCI "Master Abort". 
"Master Abort" means the MMIO transaction timed out - usually due to the device not responding to an MMIO read. We normally want HF to be enabled as you noted below in order to get state of the CPU and which IO device it was trying to access. It's possible the "~0L" returned in SoftFail mode is being handled by the driver OR the particular read that fails just doesn't matter. Would have to see dmesg output if the driver ever complains about invalid MMIO read data (~0L). cheers, grant > what you really need is the IIR, and the address it tried to access > (both the kernel vaddr which will be in the register, and the "system > requester address" from the hpmc dump which will be the physical address > mapped. > > not sure why the hpmc handler is getting skipped, that's a little weird. > > you can try hacking elroy to set softfail mode on that bus, which will > result in a timeout on the pci bus to return -1 (like what x86 and most > other architectures do) rather than hang the box, but it really likely > means a driver bug. > > > > So, you change LBA to return -1 instead of faulting via HPMC which is > of course one work-around to avoid the HPMC. > > But could you try to check the driver instead? > > You run this SATA controller: > 01:05.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE RAID > > Can you maybe try to localize where in the drivers/ata/sata_via.c driver > triggers the HPMC ? > > > --- arch/parisc/kernel/hpmc.S 2018-01-28 22:20:33.000000000 +0100 > +++ arch/parisc/kernel/hpmc.S 2018-03-15 14:13:46.611969815 +0100 > @@ -308,4 +290,5 @@ > .align 4 > .export os_hpmc_size > os_hpmc_size: > - .word .os_hpmc_end-.os_hpmc > + /* .word .os_hpmc_end-.os_hpmc */ > + .word (.os_hpmc_end - .os_hpmc) * 4 /* sizeof(u32) */ > > This one seems wrong. > I think you just didn't hit a HPMC with your first patch, and as such > this patch has no influence... > > Helge > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI [not found] ` <CA+QBN9Bd69whK14_0jT0m=5F4i2FxNPyEbMv3mgUsox9pX3fKA@mail.gmail.com> @ 2018-03-17 17:05 ` Grant Grundler [not found] ` <CA+QBN9A_0x6_0Ayo6YtkdC=j1ssa9mD7NTzU5+Jie0dYabp3bQ@mail.gmail.com> 0 siblings, 1 reply; 11+ messages in thread From: Grant Grundler @ 2018-03-17 17:05 UTC (permalink / raw) To: Carlo Pisani; +Cc: debian-hppa, linux-parisc [adding lists back to CC since this is a public discussion] On Sat, Mar 17, 2018 at 5:36 PM, Carlo Pisani <carlojpisani@gmail.com> wrote: >> Would have to see dmesg output if the driver ever complains about >> invalid MMIO read data (~0L). > > I repeat AGAIN > > I have tested different sATA controllers > - via6421 > - SIL24 > - Adaptex 2410 Ah sorry - I missed that. > none of them has never complained in dmesg ( i have the console > redirected the serial port ) Thanks for confirming Carlo! Having worked on mv7042 and SIL3124 (IIRC) driver support for almost 4 years, I can tell you most SATA drivers suck. Helge is most likely correct that this is due to a SATA driver bug. Since Intel x86 (32 or 64bit) systems ONLY support "SoftFail" mode, many driver issues are never exposed until something gets corrupted or HW totally wedges. > and all of them showed the same behavior under heavy I/O While the symptom (HPMC) looks the same, the details are likely different for each card. It's been over 10 years but ISTR that "SER PIM" command when entered at boot prompt will dump those details. Ah, yes, that's correct: https://parisc.wiki.kernel.org/index.php/How_to_report_a_parisc-linux_kernel_problem#HPMC BTW, HPMCs can also be due to devices DMAing to an invalid address since the IOMMU is "strict" and requires all DMA be "mapped". IIRC, these DMA failures are "imprecise events" meaning SER PIM is the only way to determine what the target DMA address was. 
This is non-trivial stuff to debug since it requires pretty deep understanding of the SATA controller operations and how the driver works. > the C3600 machine stops to work, and I see errors in the LCD > without a line from the kernel on the serial console Yes, HPMC due to IO errors (including DMA failures and PCI Master Abort) will lock up all other IO including the serial console. SER PIM is the only output that will lead to diagnosing the underlying driver bug. But in SoftFail mode, I was hoping one of the drivers might complain about "invalid value" - but that is likely way too optimistic on my part. The only drivers that look for ~0L on MMIO read are the ones that support PCI hotplug (e.g. PCMCIA devices and some higher end PCI IO devices which target data center/server market). cheers, grant ps. I haven't looked at mvsata or sil3124 SATA controller code since 2010. But if you could find a MV7042 PCI (or PCI-X?) SATA controller, I have more faith in that driver/HW combination partly because I tested it pretty thoroughly at the time. ^ permalink raw reply [flat|nested] 11+ messages in thread
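The check for ~0L mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: both function names are hypothetical, the register is modelled as a plain pointer rather than a real MMIO accessor, and a register that can legitimately read as all ones would need a different test.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: in SoftFail mode a master-aborted 32-bit read yields all ones. */
static bool mmio_read_looks_aborted(uint32_t val)
{
    return val == UINT32_MAX;
}

/*
 * Hypothetical wrapper: returns 0 and stores the value on success,
 * or -1 when the value is indistinguishable from a master abort.
 */
static int checked_read32(const volatile uint32_t *reg, uint32_t *out)
{
    uint32_t val = *reg;

    if (mmio_read_looks_aborted(val))
        return -1;
    *out = val;
    return 0;
}
```

A driver that does something like this could at least log the failing register in dmesg instead of silently consuming bogus data.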
* Re: kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI
       [not found]         ` <CA+QBN9A_0x6_0Ayo6YtkdC=j1ssa9mD7NTzU5+Jie0dYabp3bQ@mail.gmail.com>
@ 2018-03-19 19:37           ` Grant Grundler
  2018-03-19 19:41             ` Grant Grundler
  0 siblings, 1 reply; 11+ messages in thread
From: Grant Grundler @ 2018-03-19 19:37 UTC (permalink / raw)
  To: Carlo Pisani; +Cc: debian-hppa, linux-parisc

[adding public lists back to CC since I can't sort this by myself any more]

On Sat, Mar 17, 2018 at 6:45 PM, Carlo Pisani <carlojpisani@gmail.com> wrote:
>> While the symptom (HPMC) looks the same, the details are likely
>> different for each card. It's been over 10 years but ISTR that "SER
>> PIM" command when entered at boot prompt will dump those details.
>
> this is an old log
> http://93.55.217.0/wonderland/chunk_of/user/ivelegacy/hppa-dev/ser-prim-2018-03-12--sata-crash

Awesome - thank you! Can you tell me which SATA controller was
installed when this crash was taken?

> my C3600 is now busy, compiling 4.16.*, a couple of branches with two
> different configurations, so I can't force a crash to have a fresh log
> at the moment, but I will with several SATA card!

No worries - start with one. :)

> what about SAS kernel modules? are them better than SATA?

In general, I expect they are better supported but I never worked with
SAS controllers.

> and what about SCSI (adaptec 29320, PCI-X)?

Sorry: I don't have any experience with adaptec - only the LSI SCSI
controllers, Marvell 7042 SATA controller, and Silicon Image 3124
(IIRC). The SI3124 was a PoS. MV7042 worked well and was reliable "at
scale".

> I am waiting for an array of disks, I can test the SCSI card under
> heavy I/O to see
> - performance
> - stability
>
> it will be delivered here for the end of the next week

Ok! Sounds like fun! :D

cheers,
grant

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI
  2018-03-19 19:37           ` Grant Grundler
@ 2018-03-19 19:41             ` Grant Grundler
  0 siblings, 0 replies; 11+ messages in thread
From: Grant Grundler @ 2018-03-19 19:41 UTC (permalink / raw)
  To: Carlo Pisani; +Cc: debian-hppa, linux-parisc

On Mon, Mar 19, 2018 at 8:37 PM, Grant Grundler <grantgrundler@gmail.com> wrote:
...
>> this is an old log
>> http://93.55.217.0/wonderland/chunk_of/user/ivelegacy/hppa-dev/ser-prim-2018-03-12--sata-crash

BTW, I should point out one needs to give "CLEARPIM" command at PDC
command line in order to record a new HPMC. So once you reproduce the
crash, collect HPMC dump with "SER PIM", then "CLEAR PIM" again. :)

cheers,
grant

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI
  2018-03-17 16:12   ` Grant Grundler
       [not found]     ` <CA+QBN9Bd69whK14_0jT0m=5F4i2FxNPyEbMv3mgUsox9pX3fKA@mail.gmail.com>
@ 2018-03-17 17:47     ` John David Anglin
  2018-03-19 19:30       ` Grant Grundler
       [not found]       ` <CA+QBN9DgtjA3vdE7WA36o8P585r8vMbR+s2zPAF+529HuEzzfw@mail.gmail.com>
  1 sibling, 2 replies; 11+ messages in thread
From: John David Anglin @ 2018-03-17 17:47 UTC (permalink / raw)
  To: Grant Grundler, Helge Deller; +Cc: Carlo Pisani, debian-hppa, linux-parisc

Hi Grant,

On 2018-03-17 12:12 PM, Grant Grundler wrote:
> "Master Abort" means the MMIO
> transaction timed out - usually due to the device not responding to an
> MMIO read.

In lba_pci.c and sba_iommu.c, it says "BE WARNED: register writes are
posted" and need to be followed by a read. It seems there are some
routines in these modules that have writes that aren't followed by a
read. One is lba_wr_cfg(). Another might be the macro
LBA_CFG_RESTORE(). Are these okay?

It seems probable that the problem that Carlo is having is a conflict
between devices.

Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread
* Re: kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI 2018-03-17 17:47 ` John David Anglin @ 2018-03-19 19:30 ` Grant Grundler [not found] ` <CA+QBN9DgtjA3vdE7WA36o8P585r8vMbR+s2zPAF+529HuEzzfw@mail.gmail.com> 1 sibling, 0 replies; 11+ messages in thread From: Grant Grundler @ 2018-03-19 19:30 UTC (permalink / raw) To: John David Anglin; +Cc: Helge Deller, Carlo Pisani, debian-hppa, linux-parisc Hi John! On Sat, Mar 17, 2018 at 6:47 PM, John David Anglin <dave.anglin@bell.net> wrote: > Hi Grant, > > On 2018-03-17 12:12 PM, Grant Grundler wrote: >> >> "Master Abort" means the MMIO >> transaction timed out - usually due to the device not responding to an >> MMIO read. > > In lba_pci.c and sba_iommu.c, it says "BE WARNED: register writes are > posted" and need to be followed by a read. It seems there are a some > routines in these modules that have writes that aren't followed by a read. > One is lba_wr_cfg(). Another might be the macro > LBA_CFG_RESTORE(). Are these okay? I looked through the two examples you point out and I *think* both are ok. lba_wr_cfg() issues an mmio write and immediately after calls LBA_CFG_MASTER_ABORT_CHECK() which performs an MMIO read from the same base address. The LBA_CFG_RESTORE() is "lazy" - the next MMIO read will flush those three writes and (I believe) any following MMIO writes will still be issued in order. Typically, the problem with posted MMIO writes is DMA or other events don't start until the MMIO write is "seen" by the device. This is important when specific timing between MMIO transactions is required OR some magic (e.g. device reset, updates Frame Buffer, etc) happens. > It seems probable that the problem that Carlo is having is a conflict > between devices. Hrm. I don't know. I haven't yet looked at the latest dump that Carlo helpfully provided as I'm still traveling. Why do you suspect this? 
I'm skeptical about "conflict between devices" (due to lba_wr_cfg()) for two reasons: 1) configuration space accesses are usually not part of normal IO device transaction processing. 2) I've nearly always found that PCI Master Aborts (on MMIO reads) are usually just a symptom of something else going wrong and not the root cause. Typically, the issues I recall running into are around the drivers hitting a corner case where the device is still performing DMA to an address that gets unmapped by the driver. This will wedge the IOMMU (sba) and then following MMIO reads will generate an HPMC. The hard part is to determine what the corner case is based on a DMA address (as reported in SER PIM output). It requires deeper understanding of the DMA programming for the given SATA controller (driver directing HW what to do), how transaction completions are reported (SATA controller HW) and handled (driver operation). In the past, I've sorted several of these issues out for tg3 and tulip NIC drivers and I can with confidence say that some issues still remain in the tulip driver shutdown path. But I gave up on trying to fix those and lost interest later. cheers, grant ^ permalink raw reply [flat|nested] 11+ messages in thread
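The posted-write behaviour discussed above can be illustrated with a toy model. This is not kernel code: the bridge structure and both functions are hypothetical, and the model only captures the idea that a device sees queued writes, in issue order, once a read (or a full queue) forces them out.

```c
#include <stdint.h>

#define POSTED_MAX 8

/* Toy model (hypothetical): a bridge that queues writes until something flushes them. */
struct toy_bridge {
    uint32_t pending[POSTED_MAX];
    int npending;
    uint32_t device_reg;   /* what the device has actually seen */
};

/* Deliver all queued writes to the device in the order they were issued. */
static void toy_drain(struct toy_bridge *b)
{
    for (int i = 0; i < b->npending; i++)
        b->device_reg = b->pending[i];
    b->npending = 0;
}

/* A posted write returns immediately; the device may not have seen it yet. */
static void toy_write32(struct toy_bridge *b, uint32_t val)
{
    if (b->npending == POSTED_MAX)
        toy_drain(b);
    b->pending[b->npending++] = val;
}

/* A read cannot pass earlier writes, so it flushes the queue first. */
static uint32_t toy_read32(struct toy_bridge *b)
{
    toy_drain(b);
    return b->device_reg;
}
```

In this picture the "lazy" LBA_CFG_RESTORE() case is simply several toy_write32() calls with the flush deferred to whichever read comes next.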
* Re: kernel 4.15.7/64bit, C3600 is unstable during heavy I/O on PCI
       [not found]             ` <CA+QBN9CS11tBCOMKyGvjGF+w9bnd-pvUioSJD=o3AT9pJC=OmA@mail.gmail.com>
@ 2018-03-19 20:46               ` John David Anglin
  0 siblings, 0 replies; 11+ messages in thread
From: John David Anglin @ 2018-03-19 20:46 UTC (permalink / raw)
  To: Carlo Pisani; +Cc: Grant Grundler, linux-parisc List, debian-hppa

This is info about PCI devices that Carlo sent me. It would help if we
knew where 0x4046b840 was in the kernel. From the IIR, 0xf3c001c, it
seems the timeout occurred on a byte operation.

Dave

On 2018-03-17 2:58 PM, Carlo Pisani wrote:
>> Probably, it would be useful to have verbose output from lspci posted to
>> list as well.
>
> c3600 ~ $ lspci -v
> 00:0c.0 Ethernet controller: Digital Equipment Corporation DECchip
> 21142/43 (rev 41)
>         Subsystem: Hewlett-Packard Company DECchip 21142/43
>         Flags: bus master, medium devsel, latency 255, IRQ 17
>         I/O ports at 1000 [size=128]
>         Memory at fffffffff4008000 (32-bit, non-prefetchable) [size=1K]
>         Expansion ROM at fffffffff4040000 [disabled] [size=256K]
>         Kernel driver in use: tulip
> lspci: Unable to load libkmod resources: error -12
>
> 00:0d.0 Multimedia audio controller: Analog Devices Device 1889
>         Subsystem: Analog Devices Device 1889
>         Flags: bus master, medium devsel, latency 255, IRQ 18
>         Memory at fffffffff400c000 (32-bit, prefetchable) [size=512]
>         Memory at fffffffff400b000 (32-bit, prefetchable) [size=16]
>         Memory at fffffffff400a000 (32-bit, prefetchable) [size=16]
>         Memory at fffffffff4009000 (32-bit, prefetchable) [size=16]
>         Capabilities: [dc] Power Management version 1
>         Kernel driver in use: snd_ad1889
>
> 00:0e.0 IDE interface: National Semiconductor Corporation 87415/87560
> IDE (rev 03) (prog-if 8f [Master SecP SecO PriP PriO])
>         Flags: bus master, medium devsel, latency 255, IRQ 7
>         I/O ports at 0f00 [size=8]
>         I/O ports at 0e00 [size=4]
>         I/O ports at 0d00 [size=8]
>         I/O ports at 0b00 [size=4]
>         I/O ports at 0a00 [size=16]
>         Kernel driver in use: NS87415_IDE
>
> 00:0e.1 Bridge: National Semiconductor Corporation 87560 Legacy I/O (rev 01)
>         Flags: bus master, medium devsel, latency 255, IRQ 19
>         Kernel driver in use: SuperIO
>
> 00:0e.2 USB controller: National Semiconductor Corporation USB
> Controller (rev 02) (prog-if 10 [OHCI])
>         Flags: bus master, medium devsel, latency 240, IRQ 1
>         Memory at fffffffff4007000 (32-bit, non-prefetchable) [size=4K]
>         Memory at fffffffff4006000 (32-bit, non-prefetchable) [size=4K]
>         Kernel driver in use: ohci-pci
>
> 00:0f.0 SCSI storage controller: LSI Logic / Symbios Logic 53C896/897 (rev 07)
>         Subsystem: LSI Logic / Symbios Logic LSI53C896/7 PCI to Dual
> Channel Ultra2 SCSI Multifunction Controller
>         Flags: bus master, medium devsel, latency 255, IRQ 20
>         I/O ports at 0900 [size=256]
>         Memory at fffffffff4005000 (64-bit, non-prefetchable) [size=1K]
>         Memory at fffffffff4002000 (64-bit, non-prefetchable) [size=8K]
>         Capabilities: [40] Power Management version 2
>         Kernel driver in use: sym53c8xx
>
> 00:0f.1 SCSI storage controller: LSI Logic / Symbios Logic 53C896/897 (rev 07)
>         Subsystem: LSI Logic / Symbios Logic LSI53C896/7 PCI to Dual
> Channel Ultra2 SCSI Multifunction Controller
>         Flags: bus master, medium devsel, latency 255, IRQ 20
>         I/O ports at 0800 [size=256]
>         Memory at fffffffff4004000 (64-bit, non-prefetchable) [size=1K]
>         Memory at fffffffff4000000 (64-bit, non-prefetchable) [size=8K]
>         Capabilities: [40] Power Management version 2
>         Kernel driver in use: sym53c8xx
>
> 01:05.0 RAID bus controller: VIA Technologies, Inc. VT6421 IDE/SATA
> Controller (rev 50)
>         Subsystem: VIA Technologies, Inc. VT6421 IDE/SATA Controller
>         Flags: bus master, medium devsel, latency 240, IRQ 21
>         I/O ports at 12500 [size=16]
>         I/O ports at 12400 [size=16]
>         I/O ports at 12300 [size=16]
>         I/O ports at 12200 [size=16]
>         I/O ports at 12100 [size=32]
>         I/O ports at 12000 [size=256]
>         Expansion ROM at fffffffff4800000 [disabled] [size=64K]
>         Capabilities: [e0] Power Management version 2
>         Kernel driver in use: sata_via
>

The requesting device was 10/1/5/0.

^ permalink raw reply	[flat|nested] 11+ messages in thread
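One way the "byte operation" reading of the IIR value quoted above could be reached is by pulling the opcode fields out of the instruction word. The sketch below only does the bit extraction; the interpretation is an assumption on my part (major opcode 0x03 being the indexed/short load-store group, with a 4-bit extension field of 0 meaning a byte load) and should be checked against the PA-RISC architecture manual. Both function names are hypothetical.

```c
#include <stdint.h>

/* PA-RISC instructions are 32 bits wide; the major opcode is the top 6 bits. */
static uint32_t iir_major_opcode(uint32_t iir)
{
    return iir >> 26;
}

/*
 * Assumption: for major opcode 0x03 a 4-bit extension field sits at
 * bits 6..9 counted from the least significant bit and selects the
 * access type and size.
 */
static uint32_t iir_ext4(uint32_t iir)
{
    return (iir >> 6) & 0xfu;
}
```

For the value from the dump, 0x0f3c001c, these helpers give a major opcode of 0x03 and an extension field of 0, which under the assumption above is consistent with a byte load.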
* [RFC][PATCH] Fix HPMC handler by increasing size to multiple of 16 bytes [not found] ` <81f118a7-5bc5-c6b2-eb10-a9c683c72a26@bell.net> @ 2018-03-19 22:23 ` Helge Deller 2018-03-19 22:58 ` John David Anglin 0 siblings, 1 reply; 11+ messages in thread From: Helge Deller @ 2018-03-19 22:23 UTC (permalink / raw) To: John David Anglin; +Cc: debian-hppa, linux-parisc * John David Anglin <dave.anglin@bell.net>: > On 2018-03-17 7:36 AM, Helge Deller wrote: > > Can you maybe try to localize where in the drivers/ata/sata_via.c driver > > triggers the HPMC ? > > > > > > --- arch/parisc/kernel/hpmc.S 2018-01-28 22:20:33.000000000 +0100 > > +++ arch/parisc/kernel/hpmc.S 2018-03-15 14:13:46.611969815 +0100 > > @@ -308,4 +290,5 @@ > > .align 4 > > .export os_hpmc_size > > os_hpmc_size: > > - .word .os_hpmc_end-.os_hpmc > > + /* .word .os_hpmc_end-.os_hpmc */ > > + .word (.os_hpmc_end - .os_hpmc) * 4 /* sizeof(u32) */ > > > > This one seems wrong. > > I think you just didn't hit a HPMC with your first patch, and as such > > this patch has no influence... > Helge, did you check that os_hpmc is correctly entered after you added > the ".align 4" statement (e.g., trigger hpmc by accessing page 0 or > such)? I looked at one of my builds and the size is correct as is. > Is it possible that the length needs to be a multiple of 16? Current > length is 0x194. Good point. I changed that in the patch below. > There are a couple of minor issues with assembly code. There are no .type > and .size directives for os_hpmc_size. Fixed as well. > I think we should investigate further as I have never seen hpmc > handler entered on rp3440 or c8000. On rp3440, there are messages > about the branch failing in the SL log. However, this might have been > fixed by your alignment fix. No, the problem still exists in existing code. Below patch is build-tested only. 
[PATCH] Fix HPMC handler by increasing size to multiple of 16 bytes

Make sure that the HPMC handler is 16-byte aligned and that its length in
the IVT is a multiple of 16 bytes. Otherwise PDC may decide not to jump to
the HPMC handler.

Signed-off-by: Helge Deller <deller@gmx.de>

diff --git a/arch/parisc/kernel/hpmc.S b/arch/parisc/kernel/hpmc.S
index 8d072c44f300..f09f54f8042e 100644
--- a/arch/parisc/kernel/hpmc.S
+++ b/arch/parisc/kernel/hpmc.S
@@ -84,6 +84,7 @@ END(hpmc_pim_data)
 	.text
 	.import intr_save, code
+	.align 16
 ENTRY_CFI(os_hpmc)
 .os_hpmc:
@@ -305,7 +306,9 @@ ENDPROC_CFI(os_hpmc)
 	__INITRODATA
+.globl os_hpmc_size
 	.align 4
-	.export os_hpmc_size
+	.type   os_hpmc_size, @object
+	.size   os_hpmc_size, 4
 os_hpmc_size:
 	.word .os_hpmc_end-.os_hpmc
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index c919e6c0a687..ca6e8722fffb 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -837,8 +837,8 @@ void __init initialize_ivt(const void *iva)
 	ivap[0] = instr;
 	/* Compute Checksum for HPMC handler */
-	length = os_hpmc_size;
-	ivap[7] = length;
+	length = ALIGN(os_hpmc_size, 16);
+	ivap[7] = length; /* length needs to be multiple of 16. */
 	hpmcp = (u32 *)os_hpmc;

^ permalink raw reply related	[flat|nested] 11+ messages in thread
* Re: [RFC][PATCH] Fix HPMC handler by increasing size to multiple of 16 bytes
  2018-03-19 22:23     ` [RFC][PATCH] Fix HPMC handler by increasing size to multiple of 16 bytes Helge Deller
@ 2018-03-19 22:58       ` John David Anglin
  2018-03-24 20:18         ` [RFC][PATCH v2] " Helge Deller
  0 siblings, 1 reply; 11+ messages in thread
From: John David Anglin @ 2018-03-19 22:58 UTC (permalink / raw)
  To: Helge Deller; +Cc: debian-hppa, linux-parisc

On 2018-03-19 6:23 PM, Helge Deller wrote:
> -	length = os_hpmc_size;
> -	ivap[7] = length;
> +	length = ALIGN(os_hpmc_size, 16);
> +	ivap[7] = length; /* length needs to be multiple of 16. */

If this is true, should we not add three nops to the end of os_hpmc so
it has a length which is a multiple of 16?

Dave

^ permalink raw reply	[flat|nested] 11+ messages in thread
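The "three nops" figure follows from the handler length of 0x194 bytes mentioned earlier in the thread. A minimal sketch of the arithmetic, using hypothetical helper names (align_up() mirrors the idea of the kernel's ALIGN() macro for power-of-two alignments, it is not the kernel macro itself):

```c
#include <stdint.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Number of 4-byte instructions needed to pad len up to a multiple of a. */
static uint32_t pad_insns(uint32_t len, uint32_t a)
{
    return (align_up(len, a) - len) / 4;
}
```

With len = 0x194 and a = 16 the aligned length is 0x1a0, a difference of 12 bytes, which is three 4-byte instructions.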
* Re: [RFC][PATCH v2] Fix HPMC handler by increasing size to multiple of 16 bytes
  2018-03-19 22:58       ` John David Anglin
@ 2018-03-24 20:18         ` Helge Deller
  0 siblings, 0 replies; 11+ messages in thread
From: Helge Deller @ 2018-03-24 20:18 UTC (permalink / raw)
  To: John David Anglin, linux-parisc, James Bottomley; +Cc: debian-hppa

Make sure that the HPMC (High Priority Machine Check) handler is 16-byte
aligned and that its length in the IVT is a multiple of 16 bytes.
Otherwise PDC may decide not to call the HPMC crash handler.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org

diff --git a/arch/parisc/kernel/hpmc.S b/arch/parisc/kernel/hpmc.S
index 8d072c44f300..781c3b9a3e46 100644
--- a/arch/parisc/kernel/hpmc.S
+++ b/arch/parisc/kernel/hpmc.S
@@ -84,6 +84,7 @@ END(hpmc_pim_data)
 	.text
 	.import intr_save, code
+	.align 16
 ENTRY_CFI(os_hpmc)
 .os_hpmc:
@@ -300,12 +301,15 @@ os_hpmc_6:
 	b .
 	nop
+	.align 16	/* make function length multiple of 16 bytes */
 ENDPROC_CFI(os_hpmc)
 .os_hpmc_end:
 	__INITRODATA
+.globl os_hpmc_size
 	.align 4
-	.export os_hpmc_size
+	.type   os_hpmc_size, @object
+	.size   os_hpmc_size, 4
 os_hpmc_size:
 	.word .os_hpmc_end-.os_hpmc

^ permalink raw reply related	[flat|nested] 11+ messages in thread