* framebuffer corruption due to overlapping stp instructions on arm64
@ 2018-08-02 19:31 Mikulas Patocka
  0 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-02 19:31 UTC
To: Catalin Marinas, Will Deacon, Russell King, Thomas Petazzoni
Cc: linux-arm-kernel, linux-kernel, libc-alpha

Hi

I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a
strange problem.

When I use the links browser in graphics mode on the framebuffer, I get
occasional pixel corruption. Links does memcpy, memset and 4-byte writes
on the framebuffer - nothing else.

I found out that the pixel corruption is caused by overlapping unaligned
stp instructions inside memcpy. In order to avoid branching, the arm64
memcpy implementation may write the same destination twice with different
alignment. If I put "dmb sy" between the overlapping stp instructions, the
pixel corruption goes away.

This seems like a hardware bug. Is it a known erratum? Do you have any
workarounds for it?

I tried an AMD card (HD 6350) and an NVidia card (NVS 285) and both exhibit
the same corruption. OpenGL doesn't work (it results in artifacts on the
AMD card and a lock-up on the NVidia card), but that is to be expected if
even simple writes to the framebuffer don't work.

Mikulas
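[Editorial illustration, not part of the original message.] The overlapping-store technique described above can be sketched in portable C. This is a simplified model and not the actual arm64 memcpy source: the function names `copy_16_to_32` and `overlap_bytes` are hypothetical, and the two 16-byte `memcpy` calls into `dst` merely stand in for the two `stp` stores the message refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes, 16 <= n <= 32, without branching on the exact length:
 * load the first and the last 16 bytes of the source, then store both.
 * For n < 32 the two stores overlap in the middle of the destination,
 * and the second store generally starts at an unaligned address.
 */
static void copy_16_to_32(unsigned char *dst, const unsigned char *src, size_t n)
{
	unsigned char head[16], tail[16];

	assert(n >= 16 && n <= 32);
	memcpy(head, src, 16);            /* first 16 source bytes */
	memcpy(tail, src + n - 16, 16);   /* last 16 source bytes */
	memcpy(dst, head, 16);            /* store 1: dst[0 .. 15] */
	memcpy(dst + n - 16, tail, 16);   /* store 2: dst[n-16 .. n-1] */
}

/* Number of destination bytes that are written by both stores. */
static size_t overlap_bytes(size_t n)
{
	return 32 - n;
}
```

For a 20-byte copy, for example, `overlap_bytes(20)` is 12: bytes 4 to 15 of the destination are written twice, with the same values both times, which is harmless on ordinary memory.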
[parent not found: <CAHCPf3tFGqkYEcWNN4LaWThw_rVqT316pzLv6T7RfxwO-eZ0EA@mail.gmail.com>]
* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Mikulas Patocka @ 2018-08-03 6:35 UTC
To: Matt Sealey
Cc: Catalin Marinas, Russell King, Thomas Petazzoni, Will Deacon,
    libc-alpha, linux-arm-kernel, linux-kernel

On Thu, 2 Aug 2018, Matt Sealey wrote:

> The easiest explanation for this would be that the memory isn't mapped
> correctly. You can't use PCIe memory spaces with anything other than
> Device-nGnRE or stricter mappings. That's just differences between the
> AMBA and PCIe (posted/unposted) memory models.

I've tried to use a Device-nGnRE mapping and I got unaligned access
traps. GCC has a store-merging pass, so it generates unaligned accesses
even in code that has no explicit unaligned accesses. Perhaps it would
be possible to recompile the kernel without the store-merging pass, but
recompiling all the userspace code is impossible.

Should we catch the unaligned access traps in the kernel and emulate them?
There are a lot of instructions that access memory in the ARMv8 ISA, so
the emulator would be quite complicated.

> Normal memory (cacheable or uncacheable, which Linux tends to call
> "memory" and "writecombine" respectively) is not a good idea.
>
> There are two options; make sure Links maps its framebuffer as Device
> memory, or the driver, or both - and make sure that only aligned
> accesses happen (otherwise you'll just get a synchronous exception) and
> there isn't a Normal memory alias.
>
> Alternatively, tell the PCIe driver that the framebuffer is in system
> memory

But how would the graphics card display from it? You'd have to
periodically copy the framebuffer from the system memory to the real
videoram. I'm not an expert in graphics drivers, so I don't know if the
graphics drivers have this possibility.

> - you can map it however you like but there'll be a performance
> hit if you start to use GPU acceleration, but a significant performance
> boost from the PoV of the CPU. Only memory accessed from the PCIe master
> interface (i.e. reads and writes generated by the card itself - telling
> the GPU to pull from system memory or other DMA) can be in Normal memory
> and this allows PCIe to be cache coherent with the right interconnect.
> The slave port on a PCIe root complex (i.e. CPU writes) can't be used
> with Normal, or reorderable, and therefore your 2GB of graphics memory
> is going to be slow from the point of view of the CPU.
>
> To find the correct mapping you'll need to know just how cache coherent
> the PCIe RC is...
>
> Ta,
> Matt

Mikulas
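[Editorial illustration, not part of the original message.] The store-merging concern raised in the reply can be sketched with two small C functions. This is a minimal sketch under assumptions: the function names are hypothetical, and whether a given compiler version actually merges the first set of stores depends on its optimization settings. GCC documents an `-fstore-merging` option, with `-fno-store-merging` to turn it off.

```c
#include <assert.h>
#include <string.h>

/*
 * Four adjacent byte stores. An optimizing compiler is allowed to emit
 * them as a single 32-bit store. Because p need not be 4-byte aligned,
 * the merged store can be an unaligned access, which faults on memory
 * mapped as Device.
 */
static void put_magic(unsigned char *p)
{
	p[0] = 0x11;
	p[1] = 0x22;
	p[2] = 0x33;
	p[3] = 0x44;
}

/*
 * The same stores through a volatile pointer. Every volatile access has
 * to be performed as written, so the compiler must keep four separate
 * byte-sized stores.
 */
static void put_magic_bytewise(volatile unsigned char *p)
{
	p[0] = 0x11;
	p[1] = 0x22;
	p[2] = 0x33;
	p[3] = 0x44;
}

/* Both variants must produce the same bytes at an odd address. */
static void check_put_magic(void)
{
	unsigned char a[8] = { 0 }, b[8] = { 0 };

	put_magic(a + 1);
	put_magic_bytewise(b + 1);
	assert(memcmp(a, b, sizeof(a)) == 0);
}
```

The volatile variant only helps in code that can be edited, which is the point made above: the kernel could be rebuilt with different options, but existing userspace binaries cannot all be recompiled.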
* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Ard Biesheuvel @ 2018-08-03 7:16 UTC
To: Mikulas Patocka
Cc: Matt Sealey, Thomas Petazzoni, libc-alpha, Catalin Marinas,
    Will Deacon, Russell King, Linux Kernel Mailing List, linux-arm-kernel

On 3 August 2018 at 08:35, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Thu, 2 Aug 2018, Matt Sealey wrote:
>
>> The easiest explanation for this would be that the memory isn't mapped
>> correctly. You can't use PCIe memory spaces with anything other than
>> Device-nGnRE or stricter mappings. That's just differences between the
>> AMBA and PCIe (posted/unposted) memory models.

Whoa hold on there.

Are you saying we cannot have PCIe BAR windows with memory semantics on ARM?

Most accelerated graphics drivers rely heavily on the ability to map
the VRAM normal-non-cacheable (ioremap_wc, basically), and treat it as
ordinary memory.

>
> I've tried to use a Device-nGnRE mapping and I got unaligned access
> traps. GCC has a store-merging pass, so it generates unaligned accesses
> even in code that has no explicit unaligned accesses. Perhaps it would
> be possible to recompile the kernel without the store-merging pass, but
> recompiling all the userspace code is impossible.
>
> Should we catch the unaligned access traps in the kernel and emulate them?
> There are a lot of instructions that access memory in the ARMv8 ISA, so
> the emulator would be quite complicated.
>
>> Normal memory (cacheable or uncacheable, which Linux tends to call
>> "memory" and "writecombine" respectively) is not a good idea.
>>
>> There are two options; make sure Links maps its framebuffer as Device
>> memory, or the driver, or both - and make sure that only aligned
>> accesses happen (otherwise you'll just get a synchronous exception) and
>> there isn't a Normal memory alias.
>>
>> Alternatively, tell the PCIe driver that the framebuffer is in system
>> memory
>
> But how would the graphics card display from it? You'd have to
> periodically copy the framebuffer from the system memory to the real
> videoram. I'm not an expert in graphics drivers, so I don't know if the
> graphics drivers have this possibility.
>
>> - you can map it however you like but there'll be a performance
>> hit if you start to use GPU acceleration, but a significant performance
>> boost from the PoV of the CPU. Only memory accessed from the PCIe master
>> interface (i.e. reads and writes generated by the card itself - telling
>> the GPU to pull from system memory or other DMA) can be in Normal memory
>> and this allows PCIe to be cache coherent with the right interconnect.
>> The slave port on a PCIe root complex (i.e. CPU writes) can't be used
>> with Normal, or reorderable, and therefore your 2GB of graphics memory
>> is going to be slow from the point of view of the CPU.
>>
>> To find the correct mapping you'll need to know just how cache coherent
>> the PCIe RC is...
>>
>> Ta,
>> Matt
>
> Mikulas
* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Will Deacon @ 2018-08-03 9:41 UTC
To: Ard Biesheuvel
Cc: Mikulas Patocka, Matt Sealey, Thomas Petazzoni, libc-alpha,
    Catalin Marinas, Russell King, Linux Kernel Mailing List,
    linux-arm-kernel

On Fri, Aug 03, 2018 at 09:16:39AM +0200, Ard Biesheuvel wrote:
> On 3 August 2018 at 08:35, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> > On Thu, 2 Aug 2018, Matt Sealey wrote:
> >
> >> The easiest explanation for this would be that the memory isn't mapped
> >> correctly. You can't use PCIe memory spaces with anything other than
> >> Device-nGnRE or stricter mappings. That's just differences between the
> >> AMBA and PCIe (posted/unposted) memory models.
>
> Whoa hold on there.
>
> Are you saying we cannot have PCIe BAR windows with memory semantics on ARM?
>
> Most accelerated graphics drivers rely heavily on the ability to map
> the VRAM normal-non-cacheable (ioremap_wc, basically), and treat it as
> ordinary memory.

Yeah, I'd expect framebuffers to be mapped as normal NC. That should be
fine for prefetchable BARs, no?

Will
* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Mikulas Patocka @ 2018-08-03 17:09 UTC
To: Will Deacon, Jingoo Han, Joao Pinto
Cc: Ard Biesheuvel, Matt Sealey, Thomas Petazzoni, libc-alpha,
    Catalin Marinas, Russell King, Linux Kernel Mailing List,
    linux-arm-kernel, linux-pci

On Fri, 3 Aug 2018, Will Deacon wrote:

> On Fri, Aug 03, 2018 at 09:16:39AM +0200, Ard Biesheuvel wrote:
> > On 3 August 2018 at 08:35, Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > >
> > > On Thu, 2 Aug 2018, Matt Sealey wrote:
> > >
> > >> The easiest explanation for this would be that the memory isn't mapped
> > >> correctly. You can't use PCIe memory spaces with anything other than
> > >> Device-nGnRE or stricter mappings. That's just differences between the
> > >> AMBA and PCIe (posted/unposted) memory models.
> >
> > Whoa hold on there.
> >
> > Are you saying we cannot have PCIe BAR windows with memory semantics on ARM?
> >
> > Most accelerated graphics drivers rely heavily on the ability to map
> > the VRAM normal-non-cacheable (ioremap_wc, basically), and treat it as
> > ordinary memory.
>
> Yeah, I'd expect framebuffers to be mapped as normal NC. That should be
> fine for prefetchable BARs, no?
>
> Will

So - why does it corrupt data then? I've created this program that
reproduces the data corruption quickly. If I run it on /dev/fb0, I get an
instant failure. Sometimes a few bytes are not written, sometimes a few
bytes are written with a value that should be 16 bytes apart.

I tried to run it on system RAM mapped with the NC attribute and I didn't
get any corruption - that suggests the bug may be in the PCIe
subsystem.

Jingoo Han and Joao Pinto are maintainers for the DesignWare PCIe
controllers. Could you suggest why the controller corrupts data when
writing to videoram? Are there any tricks that could be tried to work
around the corruption?

Mikulas


/*
 * Usage: prog <file> [hex offset] [random seed]
 * Repeatedly memcpy random ranges into the mapping and compare the
 * whole mapping against a shadow copy held in normal memory.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

#define LEN		256
#define PRINT_STRIDE	0x20

static unsigned char data[LEN];		/* expected contents */
static unsigned char val = 0;
static unsigned char prev_data[LEN];	/* expected contents before the last copy */
static unsigned char map_copy[LEN];	/* snapshot of the mapping on mismatch */

int main(int argc, char *argv[])
{
	unsigned long n = 0;
	int h;
	unsigned char *map;
	unsigned start, end, i;

	if (argc < 2)
		fprintf(stderr, "argc\n"), exit(1);
	if (argc >= 4)
		srandom(atoll(argv[3]));
	h = open(argv[1], O_RDWR | O_DSYNC);
	if (h == -1)
		perror("open"), exit(1);
	map = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, h, argc >= 3 ? strtoull(argv[2], NULL, 16) : 0);
	if (map == MAP_FAILED)
		perror("mmap"), exit(1);

	memset(data, 0, LEN);
	memset(prev_data, 0, LEN);
	memset(map, 0, LEN);
	sleep(1);

	while (1) {
		start = (unsigned)random() % (LEN + 1);
		end = (unsigned)random() % (LEN + 1);
		if (start > end)
			continue;
		for (i = start; i < end; i++)
			data[i] = val++;
		memcpy(map + start, data + start, end - start);
		if (memcmp(map, data, LEN)) {
			unsigned j;

			memcpy(map_copy, map, LEN);
			fprintf(stderr, "mismatch after %lu loops!\n", n);
			fprintf(stderr, "last copied range: 0x%x - 0x%x (0x%x)\n", start, end, (unsigned)(end - start));
			/* p = previous, d = expected, m = mapping; mismatches in red */
			for (j = 0; j < LEN; j += PRINT_STRIDE) {
				fprintf(stderr, "p[%03x]", j);
				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
					fprintf(stderr, " %s%s%02x\033[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\033[31m" : "", prev_data[i]);
				fprintf(stderr, "\n");
				fprintf(stderr, "d[%03x]", j);
				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
					fprintf(stderr, " %s%s%02x\033[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\033[31m" : "", data[i]);
				fprintf(stderr, "\n");
				fprintf(stderr, "m[%03x]", j);
				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
					fprintf(stderr, " %s%s%02x\033[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\033[31m" : "", map_copy[i]);
				fprintf(stderr, "\n\n");
			}
			exit(1);
		}
		memcpy(prev_data, data, LEN);
		n++;
	}
}
* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Sinan Kaya @ 2018-08-03 17:32 UTC
To: Mikulas Patocka, Will Deacon, Jingoo Han, Joao Pinto
Cc: Ard Biesheuvel, Matt Sealey, Thomas Petazzoni, libc-alpha,
    Catalin Marinas, Russell King, Linux Kernel Mailing List,
    linux-arm-kernel, linux-pci

On 8/3/2018 1:09 PM, Mikulas Patocka wrote:
>>> Most accelerated graphics drivers rely heavily on the ability to map
>>> the VRAM normal-non-cacheable (ioremap_wc, basically), and treat it as
>>> ordinary memory.
>> Yeah, I'd expect framebuffers to be mapped as normal NC. That should be
>> fine for prefetchable BARs, no?
>>
>> Will
> So - why does it corrupt data then? I've created this program that
> reproduces the data corruption quickly. If I run it on /dev/fb0, I get an
> instant failure. Sometimes a few bytes are not written, sometimes a few
> bytes are written with a value that should be 16 bytes apart.
>
> I tried to run it on system RAM mapped with the NC attribute and I didn't
> get any corruption - that suggests the bug may be in the PCIe
> subsystem.

Note that normal-NC gives you write combining whereas device nGnRE
doesn't have any write-combining support.

normal-NC is typically mapped to prefetchable BAR space where
write-combining is welcome.

It could be an issue on the SoC itself too. I suggest you contact your
board vendor.
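[Editorial illustration, not part of the original message.] The failure pattern quoted above and the write-combining remark suggest one experiment: rerun the reproducer with a copy routine that writes every destination byte exactly once and uses only naturally aligned stores. The sketch below is an assumption about how such a routine could look; `fb_copy_no_overlap` and `is_aligned8` are hypothetical names, and nothing in the thread so far confirms that this avoids the corruption on the affected hardware. Note that `volatile` only constrains the compiler; on a Normal non-cacheable mapping the hardware may still gather writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if p is aligned to 8 bytes. */
static int is_aligned8(const volatile void *p)
{
	return ((uintptr_t)p & 7) == 0;
}

/*
 * Copy n bytes so that each destination byte is written once:
 * single bytes up to the first 8-byte boundary, then aligned 8-byte
 * stores, then single bytes for the remainder.
 */
static void fb_copy_no_overlap(volatile unsigned char *dst,
			       const unsigned char *src, size_t n)
{
	/* Head: byte stores until dst reaches an 8-byte boundary. */
	while (n && !is_aligned8(dst)) {
		*dst++ = *src++;
		n--;
	}
	/* Body: aligned 8-byte stores; src may be unaligned, so load via memcpy. */
	while (n >= 8) {
		uint64_t v;

		memcpy(&v, src, sizeof(v));
		assert(is_aligned8(dst));
		*(volatile uint64_t *)dst = v;
		dst += 8;
		src += 8;
		n -= 8;
	}
	/* Tail: remaining bytes, one at a time. */
	while (n) {
		*dst++ = *src++;
		n--;
	}
}
```

Swapping this in for the `memcpy(map + start, ...)` call in the reproducer would show whether the mismatches depend on overlapping or unaligned stores, at the cost of a slower copy.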
* Re: framebuffer corruption due to overlapping stp instructions on arm64 @ 2018-08-03 17:32 ` Sinan Kaya 0 siblings, 0 replies; 238+ messages in thread From: Sinan Kaya @ 2018-08-03 17:32 UTC (permalink / raw) To: Mikulas Patocka, Will Deacon, Jingoo Han, Joao Pinto Cc: Thomas Petazzoni, libc-alpha, Ard Biesheuvel, Catalin Marinas, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, linux-arm-kernel On 8/3/2018 1:09 PM, Mikulas Patocka wrote: >>> Most accelerated graphics drivers rely heavily on the ability to map >>> the VRAM normal-non-cacheable (ioremap_wc, basically), and treat it as >>> ordinary memory. >> Yeah, I'd expect framebuffers to be mapped as normal NC. That should be >> fine for prefetchable BARs, no? >> >> Will > So - why does it corrupt data then? I've created this program that > reproduces the data corruption quicky. If I run it on /dev/fb0, I get an > instant failure. Sometimes a few bytes are not written, sometimes a few > bytes are written with a value that should be 16 bytes apart. > > I tried to run it on system RAM mapped with the NC attribute and I didn't > get any corruption - that suggests the the bug may be in the PCIE > subsystem. Note that normal-NC gives you write combining whereas device nGnRE doesn't have any write-combining support. normal-NC is typically mapped to prefetchable BAR space where write-combining is welcome. It could be an issue on the SOC itself too. I suggest you contact your board vendor. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-03 17:09 ` Mikulas Patocka
@ 2018-08-03 17:33 ` Ard Biesheuvel
From: Ard Biesheuvel @ 2018-08-03 17:33 UTC
To: Mikulas Patocka
Cc: Will Deacon, Jingoo Han, Joao Pinto, Matt Sealey, Thomas Petazzoni, Catalin Marinas, Russell King, Linux Kernel Mailing List, linux-arm-kernel, linux-pci

(- libc-alpha)

On 3 August 2018 at 19:09, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> On Fri, 3 Aug 2018, Will Deacon wrote:
>
>> On Fri, Aug 03, 2018 at 09:16:39AM +0200, Ard Biesheuvel wrote:
>> > On 3 August 2018 at 08:35, Mikulas Patocka <mpatocka@redhat.com> wrote:
>> > >
>> > > On Thu, 2 Aug 2018, Matt Sealey wrote:
>> > >
>> > >> The easiest explanation for this would be that the memory isn't mapped
>> > >> correctly. You can't use PCIe memory spaces with anything other than
>> > >> Device-nGnRE or stricter mappings. That's just differences between the
>> > >> AMBA and PCIe (posted/unposted) memory models.
>> >
>> > Whoa hold on there.
>> >
>> > Are you saying we cannot have PCIe BAR windows with memory semantics on ARM?
>> >
>> > Most accelerated graphics drivers rely heavily on the ability to map
>> > the VRAM normal-non-cacheable (ioremap_wc, basically), and treat it as
>> > ordinary memory.
>>
>> Yeah, I'd expect framebuffers to be mapped as normal NC. That should be
>> fine for prefetchable BARs, no?
>>
>> Will
>
> So - why does it corrupt data then? I've created this program that
> reproduces the data corruption quickly. If I run it on /dev/fb0, I get an
> instant failure. Sometimes a few bytes are not written, sometimes a few
> bytes are written with a value that should be 16 bytes apart.
>

Are we still talking about overlapping unaligned accesses here? Or do
you see other failures as well?

> I tried to run it on system RAM mapped with the NC attribute and I didn't
> get any corruption - that suggests the bug may be in the PCIE
> subsystem.
>
> Jingoo Han and Joao Pinto are maintainers for the designware PCIE
> controllers. Could you suggest why the controller corrupts data when
> writing to videoram? Are there any tricks that could be tried to work
> around the corruption?
>
> Mikulas
>
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/mman.h>
>
> #define LEN 256
> #define PRINT_STRIDE 0x20
>
> static unsigned char data[LEN];
> static unsigned char val = 0;
>
> static unsigned char prev_data[LEN];
>
> static unsigned char map_copy[LEN];
>
> int main(int argc, char *argv[])
> {
> 	unsigned long n = 0;
> 	int h;
> 	unsigned char *map;
> 	unsigned start, end, i;
>
> 	if (argc < 2) fprintf(stderr, "argc\n"), exit(1);
> 	if (argc >= 4) srandom(atoll(argv[3]));
> 	h = open(argv[1], O_RDWR | O_DSYNC);
> 	if (h == -1) perror("open"), exit(1);
> 	map = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, h, argc >= 3 ? strtoull(argv[2], NULL, 16) : 0);
> 	if (map == MAP_FAILED) perror("mmap"), exit(1);
>
> 	memset(data, 0, LEN);
> 	memset(prev_data, 0, LEN);
> 	memset(map, 0, LEN);
>
> 	sleep(1);
>
> 	while (1) {
> 		start = (unsigned)random() % (LEN + 1);
> 		end = (unsigned)random() % (LEN + 1);
> 		if (start > end)
> 			continue;
> 		for (i = start; i < end; i++)
> 			data[i] = val++;
> 		memcpy(map + start, data + start, end - start);
> 		if (memcmp(map, data, LEN)) {
> 			unsigned j;
> 			memcpy(map_copy, map, LEN);
> 			fprintf(stderr, "mismatch after %lu loops!\n", n);
> 			fprintf(stderr, "last copied range: 0x%x - 0x%x (0x%x)\n", start, end, (unsigned)(end - start));
> 			for (j = 0; j < LEN; j += PRINT_STRIDE) {
> 				fprintf(stderr, "p[%03x]", j);
> 				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
> 					fprintf(stderr, " %s%s%02x\e[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\e[31m" : "", prev_data[i]);
> 				fprintf(stderr, "\n");
> 				fprintf(stderr, "d[%03x]", j);
> 				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
> 					fprintf(stderr, " %s%s%02x\e[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\e[31m" : "", data[i]);
> 				fprintf(stderr, "\n");
> 				fprintf(stderr, "m[%03x]", j);
> 				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
> 					fprintf(stderr, " %s%s%02x\e[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\e[31m" : "", map_copy[i]);
> 				fprintf(stderr, "\n\n");
> 			}
> 			exit(1);
> 		}
> 		memcpy(prev_data, data, LEN);
> 		n++;
> 	}
> }
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-03 17:33 ` Ard Biesheuvel
@ 2018-08-03 18:25 ` Mikulas Patocka
From: Mikulas Patocka @ 2018-08-03 18:25 UTC
To: Ard Biesheuvel
Cc: Will Deacon, Jingoo Han, Joao Pinto, Matt Sealey, Thomas Petazzoni, Catalin Marinas, Russell King, Linux Kernel Mailing List, linux-arm-kernel, linux-pci

On Fri, 3 Aug 2018, Ard Biesheuvel wrote:

> (- libc-alpha)
>
> On 3 August 2018 at 19:09, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > On Fri, 3 Aug 2018, Will Deacon wrote:
> >
> >> On Fri, Aug 03, 2018 at 09:16:39AM +0200, Ard Biesheuvel wrote:
> >> > On 3 August 2018 at 08:35, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >> > >
> >> > > On Thu, 2 Aug 2018, Matt Sealey wrote:
> >> > >
> >> > >> The easiest explanation for this would be that the memory isn't mapped
> >> > >> correctly. You can't use PCIe memory spaces with anything other than
> >> > >> Device-nGnRE or stricter mappings. That's just differences between the
> >> > >> AMBA and PCIe (posted/unposted) memory models.
> >> >
> >> > Whoa hold on there.
> >> >
> >> > Are you saying we cannot have PCIe BAR windows with memory semantics on ARM?
> >> >
> >> > Most accelerated graphics drivers rely heavily on the ability to map
> >> > the VRAM normal-non-cacheable (ioremap_wc, basically), and treat it as
> >> > ordinary memory.
> >>
> >> Yeah, I'd expect framebuffers to be mapped as normal NC. That should be
> >> fine for prefetchable BARs, no?
> >>
> >> Will
> >
> > So - why does it corrupt data then? I've created this program that
> > reproduces the data corruption quickly. If I run it on /dev/fb0, I get an
> > instant failure. Sometimes a few bytes are not written, sometimes a few
> > bytes are written with a value that should be 16 bytes apart.
> >
>
> Are we still talking about overlapping unaligned accesses here? Or do
> you see other failures as well?

Yes - it is caused by overlapping unaligned accesses inside memcpy. When I
put "dmb sy" between the overlapping accesses in
glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any memory
corruption.

> > I tried to run it on system RAM mapped with the NC attribute and I didn't
> > get any corruption - that suggests the bug may be in the PCIE
> > subsystem.
> >
> > Jingoo Han and Joao Pinto are maintainers for the designware PCIE
> > controllers. Could you suggest why the controller corrupts data when
> > writing to videoram? Are there any tricks that could be tried to work
> > around the corruption?
> >
> > Mikulas
> >
> >
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <fcntl.h>
> > #include <unistd.h>
> > #include <sys/mman.h>
> >
> > #define LEN 256
> > #define PRINT_STRIDE 0x20
> >
> > static unsigned char data[LEN];
> > static unsigned char val = 0;
> >
> > static unsigned char prev_data[LEN];
> >
> > static unsigned char map_copy[LEN];
> >
> > int main(int argc, char *argv[])
> > {
> > 	unsigned long n = 0;
> > 	int h;
> > 	unsigned char *map;
> > 	unsigned start, end, i;
> >
> > 	if (argc < 2) fprintf(stderr, "argc\n"), exit(1);
> > 	if (argc >= 4) srandom(atoll(argv[3]));
> > 	h = open(argv[1], O_RDWR | O_DSYNC);
> > 	if (h == -1) perror("open"), exit(1);
> > 	map = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, h, argc >= 3 ? strtoull(argv[2], NULL, 16) : 0);
> > 	if (map == MAP_FAILED) perror("mmap"), exit(1);
> >
> > 	memset(data, 0, LEN);
> > 	memset(prev_data, 0, LEN);
> > 	memset(map, 0, LEN);
> >
> > 	sleep(1);
> >
> > 	while (1) {
> > 		start = (unsigned)random() % (LEN + 1);
> > 		end = (unsigned)random() % (LEN + 1);
> > 		if (start > end)
> > 			continue;
> > 		for (i = start; i < end; i++)
> > 			data[i] = val++;
> > 		memcpy(map + start, data + start, end - start);
> > 		if (memcmp(map, data, LEN)) {
> > 			unsigned j;
> > 			memcpy(map_copy, map, LEN);
> > 			fprintf(stderr, "mismatch after %lu loops!\n", n);
> > 			fprintf(stderr, "last copied range: 0x%x - 0x%x (0x%x)\n", start, end, (unsigned)(end - start));
> > 			for (j = 0; j < LEN; j += PRINT_STRIDE) {
> > 				fprintf(stderr, "p[%03x]", j);
> > 				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
> > 					fprintf(stderr, " %s%s%02x\e[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\e[31m" : "", prev_data[i]);
> > 				fprintf(stderr, "\n");
> > 				fprintf(stderr, "d[%03x]", j);
> > 				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
> > 					fprintf(stderr, " %s%s%02x\e[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\e[31m" : "", data[i]);
> > 				fprintf(stderr, "\n");
> > 				fprintf(stderr, "m[%03x]", j);
> > 				for (i = j; i < j + PRINT_STRIDE && i < LEN; i++)
> > 					fprintf(stderr, " %s%s%02x\e[0m", !(i % 4) ? " " : "", data[i] != map_copy[i] ? "\e[31m" : "", map_copy[i]);
> > 				fprintf(stderr, "\n\n");
> > 			}
> > 			exit(1);
> > 		}
> > 		memcpy(prev_data, data, LEN);
> > 		n++;
> > 	}
> > }
>
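For readers who have not looked at the memcpy in question, the access
pattern described above can be sketched in C. This is only an illustration,
not the glibc code: copy16_32_overlapping() and store_barrier() are
hypothetical names, the real routine is hand-written assembly using ldp/stp,
and on anything other than arm64 the barrier here degrades to a compiler
barrier so that the sketch still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "dmb sy" on arm64; only a compiler barrier elsewhere (an assumption made
 * so the sketch compiles on other hosts). */
static inline void store_barrier(void)
{
#if defined(__aarch64__)
	__asm__ volatile("dmb sy" ::: "memory");
#else
	__asm__ volatile("" ::: "memory");
#endif
}

/* Copy n bytes, 16 <= n <= 32, without branching on the exact length:
 * one 16-byte block from the start and one 16-byte block that ends at the
 * end. For n < 32 the two stores overlap by 32 - n bytes, so the same
 * destination bytes are written twice. */
static void copy16_32_overlapping(unsigned char *dst,
				  const unsigned char *src, size_t n)
{
	unsigned char head[16], tail[16];

	assert(n >= 16 && n <= 32);

	/* Load both blocks before storing anything. */
	memcpy(head, src, 16);
	memcpy(tail, src + n - 16, 16);

	memcpy(dst, head, 16);			/* writes dst[0..15] */
	store_barrier();			/* the workaround under discussion */
	memcpy(dst + n - 16, tail, 16);		/* writes dst[n-16..n-1] */
}
```

Both stores carry the same data for the overlapping bytes, so on ordinary
memory the result is the same whichever store lands last. The barrier in
the middle corresponds to the "dmb sy" that made the corruption disappear
in the test above.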
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-03 18:25 ` Mikulas Patocka
@ 2018-08-03 20:44 ` Matt Sealey
From: Matt Sealey @ 2018-08-03 20:44 UTC
To: Mikulas Patocka
Cc: Ard Biesheuvel, Will Deacon, Jingoo Han, Joao Pinto, Thomas Petazzoni, Catalin Marinas, Russell King, Linux Kernel Mailing List, linux-arm-kernel, linux-pci

On 3 August 2018 at 13:25, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> On Fri, 3 Aug 2018, Ard Biesheuvel wrote:
>
>> Are we still talking about overlapping unaligned accesses here? Or do
>> you see other failures as well?
>
> Yes - it is caused by overlapping unaligned accesses inside memcpy. When I
> put "dmb sy" between the overlapping accesses in
> glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any memory
> corruption.

It is a symptom of generating reorderable accesses inside memcpy. It's
nothing to do with alignment, per se (see below). A dmb sy just hides the
symptoms.

What we're talking about here - yes, Ard, within certain amounts of
reason - is that you cannot use PCI BAR memory as 'Normal' - certainly
never cacheable memory, but Normal NC isn't good either. That is, your
CPU cannot post writes or reads towards PCI memory spaces unless it is
dealing with them as Device memory or through very strictly controlled
use of Normal Non-Cacheable.

I understand why the rest of the world likes to mark stuff as
'writecombine', but that's an x86-ism, not an Arm memory type.

There is potential for accesses to the same slave from different masters
(or just different AXI IDs; most cores rotate over 8 or 16 or so for
Normal memory to achieve) to be reordered. PCIe has no idea what the
source was; it will just accept them in the order it receives them, and
it will also be strictly defined to manage incoming AXI or ACE
transactions (and barriers) in a way that does not violate the PCIe
memory model. The worst case is deadlocks; the best case is that you see
some very strange behavior.

In any case, the original ordering of two Normal-NC transactions may not
make it to the PCIe bridge in the first place, which is probably why a
DMB resolves it: it forces the core to issue them in order, and unless
there is some hyper-complex multi-pathing going on, they'll likely stay
ordered. If you MUST preserve the order between two Normal memory
accesses, a barrier is required. The same is true of any re-orderable
device access.

>> > I tried to run it on system RAM mapped with the NC attribute and I didn't
>> > get any corruption - that suggests the bug may be in the PCIE
>> > subsystem.

Pure fluke.

I'll give a simple explanation. The Arm Architecture defines single-copy
and multi-copy atomic transactions. You can treat 'single-copy' to mean
that the transaction cannot be made partial, or reordered within itself,
i.e. it must modify memory (if it is a store) in a single swift effort,
and any future reads from that memory must return the FULL result of
that write.

Multi-copy means it can be resized and reordered a bit. Will Deacon is
going to crucify me for simplifying it, but.. let's proceed with a poor
example:

STR X0, [X1] on a 32-bit bus cannot ever be single-copy atomic, because
you cannot write 64 bits of data on a 32-bit bus in a single, unbreakable
transaction. This is because from one bus cycle to the next, one half of
the transaction will be in a different place. Your interconnect will have
latched and buffered 32 bits and the CPU is holding the other 32.

STP X0, X1, [X2] on a 64-bit bus can be single-copy atomic with respect
to the element size. But it is on the whole multi-copy atomic - that is
to say that it can provide a single transaction with multiple elements
which are transmitted, and those elements could be messed with on the way
down the pipe.

On a 128-bit bus, you might expect it to be single-copy atomic because the entire transaction can fit into one single data beat, but *it is most definitely not* according to the architecture. The data from X0 and X1 may be required to be stored at *X2 and *(X2+8), but the architecture doesn't care which one is written first. Neither does AMBA.

STP is only ever guaranteed to be single-copy atomic with regard to the element size (which is the X register in question). If you swap the data around, and do STP X1, X0, [X2], you may see a different result dependent on how the processor decides to pull data from the register file and in what order. Users of the old 32-bit ARM STM instruction will recall that it writes the register list in incrementing order, lowest register number to lowest address, so what is the solution for STP? Do you expect X0 to be emitted on the bus first, or the data to be stored in *X2 first?

It's neither!

That means you can do an STP on one processor and an LDR of one of the 64-bit words on another processor, and you may be able to see

a) None of the STP transaction
b) *X2 holds the value from X0, but *(X2+8) does not yet hold the value from X1
c) b, only reversed
d) What you expect

And this can change dependent on the resizers and bridges and QoS and paths between a master interface and a slave interface. A truly single-copy atomic transaction going through a downsizer to smaller than the transaction size is a broken system design. It may be allowable if the downsizer hazards addresses to the granularity of the larger bus size on the read and write channels and will stall the read until the write has committed at least to a buffer, or downstream of the downsizer, so that it will return on read the full breadth of the memory update.... that's down to the system designer.
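
The four outcomes a) to d) can be enumerated with a small model. This is only a sketch in portable C, not Arm code: it assumes, as described above, that each 64-bit element of the store pair lands atomically but that the two elements are not ordered against each other. The names `struct pair` and `stp_observable_states` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two 64-bit words at [X2] and [X2+8]. */
struct pair {
    uint64_t lo; /* word at [X2]   */
    uint64_t hi; /* word at [X2+8] */
};

/*
 * List every state an observer might read while STP X0, X1, [X2] is in
 * flight, assuming only per-element atomicity.
 * Bit 0 of 'landed' means the element for [X2] has been written,
 * bit 1 means the element for [X2+8] has been written.
 * Returns the number of states stored in out[] (always 4 here).
 */
size_t stp_observable_states(struct pair old, uint64_t x0, uint64_t x1,
                             struct pair out[4])
{
    size_t n = 0;

    assert(out != NULL);
    for (unsigned landed = 0; landed < 4; landed++) {
        out[n].lo = (landed & 1u) ? x0 : old.lo;
        out[n].hi = (landed & 2u) ? x1 : old.hi;
        n++;
    }
    return n;
}
```

The entries come out in the same order as the list: index 0 is a), index 1 is b), index 2 is c) and index 3 is d). The model says nothing about which of them a given interconnect will actually expose.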

There are plenty of places things like this can happen - in cache controllers, for example, and merging store buffers (you may have a 256-bit or 512-bit buffer, but only a 128-bit memory interface).

Neither memcpy() as a function nor the loads and stores it makes are single-copy atomic; no transactions to Normal memory need to be, so that merged stores and linefills (if cacheable) can be done. Hence, your memcpy() is just randomly chucking whatever data it likes to the bus and the writes will arrive in any old order. 'writecombine' semantics make you think you'll only ever see one very large write with all the CPU activity merged together - also NOT true.

And the granularity of the hazarding in your system, from the CPU store buffer to the bus interface to the interconnect buffering to the PCIe bridge to the PCIe EP is.. what? Not the same all the way down, I'll bet you.

It is assuming that Intel writecombine semantics would apply, which to be truthful are NO different to the ones of a merging store buffer in an Arm processor (Intel architecture states that the writecombine buffer can be flushed at any time with any amount of actual data; it might not be the biggest burst you can imagine). In practice, though, it tends to be flushed in cache-line sized chunks with strict incrementing order, and subsequent writes, due to the extremely large pipeline and queueing, will almost certainly be absorbed by the writecombine buffer.

Links is broken. Even on Intel. If you overlap memory transactions and expect them to be gathered and reordered to produce nice, ordered non-overlapping streaming transactions, you'll be sorely disappointed when they aren't, which is what is happening here. The fix is to use barriers - and don't rely on single-copy atomicity (which is the only saving feature that would not require you to use a barrier) since this is a situation where absolutely none is afforded.
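
As a rough illustration of "use barriers", the sketch below copies 16 to 32 bytes as two 16-byte chunks that may overlap, with a barrier between the two stores. It is an assumption-laden example, not the glibc routine: `store_barrier` and `copy16_32_ordered` are hypothetical names, the compiler chooses the actual load and store instructions (there is no guarantee it emits STP), and on non-arm64 builds a C11 fence stands in for the `dmb sy` mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: order earlier stores before later ones. */
void store_barrier(void)
{
#if defined(__aarch64__) && defined(__GNUC__)
    /* The barrier used in the thread's experiment. */
    __asm__ volatile("dmb sy" ::: "memory");
#else
    /* Stand-in so the sketch builds on other targets. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/*
 * Copy n bytes, 16 <= n <= 32, as a head chunk and a tail chunk.
 * For n < 32 the chunks overlap by 32 - n bytes, which is the
 * branch-avoiding pattern under discussion.
 */
void copy16_32_ordered(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char head[16], tail[16];

    assert(n >= 16 && n <= 32);
    memcpy(head, src, 16);          /* load both chunks first */
    memcpy(tail, src + n - 16, 16);

    memcpy(dst, head, 16);          /* first store */
    store_barrier();                /* keep the two stores ordered */
    memcpy(dst + n - 16, tail, 16); /* second, possibly overlapping store */
}
```

Calling `copy16_32_ordered(fb, pixels, 20)` writes bytes 0..15 and then bytes 4..19 of the destination. Whether a barrier at this level is the right place to fix the problem, rather than the memory type of the mapping, is exactly what the thread is arguing about.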

It'd be easier to cross your fingers that the PCIe RC has a coherent master port (ACE-Lite or something fancier) and can snoop into CPU caches. Then you can mark a memory location in DRAM as Normal Inner/Outer Cacheable Writeback, Inner/Outer Shareable, Write-allocate, Read-allocate, and you won't even notice your CPU doing any memory writes. But yes, if you tell a graphics adapter that its main framebuffer is in DRAM it might be a bit slower (limited to the speed of the PCIe link, which may affect your maximum resolution in some really strange circumstances). If it cannot use a DRAM framebuffer then I'd have to wonder why not: every PCI graphics card I ever used could take any base address and the magic of PCI bus mastering would handle it. This is no different to how you'd use DRAM as texture memory: phenomenally slowly, but without having to worry about any ordering semantics (except you should flush your data cache to PoC at the end of every frame).

Ta,
Matt

* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Ard Biesheuvel @ 2018-08-03 21:20 UTC
To: Matt Sealey
Cc: Mikulas Patocka, Will Deacon, Jingoo Han, Joao Pinto, Thomas Petazzoni, Catalin Marinas, Russell King, Linux Kernel Mailing List, linux-arm-kernel, linux-pci

On 3 August 2018 at 22:44, Matt Sealey <neko@bakuhatsu.net> wrote:
> On 3 August 2018 at 13:25, Mikulas Patocka <mpatocka@redhat.com> wrote:
>> On Fri, 3 Aug 2018, Ard Biesheuvel wrote:
>>
>>> Are we still talking about overlapping unaligned accesses here? Or do
>>> you see other failures as well?
>>
>> Yes - it is caused by overlapping unaligned accesses inside memcpy. When I
>> put "dmb sy" between the overlapping accesses in
>> glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any memory
>> corruption.
>
> It is a symptom of generating reorderable accesses inside memcpy. It's nothing
> to do with alignment, per se (see below). A dmb sy just hides the symptoms.
>
> What we're talking about here - yes, Ard, within certain amounts of
> reason - is that you cannot use PCI BAR memory as 'Normal' - certainly
> never cacheable memory, but Normal NC isn't good either. That is that
> your CPU cannot post writes or reads towards PCI memory spaces unless
> it is dealing with it as Device memory or very strictly controlled use
> of Normal Non-Cacheable.
>
> I understand why the rest of the world likes to mark stuff as
> 'writecombine,' but that's x86-ism, not an Arm memory type.
>
> There is potential for accesses to the same slave from different
> masters (or just different AXI IDs, most cores rotate over 8 or 16 or
> so for Normal memory to achieve) to be reordered. PCIe has no idea what
> the source was, it will just accept them in the order it receives them,
> and also it will be strictly defined to manage incoming AXI or ACE
> transactions (and barriers..) in a way that does not violate the PCIe
> memory model - the worst case is deadlocks, the best case is you see
> some very strange behavior.
>
> In any case the original ordering of two Normal-NC transactions may
> not make it to the PCIe bridge in the first place which is probably why
> a DMB resolves it - it will force the core to issue them in order and
> it's likely unless there is some hyper-complex multi-pathing going on,
> they'll stay ordered. If you MUST preserve the order between two Normal
> memory accesses, a barrier is required. The same is true also of any
> re-orderable device access.
>

None of this explains why some transactions fail to make it across entirely. The overlapping writes in question write the same data to the memory locations that are covered by both, and so the ordering in which the transactions are received should not affect the outcome.
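
The point about overlapping writes carrying the same data can be checked with a small model. This is a sketch only, with hypothetical function names, and it assumes each 16-byte store arrives whole; it does not model partial, dropped or merged writes, which is where the disagreement in the thread lies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Apply the head and tail 16-byte stores of an n-byte copy
 * (16 <= n <= 32) in either order; tail_first selects the order.
 */
void copy16_32_in_order(unsigned char *dst, const unsigned char *src,
                        size_t n, int tail_first)
{
    assert(n >= 16 && n <= 32);
    if (tail_first) {
        memcpy(dst + n - 16, src + n - 16, 16);
        memcpy(dst, src, 16);
    } else {
        memcpy(dst, src, 16);
        memcpy(dst + n - 16, src + n - 16, 16);
    }
}

/* Returns 1 if both store orders give the same, correct destination. */
int order_is_irrelevant(const unsigned char *src, size_t n)
{
    unsigned char a[32] = {0}, b[32] = {0};

    copy16_32_in_order(a, src, n, 0);
    copy16_32_in_order(b, src, n, 1);
    return memcmp(a, b, sizeof a) == 0 && memcmp(a, src, n) == 0;
}
```

Under that whole-store assumption the result is the same for every length from 16 to 32, so reordering alone would not produce wrong pixels.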
* framebuffer corruption due to overlapping stp instructions on arm64
@ 2018-08-03 21:20 ` Ard Biesheuvel
  0 siblings, 0 replies; 238+ messages in thread
From: Ard Biesheuvel @ 2018-08-03 21:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 3 August 2018 at 22:44, Matt Sealey <neko@bakuhatsu.net> wrote:
> On 3 August 2018 at 13:25, Mikulas Patocka <mpatocka@redhat.com> wrote:
>>
>>
>> On Fri, 3 Aug 2018, Ard Biesheuvel wrote:
>>
>>> Are we still talking about overlapping unaligned accesses here? Or do
>>> you see other failures as well?
>>
>> Yes - it is caused by overlapping unaligned accesses inside memcpy. When I
>> put "dmb sy" between the overlapping accesses in
>> glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any memory
>> corruption.
>
> It is a symptom of generating reorderable accesses inside memcpy. It's
> nothing to do with alignment, per se (see below). A dmb sy just hides the
> symptoms.
>
> What we're talking about here - yes, Ard, within certain amounts of
> reason - is that you cannot use PCI BAR memory as 'Normal' - certainly
> never cacheable memory, but Normal NC isn't good either. That is, your CPU
> cannot post writes or reads towards PCI memory spaces unless it is dealing
> with it as Device memory or very strictly controlled use of Normal
> Non-Cacheable.
>
> I understand why the rest of the world likes to mark stuff as
> 'writecombine,' but that's an x86-ism, not an Arm memory type.
>
> There is potential for accesses to the same slave from different masters
> (or just different AXI IDs, most cores rotate over 8 or 16 or so for
> Normal memory to achieve) to be reordered. PCIe has no idea what the
> source was, it will just accept them in the order it receives them, and
> also it will be strictly defined to manage incoming AXI or ACE
> transactions (and barriers..) in a way that does not violate the PCIe
> memory model - the worst case is deadlocks, the best case is you see some
> very strange behavior.
>
> In any case the original ordering of two Normal-NC transactions may not
> make it to the PCIe bridge in the first place, which is probably why a DMB
> resolves it - it will force the core to issue them in order and it's
> likely, unless there is some hyper-complex multi-pathing going on, they'll
> stay ordered. If you MUST preserve the order between two Normal memory
> accesses, a barrier is required. The same is true also of any
> re-orderable device access.
>

None of this explains why some transactions fail to make it across
entirely. The overlapping writes in question write the same data to
the memory locations that are covered by both, and so the ordering in
which the transactions are received should not affect the outcome.

>>> > I tried to run it on system RAM mapped with the NC attribute and I didn't
>>> > get any corruption - that suggests the bug may be in the PCIE
>>> > subsystem.
>
> Pure fluke.
>
> I'll give a simple explanation. The Arm Architecture defines single-copy
> and multi-copy atomic transactions. You can treat 'single-copy' to mean
> that that transaction cannot be made partial, or reordered within itself,
> i.e. it must modify memory (if it is a store) in a single swift effort and
> any future reads from that memory must return the FULL result of that
> write.
>
> Multi-copy means it can be resized and reordered a bit. Will Deacon is
> going to crucify me for simplifying it, but.. let's proceed with a poor
> example:
>
> STR X0,[X1] on a 32-bit bus cannot ever be single-copy atomic, because
> you cannot write 64 bits of data on a 32-bit bus in a single, unbreakable
> transaction. This is because from one bus cycle to the next, one half of
> the transaction will be in a different place. Your interconnect will have
> latched and buffered 32 bits and the CPU is holding the other.
>
> STP X0, X1, [X2] on a 64-bit bus can be single-copy atomic with respect to
> the element size. But it is on the whole multi-copy atomic - that is to
> say that it can provide a single transaction with multiple elements which
> are transmitted, and those elements could be messed with on the way down
> the pipe.
>
> On a 128-bit bus, you might expect it to be single-copy atomic because the
> entire transaction can be fit into one single data beat, but *it is most
> definitely not* according to the architecture. The data from X0 and X1 may
> be required to be stored at *X2 and *(X2+8), but the architecture doesn't
> care which one is written first. Neither does AMBA.
>
> STP is only ever guaranteed to be single-copy atomic with regards to the
> element size (which is the X register in question). If you swap the data
> around, and do STP X1, X0, [X2] you may see a different result dependent
> on how the processor decides to pull data from the register file and in
> what order. Users of the old 32-bit ARM STM instruction will recall that
> it writes the register list in incrementing order, lowest register number
> to lowest address, so what is the solution for STP? Do you expect X0 to be
> emitted on the bus first or the data to be stored in *X2?
>
> It's neither!
>
> That means you can do an STP on one processor and an LDR of one of the
> 64-bit words on another processor, and you may be able to see
>
> a) None of the STP transaction
> b) X2 is written with the value in X0, but X2+8 is not holding the value
>    in X1
> c) b, only reversed
> d) What you expect
>
> And this can change dependent on the resizers and bridges and QoS and
> paths between a master interface and a slave interface, although a truly
> single-copy atomic transaction going through a downsizer to smaller than
> the transaction size is a broken system design, it may be allowable if the
> downsizer hazards addresses to the granularity of the larger bus size on
> the read and write channels and will stall the read until the write has
> committed at least to a buffer, or downstream of the downsizer, so that it
> will return on read the full breadth of the memory update.... that's down
> to the system designer. There are plenty of places things like this can
> happen - in cache controllers, for example, and merging store buffers (you
> may have a 256 bit or 512 bit buffer, but only a 128-bit memory
> interface).
>
> Neither memcpy() as a function nor the loads and stores it makes are
> single-copy atomic, no transactions need to be with Normal memory, so that
> merged stores and linefills (if cacheable) can be done. Hence, your
> memcpy() is just randomly chucking whatever data it likes to the bus and
> they'll arrive in any old order, 'writecombine' semantics make you think
> you'll only ever see one very large write with all the CPU activity merged
> together - also NOT true.
>
> And the granularity of the hazarding in your system, from the CPU store
> buffer to the bus interface to the interconnect buffering to the PCIe
> bridge to the PCIe EP is.. what? Not the same all the way down, I'll bet
> you.
>
> It is assuming that Intel writecombine semantics would apply, which to be
> truthful are NO different to the ones of a merging store buffer in an Arm
> processor (Intel architecture states that the writecombine buffer can be
> flushed at any time with any amount of actual data, it might not be the
> biggest burst you can imagine), but in practice it tends to be in
> cache-line sized chunks with strict incrementing order and subsequent
> writes due to the extremely large pipeline and queueing will be absorbed
> by the writecombine buffer almost with guarantee.
>
> Links is broken. Even on Intel. If you overlap memory transactions and
> expect them to be gathered and reordered to produce nice, ordered
> non-overlapping streaming transactions you'll be sorely disappointed when
> they don't, which is what is happening here. The fix is to use barriers -
> and don't rely on single-copy atomicity (which is the only saving feature
> that would not require you to use a barrier) since this is a situation
> where absolutely none is afforded.
>
> It'd be easier to cross your fingers that the PCIe RC has a coherent
> master port (ACE-Lite or something fancier) and can snoop into CPU caches.
> Then you can mark a memory location in DRAM as Normal Inner/Outer
> Cacheable Writeback, Inner/Outer Shareable, Write-allocate, read-allocate,
> and you won't even notice your CPU doing any memory writes, but yes if you
> tell a graphics adapter that its main framebuffer is in DRAM it might be a
> bit slower (to the speed of the PCIe link.. which may affect your maximum
> resolution in some really strange circumstances). If it cannot use a DRAM
> framebuffer then I'd have to wonder why not.. every PCI graphics card I
> ever used could take any base address and the magic of PCI bus mastering
> would handle it. This is no different to how you'd use DRAM as texture
> memory.. phenomenally slowly, but without having to worry about any
> ordering semantics (except you should flush your data cache to PoC at the
> end of every frame).
>
> Ta,
> Matt

^ permalink raw reply	[flat|nested] 238+ messages in thread
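Editorial sketch, not part of the original message: Ard's point is that the two overlapping stores carry identical data in the bytes they share, so delivery order alone cannot change the final contents. The Python model below illustrates that argument under stated assumptions. It is not the glibc assembly; the function name copy_with_overlap, the two-store layout (first 16 bytes, last 16 bytes) and the 29-byte test pattern are all hypothetical stand-ins for the kind of branch-free copy the thread describes.

```python
def copy_with_overlap(dst, dst_off, src, order=("head", "tail")):
    """Model a 16..32-byte copy done as two 16-byte stores that may overlap.

    This is a simplified stand-in for a branch-free memcpy tail, not the
    real glibc code: one store covers the first 16 bytes of the source,
    the other covers the last 16 bytes, and for lengths below 32 the two
    stores overlap in the middle.
    """
    n = len(src)
    assert 16 <= n <= 32
    stores = {
        "head": (dst_off, bytes(src[:16])),               # first 16 bytes
        "tail": (dst_off + n - 16, bytes(src[n - 16:])),  # last 16 bytes
    }
    for name in order:  # 'order' models the order the stores are delivered in
        off, data = stores[name]
        dst[off:off + 16] = data
    return dst


# Hypothetical 29-byte pattern, so the two stores overlap by 3 bytes.
src = bytes(range(0x97, 0x97 + 29))
in_order = copy_with_overlap(bytearray(64), 3, src, ("head", "tail"))
reordered = copy_with_overlap(bytearray(64), 3, src, ("tail", "head"))
```

Both delivery orders leave the same bytes in the destination, which is why reordering by itself does not account for missing or misplaced data.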
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-03 21:20 ` Ard Biesheuvel
  (?)
@ 2018-08-06 10:25 ` Mikulas Patocka
  -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-06 10:25 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Matt Sealey, Will Deacon, Jingoo Han, Joao Pinto, Thomas Petazzoni,
	Catalin Marinas, Russell King, Linux Kernel Mailing List,
	linux-arm-kernel, linux-pci

On Fri, 3 Aug 2018, Ard Biesheuvel wrote:

> On 3 August 2018 at 22:44, Matt Sealey <neko@bakuhatsu.net> wrote:
> > On 3 August 2018 at 13:25, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >>
> >>
> >> On Fri, 3 Aug 2018, Ard Biesheuvel wrote:
> >>
> >>> Are we still talking about overlapping unaligned accesses here? Or do
> >>> you see other failures as well?
> >>
> >> Yes - it is caused by overlapping unaligned accesses inside memcpy. When I
> >> put "dmb sy" between the overlapping accesses in
> >> glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any memory
> >> corruption.
> >
> > It is a symptom of generating reorderable accesses inside memcpy. It's
> > nothing to do with alignment, per se (see below). A dmb sy just hides the
> > symptoms.
> >
> > What we're talking about here - yes, Ard, within certain amounts of
> > reason - is that you cannot use PCI BAR memory as 'Normal' - certainly
> > never cacheable memory, but Normal NC isn't good either. That is, your
> > CPU cannot post writes or reads towards PCI memory spaces unless it is
> > dealing with it as Device memory or very strictly controlled use of
> > Normal Non-Cacheable.
> >
> > I understand why the rest of the world likes to mark stuff as
> > 'writecombine,' but that's an x86-ism, not an Arm memory type.
> >
> > There is potential for accesses to the same slave from different masters
> > (or just different AXI IDs, most cores rotate over 8 or 16 or so for
> > Normal memory to achieve) to be reordered. PCIe has no idea what the
> > source was, it will just accept them in the order it receives them, and
> > also it will be strictly defined to manage incoming AXI or ACE
> > transactions (and barriers..) in a way that does not violate the PCIe
> > memory model - the worst case is deadlocks, the best case is you see
> > some very strange behavior.
> >
> > In any case the original ordering of two Normal-NC transactions may not
> > make it to the PCIe bridge in the first place, which is probably why a
> > DMB resolves it - it will force the core to issue them in order and it's
> > likely, unless there is some hyper-complex multi-pathing going on,
> > they'll stay ordered. If you MUST preserve the order between two Normal
> > memory accesses, a barrier is required. The same is true also of any
> > re-orderable device access.
> >
>
> None of this explains why some transactions fail to make it across
> entirely. The overlapping writes in question write the same data to
> the memory locations that are covered by both, and so the ordering in
> which the transactions are received should not affect the outcome.

You're right that the corruption couldn't be explained just by reordering
writes. My hypothesis is that the PCIe controller tries to disambiguate
the overlapping writes, but the disambiguation logic was not tested and it
is buggy. If there's a barrier between the overlapping writes, the PCIe
controller won't see any overlapping writes, so it won't trigger the
faulty disambiguation logic and it works.

Could the ARM engineers look if there's some chicken bit in Cortex-A72
that could insert barriers between non-cached writes automatically?

I observe these kinds of corruptions:
- failing to write a few bytes
- writing a few bytes that were written 16 bytes before
- writing a few bytes that were written 16 bytes after

Here are examples of the corruption (the first line, p, is the previous
content of videoram, the second line, d, is the content that should be
present after a memcpy, and the third line, m, is the real content of
videoram after the memcpy; corrupted bytes are marked with asterisks).

Here it writes three bytes that were actually written by the memcpy
function 16 bytes before:
p[020] e3 e4 e5 e6 e7 e8 c8 bd be bf c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf d0 d1 d2 d3 d4 d5
d[020] 97 98 99 9a 9b 9c 9d 9e 9f a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af b0 b1 b2 b3 d3 d4 d5
m[020] 97 98 99 9a 9b*8c*8d*8e* 9f a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af b0 b1 b2 b3 d3 d4 d5

Writes 4 bytes with a content that was written 16 bytes before:
p[020] 47 e2 e3 e4 e5 e6 e7 e8 e9 ea eb 52 53 54 55 56 57 58 59 5a 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52
d[020] 47 e2 ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07 08 09
m[020] 47 e2 ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb*ec*ed**ee*ef*00 01 02 03 04 05 06 07 08 09

Writes 2 bytes with a content that was written 16 bytes before:
p[0a0] eb ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 2f 30 31 32 33
d[0a0] eb ec ed ee ef f0 f1 f2 f3 f4 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b
m[0a0] eb ec ed ee ef f0 f1 f2 f3 f4 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15*06*07* 18 19 1a 1b

Writes 3 bytes with a content that was written 16 bytes after:
p[0a0] 0a 17 18 19 1a 1b 1c 1d 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61
d[0a0] 0a 17 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf c0 c1 c2 c3 c4 c5 c6
m[0a0] 0a 17 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8*c9*ca**cb*bc bd be bf c0 c1 c2 c3 c4 c5 c6

Fails to write three bytes:
p[040] 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29
d[040] a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
m[040] a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac*17*18*19* b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf

Fails to write one byte:
p[020] 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39
d[020] 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 28 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41 42 43
m[020] 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 28 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41 42*39*

Fails to write 5 bytes:
p[020] 6e 6f 70 71 72 73 74 75 76 77 78 ca cb cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de
d[020] 6e 6f 70 71 72 73 74 75 76 77 78 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58
m[020] 6e 6f 70 71 72 73 74 75 76 77 78 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53*da**db*dc*dd*de*

Mikulas

^ permalink raw reply	[flat|nested] 238+ messages in thread
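Editorial sketch, not part of the original message: the three corruption kinds Mikulas lists can be told apart mechanically from the p/d/m rows. The Python below is one possible way to do that, assuming the 16-byte shift he reports. The function name classify and the label strings are hypothetical; the three rows are copied from the "Writes 4 bytes with a content that was written 16 bytes before" dump. The check order is a choice made for this sketch, and a byte that happens to match more than one candidate gets the first matching label.

```python
def classify(prev, want, got, shift=16):
    """Label every byte where 'got' differs from 'want'.

    prev: videoram content before the memcpy (the p row)
    want: content that should be there afterwards (the d row)
    got:  content actually read back (the m row)
    Returns {index: label}. Only bytes inside the given rows are compared,
    so a shifted byte whose source lies outside the row stays 'unknown'
    unless it also matches the previous content.
    """
    result = {}
    for i, (w, g) in enumerate(zip(want, got)):
        if g == w:
            continue  # byte is correct
        if i >= shift and g == want[i - shift]:
            result[i] = "stale_before"   # data meant for 'shift' bytes earlier
        elif i + shift < len(want) and g == want[i + shift]:
            result[i] = "stale_after"    # data meant for 'shift' bytes later
        elif g == prev[i]:
            result[i] = "not_written"    # old content survived
        else:
            result[i] = "unknown"
    return result


# Rows from the second dump in the message (offset 0x020).
p = bytes.fromhex("47 e2 e3 e4 e5 e6 e7 e8 e9 ea eb 52 53 54 55 56"
                  "57 58 59 5a 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52")
d = bytes.fromhex("47 e2 ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9"
                  "fa fb fc fd fe ff 00 01 02 03 04 05 06 07 08 09")
m = bytes.fromhex("47 e2 ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9"
                  "fa fb ec ed ee ef 00 01 02 03 04 05 06 07 08 09")

report = classify(p, d, m)
```

For this row the four bad bytes at offsets 18 to 21 match the intended content 16 bytes earlier, which agrees with the description given for that dump.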
* Re: framebuffer corruption due to overlapping stp instructions on arm64 @ 2018-08-06 10:25 ` Mikulas Patocka 0 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-06 10:25 UTC (permalink / raw) To: Ard Biesheuvel Cc: Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, Catalin Marinas, linux-arm-kernel On Fri, 3 Aug 2018, Ard Biesheuvel wrote: > On 3 August 2018 at 22:44, Matt Sealey <neko@bakuhatsu.net> wrote: > > On 3 August 2018 at 13:25, Mikulas Patocka <mpatocka@redhat.com> wrote: > >> > >> > >> On Fri, 3 Aug 2018, Ard Biesheuvel wrote: > >> > >>> Are we still talking about overlapping unaligned accesses here? Or do > >>> you see other failures as well? > >> > >> Yes - it is caused by overlapping unaligned accesses inside memcpy. When I > >> put "dmb sy" between the overlapping accesses in > >> glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any memory > >> corruption. > > > > It is a symptom of generating reorderable accesses inside memcpy. It's nothing > > to do with alignment, per se (see below). A dmb sy just hides the symptoms. > > > > What we're talking about here - yes, Ard, within certain amounts of > > reason - is that > > you cannot use PCI BAR memory as 'Normal' - certainly never cacheable memory, > > but Normal NC isn't good either. That is that your CPU cannot post > > writes or reads > > towards PCI memory spaces unless it is dealing with it as Device memory or very > > strictly controlled use of Normal Non-Cacheable. > > > > I understand why the rest of the world likes to mark stuff as > > 'writecombine,' but > > that's x86-ism, not an Arm memory type. > > > > There is potential for accesses to the same slave from different > > masters (or just > > different AXI IDs, most cores rotate over 8 or 16 or so for Normal > > memory to achieve) > > to be reordered. 
PCIe has no idea what the source was, it will just > > accept them in the order it receives them, and also it will be > > strictly defined to > > manage incoming AXI or ACE transactions (and barriers..) in a way that does > > not violate the PCIe memory model - the worst case is deadlocks, the best case > > is you see some very strange behavior. > > > > In any case the original ordering of two Normal-NC transactions may > > not make it to > > the PCIe bridge in the first place which is probably why a DMB > > resolves it - it will > > force the core to issue them in order and it's likely unless there is > > some hyper-complex > > multi-pathing going on, they'll stay ordered. If you MUST preserve the > > order between > > two Normal memory accesses, a barrier is required. The same is true also of any > > re-orderable device access. > > > > None of this explains why some transactions fail to make it across > entirely. The overlapping writes in question write the same data to > the memory locations that are covered by both, and so the ordering in > which the transactions are received should not affect the outcome. You're right that the corruption couldn't be explained just by reordering writes. My hypothesis is that the PCIe controller tries to disambiguate the overlapping writes, but the disambiguation logic was not tested and it is buggy. If there's a barrier between the overlapping writes, the PCIe controller won't see any overlapping writes, so it won't trigger the faulty disambiguation logic and it works. Could the ARM engineers look if there's some chicken bit in Cortex-A72 that could insert barriers between non-cached writes automatically? 
I observe these kinds of corruptions:
- failing to write a few bytes
- writing a few bytes that were written 16 bytes before
- writing a few bytes that were written 16 bytes after

Here are examples of the corruption (the first line is the previous content of videoram, the second line is the content that should be present after a memcpy, and the third line is the real contents of videoram after memcpy).

Here it writes three bytes that were actually written by the memcpy function 16 bytes before:
p[020] e3 e4 e5 e6 e7 e8 c8 bd be bf c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf d0 d1 d2 d3 d4 d5
d[020] 97 98 99 9a 9b 9c 9d 9e 9f a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af b0 b1 b2 b3 d3 d4 d5
m[020] 97 98 99 9a 9b*8c*8d*8e* 9f a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af b0 b1 b2 b3 d3 d4 d5

Writes 4 bytes with a content that was written 16 bytes before:
p[020] 47 e2 e3 e4 e5 e6 e7 e8 e9 ea eb 52 53 54 55 56 57 58 59 5a 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52
d[020] 47 e2 ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 06 07 08 09
m[020] 47 e2 ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb*ec*ed**ee*ef*00 01 02 03 04 05 06 07 08 09

Writes 2 bytes with a content that was written 16 bytes before:
p[0a0] eb ec ed ee ef f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff 00 01 02 03 04 05 2f 30 31 32 33
d[0a0] eb ec ed ee ef f0 f1 f2 f3 f4 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b
m[0a0] eb ec ed ee ef f0 f1 f2 f3 f4 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15*06*07* 18 19 1a 1b

Writes 3 bytes with a content that was written 16 bytes after:
p[0a0] 0a 17 18 19 1a 1b 1c 1d 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f 60 61
d[0a0] 0a 17 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf c0 c1 c2 c3 c4 c5 c6
m[0a0] 0a 17 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8*c9*ca**cb*bc bd be bf c0 c1 c2 c3 c4 c5 c6

Fails to write three bytes:
p[040] 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 26 27 28 29
d[040] a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
m[040] a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac*17*18*19* b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf

Fails to write one byte:
p[020] 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 28 29 2a 2b 2c 2d 2e 2f 30 31 32 33 34 35 36 37 38 39
d[020] 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 28 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41 42 43
m[020] 25 26 27 28 29 2a 2b 2c 2d 2e 2f 30 31 32 28 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 40 41 42*39*

Fails to write 5 bytes:
p[020] 6e 6f 70 71 72 73 74 75 76 77 78 ca cb cc cd ce cf d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de
d[020] 6e 6f 70 71 72 73 74 75 76 77 78 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53 54 55 56 57 58
m[020] 6e 6f 70 71 72 73 74 75 76 77 78 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f 50 51 52 53*da**db*dc*dd*de*

Mikulas
_______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 238+ messages in thread
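The overlapping-store pattern discussed in this message can be sketched in C. This is only an illustration under stated assumptions, not the glibc code: `copy16_32` and `store_barrier` are hypothetical names, the inline assembly assumes GCC or Clang, and a compiler is not guaranteed to turn the two 16-byte stores into `stp` instructions. The point is that for a length between 16 and 32 bytes, the head and tail stores cover some destination bytes twice with the same data, and the optional barrier mirrors the "dmb sy" experiment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: full-system barrier on arm64, compiler-only
 * barrier elsewhere (so the sketch still builds on other hosts). */
static inline void store_barrier(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dmb sy" ::: "memory");
#else
    __asm__ volatile("" ::: "memory");
#endif
}

/* Copy n bytes, 16 <= n <= 32, without branching on the exact length:
 * one 16-byte store at the start and one 16-byte store ending at dst + n.
 * For n < 32 the two stores overlap and write identical data there. */
static void copy16_32(unsigned char *dst, const unsigned char *src,
                      size_t n, int with_barrier)
{
    unsigned char head[16], tail[16];

    /* Load both blocks before storing, as an overlapping-store memcpy would. */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    memcpy(dst, head, 16);            /* first store: bytes [0, 16) */
    if (with_barrier)
        store_barrier();              /* separates the overlapping stores */
    memcpy(dst + n - 16, tail, 16);   /* second store: bytes [n - 16, n) */
}
```

On ordinary memory both variants must produce the same result; the reported corruption appears only when the destination is the PCIe framebuffer mapping on the affected board.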
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 10:25 ` Mikulas Patocka (?) @ 2018-08-06 12:42 ` Robin Murphy -1 siblings, 0 replies; 238+ messages in thread From: Robin Murphy @ 2018-08-06 12:42 UTC (permalink / raw) To: Mikulas Patocka, Ard Biesheuvel Cc: Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, Catalin Marinas, linux-arm-kernel On 06/08/18 11:25, Mikulas Patocka wrote: [...] >> None of this explains why some transactions fail to make it across >> entirely. The overlapping writes in question write the same data to >> the memory locations that are covered by both, and so the ordering in >> which the transactions are received should not affect the outcome. > > You're right that the corruption couldn't be explained just by reordering > writes. My hypothesis is that the PCIe controller tries to disambiguate > the overlapping writes, but the disambiguation logic was not tested and it > is buggy. If there's a barrier between the overlapping writes, the PCIe > controller won't see any overlapping writes, so it won't trigger the > faulty disambiguation logic and it works. > > Could the ARM engineers look if there's some chicken bit in Cortex-A72 > that could insert barriers between non-cached writes automatically? I don't think there is, and even if there was I imagine it would have a pretty hideous effect on non-coherent DMA buffers and the various other places in which we have Normal-NC mappings of actual system RAM. > I observe these kinds of corruptions: > - failing to write a few bytes That could potentially be explained by the reordering/atomicity issues Matt mentioned, i.e. the load is observing part of the store, before the store has fully completed. 
> - writing a few bytes that were written 16 bytes before > - writing a few bytes that were written 16 bytes after Those sound more like the interconnect or root complex ignoring the byte strobes on an unaligned burst, of which I think the simplistic view would be "it's broken". FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and it's still happily flickering pixels in the corner of the console after nearly an hour (in parallel with some iperf3 just to ensure plenty of PCIe traffic). I would strongly suspect this issue is particular to Armada 8k, so it's probably one for the Marvell folks to take a closer look at - I believe some previous interconnect issues on those SoCs were actually fixable in firmware. Robin. ^ permalink raw reply [flat|nested] 238+ messages in thread
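The three corruption categories discussed above can be told apart mechanically when the full buffers are available. The following is a minimal sketch, not the actual test program from this thread: the names `classify_byte`, `prev`, `want` and `got` are hypothetical and correspond to the p, d and m lines of the dumps. A byte equal to the old content counts as "not written", and a byte equal to the expected data 16 bytes earlier or later counts as displaced.

```c
#include <stddef.h>

enum corruption_kind {
    BYTE_OK,              /* actual byte matches the expected byte */
    BYTE_NOT_WRITTEN,     /* actual byte still holds the previous content */
    BYTE_FROM_16_BEFORE,  /* actual byte equals expected data 16 bytes earlier */
    BYTE_FROM_16_AFTER,   /* actual byte equals expected data 16 bytes later */
    BYTE_UNKNOWN          /* mismatch that fits none of the patterns above */
};

/* Classify byte i of a buffer of length len.
 * prev: content before the memcpy (the "p" line)
 * want: content that should be present afterwards (the "d" line)
 * got:  content actually read back (the "m" line) */
static enum corruption_kind classify_byte(const unsigned char *prev,
                                          const unsigned char *want,
                                          const unsigned char *got,
                                          size_t len, size_t i)
{
    if (got[i] == want[i])
        return BYTE_OK;
    if (got[i] == prev[i])
        return BYTE_NOT_WRITTEN;
    if (i >= 16 && got[i] == want[i - 16])
        return BYTE_FROM_16_BEFORE;
    if (i + 16 < len && got[i] == want[i + 16])
        return BYTE_FROM_16_AFTER;
    return BYTE_UNKNOWN;
}
```

The order of the checks is a design choice: when the old content happens to equal the displaced data, the byte is reported as "not written", so ambiguous cases still need a look at the raw dump.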
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 12:42 ` Robin Murphy (?) @ 2018-08-06 12:53 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 12:53 UTC (permalink / raw) To: Robin Murphy Cc: Mikulas Patocka, Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, Catalin Marinas, linux-arm-kernel On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote: > On 06/08/18 11:25, Mikulas Patocka wrote: > [...] >>> >>> None of this explains why some transactions fail to make it across >>> entirely. The overlapping writes in question write the same data to >>> the memory locations that are covered by both, and so the ordering in >>> which the transactions are received should not affect the outcome. >> >> >> You're right that the corruption couldn't be explained just by reordering >> writes. My hypothesis is that the PCIe controller tries to disambiguate >> the overlapping writes, but the disambiguation logic was not tested and it >> is buggy. If there's a barrier between the overlapping writes, the PCIe >> controller won't see any overlapping writes, so it won't trigger the >> faulty disambiguation logic and it works. >> >> Could the ARM engineers look if there's some chicken bit in Cortex-A72 >> that could insert barriers between non-cached writes automatically? > > > I don't think there is, and even if there was I imagine it would have a > pretty hideous effect on non-coherent DMA buffers and the various other > places in which we have Normal-NC mappings of actual system RAM. > >> I observe these kinds of corruptions: >> - failing to write a few bytes > > > That could potentially be explained by the reordering/atomicity issues Matt > mentioned, i.e. the load is observing part of the store, before the store > has fully completed. 
> OK, so that means the unaligned transaction gets split, and the subtransactions are reordered with the aligned transaction so that the sub-writes contain stale values from the sub-reads? >> - writing a few bytes that were written 16 bytes before >> - writing a few bytes that were written 16 bytes after > > > Those sound more like the interconnect or root complex ignoring the byte > strobes on an unaligned burst, of which I think the simplistic view would be > "it's broken". > > FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x > Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and > it's still happily flickering pixels in the corner of the console after > nearly an hour (in parallel with some iperf3 just to ensure plenty of PCIe > traffic). I would strongly suspect this issue is particular to Armada 8k, so > its' probably one for the Marvell folks to take a closer look at - I believe > some previous interconnect issues on those SoCs were actually fixable in > firmware. > IIRC that was DVM dropping a few VA bits at the top, and a single MMIO control bit to put it back into 'non-broken' mode. ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 12:42 ` Robin Murphy (?) @ 2018-08-06 13:41 ` Marcin Wojtas -1 siblings, 0 replies; 238+ messages in thread From: Marcin Wojtas @ 2018-08-06 13:41 UTC (permalink / raw) To: mpatocka Cc: Ard Biesheuvel, Robin Murphy, Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, linux-arm-kernel Hi Mikulas, On Mon, 6 Aug 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote: > > On 06/08/18 11:25, Mikulas Patocka wrote: > [...] > >> None of this explains why some transactions fail to make it across > >> entirely. The overlapping writes in question write the same data to > >> the memory locations that are covered by both, and so the ordering in > >> which the transactions are received should not affect the outcome. > > > > You're right that the corruption couldn't be explained just by reordering > > writes. My hypothesis is that the PCIe controller tries to disambiguate > > the overlapping writes, but the disambiguation logic was not tested and it > > is buggy. If there's a barrier between the overlapping writes, the PCIe > > controller won't see any overlapping writes, so it won't trigger the > > faulty disambiguation logic and it works. > > > > Could the ARM engineers look if there's some chicken bit in Cortex-A72 > > that could insert barriers between non-cached writes automatically? > > I don't think there is, and even if there was I imagine it would have a > pretty hideous effect on non-coherent DMA buffers and the various other > places in which we have Normal-NC mappings of actual system RAM. > > > I observe these kinds of corruptions: > > - failing to write a few bytes > > That could potentially be explained by the reordering/atomicity issues > Matt mentioned, i.e. the load is observing part of the store, before the > store has fully completed. 
> > > - writing a few bytes that were written 16 bytes before > > - writing a few bytes that were written 16 bytes after > > Those sound more like the interconnect or root complex ignoring the byte > strobes on an unaligned burst, of which I think the simplistic view > would be "it's broken". > > FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x > Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and > it's still happily flickering pixels in the corner of the console after > nearly an hour (in parallel with some iperf3 just to ensure plenty of > PCIe traffic). I would strongly suspect this issue is particular to > Armada 8k, so its' probably one for the Marvell folks to take a closer > look at - I believe some previous interconnect issues on those SoCs were > actually fixable in firmware. > > On my Macchiato I use a GT630 card (nouveau driver) + Debian + Xfce desktop, and in dual-monitor mode I could run a couple of 1080p streams. All smooth, and I've never noticed any image corruption whatsoever (I spent a lot of time in front of this setup). Just to be on the safe side, can you send me a bootlog and your board revision? I'd like to see your firmware version and type. Thanks, Marcin ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 13:41 ` Marcin Wojtas (?) @ 2018-08-06 13:48 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 13:48 UTC (permalink / raw) To: Marcin Wojtas Cc: Mikulas Patocka, Robin Murphy, Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, linux-arm-kernel On 6 August 2018 at 15:41, Marcin Wojtas <mw@semihalf.com> wrote: > Hi Mikulas, > > On Mon, 6 Aug 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote: >> >> On 06/08/18 11:25, Mikulas Patocka wrote: >> [...] >> >> None of this explains why some transactions fail to make it across >> >> entirely. The overlapping writes in question write the same data to >> >> the memory locations that are covered by both, and so the ordering in >> >> which the transactions are received should not affect the outcome. >> > >> > You're right that the corruption couldn't be explained just by reordering >> > writes. My hypothesis is that the PCIe controller tries to disambiguate >> > the overlapping writes, but the disambiguation logic was not tested and it >> > is buggy. If there's a barrier between the overlapping writes, the PCIe >> > controller won't see any overlapping writes, so it won't trigger the >> > faulty disambiguation logic and it works. >> > >> > Could the ARM engineers look if there's some chicken bit in Cortex-A72 >> > that could insert barriers between non-cached writes automatically? >> >> I don't think there is, and even if there was I imagine it would have a >> pretty hideous effect on non-coherent DMA buffers and the various other >> places in which we have Normal-NC mappings of actual system RAM. >> >> > I observe these kinds of corruptions: >> > - failing to write a few bytes >> >> That could potentially be explained by the reordering/atomicity issues >> Matt mentioned, i.e. 
the load is observing part of the store, before the >> store has fully completed. >> >> > - writing a few bytes that were written 16 bytes before >> > - writing a few bytes that were written 16 bytes after >> >> Those sound more like the interconnect or root complex ignoring the byte >> strobes on an unaligned burst, of which I think the simplistic view >> would be "it's broken". >> >> FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x >> Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and >> it's still happily flickering pixels in the corner of the console after >> nearly an hour (in parallel with some iperf3 just to ensure plenty of >> PCIe traffic). I would strongly suspect this issue is particular to >> Armada 8k, so its' probably one for the Marvell folks to take a closer >> look at - I believe some previous interconnect issues on those SoCs were >> actually fixable in firmware. >> >> > > On my Macchiato I use GT630 card (nuveau driver) + debian + xfce > desktop and in dual monitor mode, I could run a couple of 1080p > streams. All smooth and I've never noticed any image corruption > whatsoever (I spent a lot of time in front of such setup). Just to be > on a safe side, can you send me a bootlog and your board revision? I'd > like to see your firware version and type. > Hi Marcin, Could you please try running his reproducer? ^ permalink raw reply [flat|nested] 238+ messages in thread
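For readers following the argument about overlapping writes: the thread describes a memcpy that, to avoid branching, may write part of the destination twice at different alignments. The C sketch below illustrates that idea for copy sizes from 16 to 32 bytes. It is only an illustration under stated assumptions, not the glibc or kernel arm64 memcpy; the function name `copy16_32` is hypothetical, and the fixed-size `memcpy` calls stand in for 16-byte `ldp`/`stp` pairs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the real arm64 memcpy): copies n bytes,
 * 16 <= n <= 32, using two 16-byte chunks. When n < 32 the two stores
 * overlap in the middle of the destination.
 */
static void copy16_32(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char head[16], tail[16];

    /* Load both chunks first, as a register-based implementation would. */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    /* Store bytes [0, 16) and then bytes [n - 16, n). */
    memcpy(dst, head, 16);
    memcpy(dst + n - 16, tail, 16);
}
```

For n = 24, bytes 8 to 15 of the destination are written by both stores, and both stores carry identical data for those bytes. That is why, as argued in the quoted text, the order in which the two writes arrive should not change the final contents.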
* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Marcin Wojtas @ 2018-08-06 14:07 UTC
To: Ard Biesheuvel, mpatocka
Cc: Robin Murphy, Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, linux-arm-kernel

Hi Ard, Mikulas,

On Mon, 6 Aug 2018 at 15:48, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On 6 August 2018 at 15:41, Marcin Wojtas <mw@semihalf.com> wrote:
> > Hi Mikulas,
> >
> > On Mon, 6 Aug 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote:
> >>
> >> On 06/08/18 11:25, Mikulas Patocka wrote:
> >> [...]
> >> >> None of this explains why some transactions fail to make it across
> >> >> entirely. The overlapping writes in question write the same data to
> >> >> the memory locations that are covered by both, and so the ordering in
> >> >> which the transactions are received should not affect the outcome.
> >> >
> >> > You're right that the corruption couldn't be explained just by reordering
> >> > writes. My hypothesis is that the PCIe controller tries to disambiguate
> >> > the overlapping writes, but the disambiguation logic was not tested and it
> >> > is buggy. If there's a barrier between the overlapping writes, the PCIe
> >> > controller won't see any overlapping writes, so it won't trigger the
> >> > faulty disambiguation logic and it works.
> >> >
> >> > Could the ARM engineers look if there's some chicken bit in Cortex-A72
> >> > that could insert barriers between non-cached writes automatically?
> >>
> >> I don't think there is, and even if there was I imagine it would have a
> >> pretty hideous effect on non-coherent DMA buffers and the various other
> >> places in which we have Normal-NC mappings of actual system RAM.
> >>
> >> > I observe these kinds of corruptions:
> >> > - failing to write a few bytes
> >>
> >> That could potentially be explained by the reordering/atomicity issues
> >> Matt mentioned, i.e. the load is observing part of the store, before the
> >> store has fully completed.
> >>
> >> > - writing a few bytes that were written 16 bytes before
> >> > - writing a few bytes that were written 16 bytes after
> >>
> >> Those sound more like the interconnect or root complex ignoring the byte
> >> strobes on an unaligned burst, of which I think the simplistic view
> >> would be "it's broken".
> >>
> >> FWIW I stuck my old Nvidia 7600GT card in my Arm Juno r2 board (2x
> >> Cortex-A72), built your test program natively with GCC 8.1.1 at -O2, and
> >> it's still happily flickering pixels in the corner of the console after
> >> nearly an hour (in parallel with some iperf3 just to ensure plenty of
> >> PCIe traffic). I would strongly suspect this issue is particular to
> >> Armada 8k, so it's probably one for the Marvell folks to take a closer
> >> look at - I believe some previous interconnect issues on those SoCs were
> >> actually fixable in firmware.
> >>
> >
> > On my Macchiato I use GT630 card (nouveau driver) + debian + xfce
> > desktop and in dual monitor mode, I could run a couple of 1080p
> > streams. All smooth and I've never noticed any image corruption
> > whatsoever (I spent a lot of time in front of such setup). Just to be
> > on a safe side, can you send me a bootlog and your board revision? I'd
> > like to see your firmware version and type.
> >
>
> Hi Marcin,
>
> Could you please try running his reproducer?

This is exactly what I plan to do, as soon as I can plug my GFX card
back into the board (tomorrow). Just to remain aligned - is it OK if I
boot my Debian with the GT630 plugged in, compile the program with -O2 and
simply run it on /dev/fb0?

Best regards,
Marcin
* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Mikulas Patocka @ 2018-08-06 14:13 UTC
To: Marcin Wojtas
Cc: Ard Biesheuvel, Robin Murphy, Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, linux-arm-kernel

On Mon, 6 Aug 2018, Marcin Wojtas wrote:

> > Hi Marcin,
> >
> > Could you please try running his reproducer?
>
> This is exactly what I plan to do, as soon as I can plug my GFX card
> back to the board (tomorrow). Just to remain aligned - is it ok, if I
> boot my debian with GT630 plugged, compile the program with -O2 and
> simply run it on /dev/fb0?
>
> Best regards,
> Marcin

Yes - when you run it, don't switch consoles (it will obviously trigger
a false warning), don't move the mouse to the upper left corner, and if
you want to run it for a long time, turn off console blanking.

Mikulas
* Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Ard Biesheuvel @ 2018-08-06 15:47 UTC
To: Robin Murphy
Cc: Mikulas Patocka, Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, Catalin Marinas, linux-arm-kernel

On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote:
> On 06/08/18 11:25, Mikulas Patocka wrote:
> [...]
>>>
>>> None of this explains why some transactions fail to make it across
>>> entirely. The overlapping writes in question write the same data to
>>> the memory locations that are covered by both, and so the ordering in
>>> which the transactions are received should not affect the outcome.
>>
>>
>> You're right that the corruption couldn't be explained just by reordering
>> writes. My hypothesis is that the PCIe controller tries to disambiguate
>> the overlapping writes, but the disambiguation logic was not tested and it
>> is buggy. If there's a barrier between the overlapping writes, the PCIe
>> controller won't see any overlapping writes, so it won't trigger the
>> faulty disambiguation logic and it works.
>>
>> Could the ARM engineers look if there's some chicken bit in Cortex-A72
>> that could insert barriers between non-cached writes automatically?
>
>
> I don't think there is, and even if there was I imagine it would have a
> pretty hideous effect on non-coherent DMA buffers and the various other
> places in which we have Normal-NC mappings of actual system RAM.
>

Looking at the A72 manual, there is one chicken bit that looks like it
may be related:

CPUACTLR_EL1 bit #50:

0 Enables store streaming on NC/GRE memory type. This is the reset value.
1 Disables store streaming on NC/GRE memory type.

so putting something like

    mrs x0, S3_1_C15_C2_0
    orr x0, x0, #(1 << 50)
    msr S3_1_C15_C2_0, x0

in __cpu_setup() would be worth a try.
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-06 15:47 ` Ard Biesheuvel
  (?)
@ 2018-08-06 17:09 ` Mikulas Patocka
  -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-06 17:09 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Robin Murphy, Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han,
      Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey,
      Catalin Marinas, linux-arm-kernel

On Mon, 6 Aug 2018, Ard Biesheuvel wrote:

> On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote:
> > On 06/08/18 11:25, Mikulas Patocka wrote:
> > [...]
> >>>
> >>> None of this explains why some transactions fail to make it across
> >>> entirely. The overlapping writes in question write the same data to
> >>> the memory locations that are covered by both, and so the ordering in
> >>> which the transactions are received should not affect the outcome.
> >>
> >>
> >> You're right that the corruption couldn't be explained just by reordering
> >> writes. My hypothesis is that the PCIe controller tries to disambiguate
> >> the overlapping writes, but the disambiguation logic was not tested and it
> >> is buggy. If there's a barrier between the overlapping writes, the PCIe
> >> controller won't see any overlapping writes, so it won't trigger the
> >> faulty disambiguation logic and it works.
> >>
> >> Could the ARM engineers look if there's some chicken bit in Cortex-A72
> >> that could insert barriers between non-cached writes automatically?
> >
> >
> > I don't think there is, and even if there was I imagine it would have a
> > pretty hideous effect on non-coherent DMA buffers and the various other
> > places in which we have Normal-NC mappings of actual system RAM.
> >
>
> Looking at the A72 manual, there is one chicken bit that looks like it
> may be related:
>
> CPUACTLR_EL1 bit #50:
>
> 0 Enables store streaming on NC/GRE memory type. This is the reset value.
> 1 Disables store streaming on NC/GRE memory type.
>
> so putting something like
>
>         mrs     x0, S3_1_C15_C2_0
>         orr     x0, x0, #(1 << 50)
>         msr     S3_1_C15_C2_0, x0
>
> in __cpu_setup() would be worth a try.

It won't boot.

But if I write the same value that was read, it also won't boot.

I created a simple kernel module that reads this register and it has bit
32 set, all other bits clear. But when I write the same value into it, the
core that does the write is stuck in an infinite loop.

So, it seems that we are writing this register from a wrong place.

Mikulas

^ permalink raw reply [flat|nested] 238+ messages in thread
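The read-modify-write discussed in this message can be checked on plain integers before touching the real register. What follows is a minimal host-side sketch in C, not kernel code: the helper and macro names are hypothetical, nothing here accesses CPUACTLR_EL1 itself, and the bit positions (32 for the value read back, 50 for the store-streaming bit) are simply taken from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers: they model the 64-bit register value only and
 * never access CPUACTLR_EL1 (S3_1_C15_C2_0).
 */

/* Bit 50: disables store streaming on NC/GRE memory, per the quoted manual. */
#define CPUACTLR_DIS_NC_GRE_STORE_STREAMING_BIT 50u

/* Value reported in the message: bit 32 set, all other bits clear. */
#define CPUACTLR_OBSERVED_VALUE (UINT64_C(1) << 32)

/* Return the value with the given bit set; all other bits are preserved. */
static uint64_t cpuactlr_set_bit(uint64_t value, unsigned int bit)
{
    return value | (UINT64_C(1) << bit);
}

/* Report whether the given bit is set in the value. */
static bool cpuactlr_bit_is_set(uint64_t value, unsigned int bit)
{
    return ((value >> bit) & 1u) != 0;
}
```

Setting bit 50 on the observed value gives 0x0004000100000000. The sketch shifts a 64-bit constant on purpose: in C, a plain `1 << 50` on a 32-bit int is undefined, whereas the assembler immediate in the quoted snippet is evaluated at 64-bit width.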
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-06 17:09 ` Mikulas Patocka
  (?)
@ 2018-08-06 17:21 ` Ard Biesheuvel
  -1 siblings, 0 replies; 238+ messages in thread
From: Ard Biesheuvel @ 2018-08-06 17:21 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Robin Murphy, Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han,
      Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey,
      Catalin Marinas, linux-arm-kernel

On 6 August 2018 at 19:09, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Mon, 6 Aug 2018, Ard Biesheuvel wrote:
>
>> On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote:
>> > On 06/08/18 11:25, Mikulas Patocka wrote:
>> > [...]
>> >>>
>> >>> None of this explains why some transactions fail to make it across
>> >>> entirely. The overlapping writes in question write the same data to
>> >>> the memory locations that are covered by both, and so the ordering in
>> >>> which the transactions are received should not affect the outcome.
>> >>
>> >>
>> >> You're right that the corruption couldn't be explained just by reordering
>> >> writes. My hypothesis is that the PCIe controller tries to disambiguate
>> >> the overlapping writes, but the disambiguation logic was not tested and it
>> >> is buggy. If there's a barrier between the overlapping writes, the PCIe
>> >> controller won't see any overlapping writes, so it won't trigger the
>> >> faulty disambiguation logic and it works.
>> >>
>> >> Could the ARM engineers look if there's some chicken bit in Cortex-A72
>> >> that could insert barriers between non-cached writes automatically?
>> >
>> >
>> > I don't think there is, and even if there was I imagine it would have a
>> > pretty hideous effect on non-coherent DMA buffers and the various other
>> > places in which we have Normal-NC mappings of actual system RAM.
>> >
>>
>> Looking at the A72 manual, there is one chicken bit that looks like it
>> may be related:
>>
>> CPUACTLR_EL1 bit #50:
>>
>> 0 Enables store streaming on NC/GRE memory type. This is the reset value.
>> 1 Disables store streaming on NC/GRE memory type.
>>
>> so putting something like
>>
>>         mrs     x0, S3_1_C15_C2_0
>>         orr     x0, x0, #(1 << 50)
>>         msr     S3_1_C15_C2_0, x0
>>
>> in __cpu_setup() would be worth a try.
>
> It won't boot.
>
> But if I write the same value that was read, it also won't boot.
>
> I created a simple kernel module that reads this register and it has bit
> 32 set, all other bits clear. But when I write the same value into it, the
> core that does the write is stuck in an infinite loop.
>
> So, it seems that we are writing this register from a wrong place.
>

Ah, my bad. I didn't look closely enough at the description:

"""
The accessibility to the CPUACTLR_EL1 by Exception level is:

EL0                -
EL1(NS)            RW (a)
EL1(S)             RW (a)
EL2                RW (b)
EL3(SCR.NS = 1)    RW
EL3(SCR.NS = 0)    RW

(a) Write access if ACTLR_EL3.CPUACTLR is 1 and ACTLR_EL2.CPUACTLR is
1, or ACTLR_EL3.CPUACTLR is 1 and SCR.NS is 0.
"""

so you'll have to do this from ARM Trusted Firmware. If you're
comfortable rebuilding that:

diff --git a/include/lib/cpus/aarch64/cortex_a72.h b/include/lib/cpus/aarch64/cortex_a72.h
index bfd64918625b..a7b8cf4be0c6 100644
--- a/include/lib/cpus/aarch64/cortex_a72.h
+++ b/include/lib/cpus/aarch64/cortex_a72.h
@@ -31,6 +31,7 @@
 #define CORTEX_A72_ACTLR_EL1 S3_1_C15_C2_0

 #define CORTEX_A72_ACTLR_DISABLE_L1_DCACHE_HW_PFTCH (1 << 56)
+#define CORTEX_A72_ACTLR_DIS_NC_GRE_STORE_STREAMING (1 << 50)
 #define CORTEX_A72_ACTLR_NO_ALLOC_WBWA (1 << 49)
 #define CORTEX_A72_ACTLR_DCC_AS_DCCI (1 << 44)
 #define CORTEX_A72_ACTLR_EL1_DIS_INSTR_PREFETCH (1 << 32)
diff --git a/lib/cpus/aarch64/cortex_a72.S b/lib/cpus/aarch64/cortex_a72.S
index 55e508678284..5914d6ee3ba6 100644
--- a/lib/cpus/aarch64/cortex_a72.S
+++ b/lib/cpus/aarch64/cortex_a72.S
@@ -133,6 +133,15 @@ func cortex_a72_reset_func
        orr     x0, x0, #CORTEX_A72_ECTLR_SMP_BIT
        msr     CORTEX_A72_ECTLR_EL1, x0
        isb
+
+       /* ---------------------------------------------
+        * Disables store streaming on NC/GRE memory type.
+        * ---------------------------------------------
+        */
+       mrs     x0, CORTEX_A72_ACTLR_EL1
+       orr     x0, x0, #CORTEX_A72_ACTLR_DIS_NC_GRE_STORE_STREAMING
+       msr     CORTEX_A72_ACTLR_EL1, x0
+       isb
        ret x19
 endfunc cortex_a72_reset_func

^ permalink raw reply related [flat|nested] 238+ messages in thread
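Footnote (a) in the access table quoted in this message is a small boolean rule, and writing it out makes it easier to see when a write from EL1 is permitted. The following C sketch is only a model of that one footnote as quoted: the function and parameter names are hypothetical, it does not read any system register, and footnote (b) for EL2 is not covered because its text is not quoted in the thread.

```c
#include <stdbool.h>

/*
 * Hypothetical model of footnote (a) from the quoted manual excerpt:
 *   write access if ACTLR_EL3.CPUACTLR is 1 and ACTLR_EL2.CPUACTLR is 1,
 *   or ACTLR_EL3.CPUACTLR is 1 and SCR.NS is 0.
 * The three inputs are the values of those bits; nothing is read from
 * hardware here.
 */
static bool cpuactlr_el1_write_allowed(bool actlr_el3_cpuactlr,
                                       bool actlr_el2_cpuactlr,
                                       bool scr_ns)
{
    /* First clause: both the EL3 and EL2 enable bits are set. */
    bool both_enabled = actlr_el3_cpuactlr && actlr_el2_cpuactlr;

    /* Second clause: the EL3 enable bit is set and SCR.NS is 0. */
    bool el3_enabled_secure = actlr_el3_cpuactlr && !scr_ns;

    return both_enabled || el3_enabled_secure;
}
```

Under this rule a write is never allowed while ACTLR_EL3.CPUACTLR is 0, whatever the other two bits are, which is why the change has to be made (or enabled) by firmware running at EL3.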
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-06 17:21 ` Ard Biesheuvel
  (?)
@ 2018-08-06 19:54 ` Mikulas Patocka
  -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-06 19:54 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Robin Murphy, Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han,
      Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey,
      Catalin Marinas, linux-arm-kernel

On Mon, 6 Aug 2018, Ard Biesheuvel wrote:

> On 6 August 2018 at 19:09, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> > On Mon, 6 Aug 2018, Ard Biesheuvel wrote:
> >
> >> On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote:
> >> > On 06/08/18 11:25, Mikulas Patocka wrote:
> >> > [...]
> >> >>>
> >> >>> None of this explains why some transactions fail to make it across
> >> >>> entirely. The overlapping writes in question write the same data to
> >> >>> the memory locations that are covered by both, and so the ordering in
> >> >>> which the transactions are received should not affect the outcome.
> >> >>
> >> >>
> >> >> You're right that the corruption couldn't be explained just by reordering
> >> >> writes. My hypothesis is that the PCIe controller tries to disambiguate
> >> >> the overlapping writes, but the disambiguation logic was not tested and it
> >> >> is buggy. If there's a barrier between the overlapping writes, the PCIe
> >> >> controller won't see any overlapping writes, so it won't trigger the
> >> >> faulty disambiguation logic and it works.
> >> >>
> >> >> Could the ARM engineers look if there's some chicken bit in Cortex-A72
> >> >> that could insert barriers between non-cached writes automatically?
> >> >
> >> >
> >> > I don't think there is, and even if there was I imagine it would have a
> >> > pretty hideous effect on non-coherent DMA buffers and the various other
> >> > places in which we have Normal-NC mappings of actual system RAM.
> >> >
> >>
> >> Looking at the A72 manual, there is one chicken bit that looks like it
> >> may be related:
> >>
> >> CPUACTLR_EL1 bit #50:
> >>
> >> 0 Enables store streaming on NC/GRE memory type. This is the reset value.
> >> 1 Disables store streaming on NC/GRE memory type.
> >>
> >> so putting something like
> >>
> >>         mrs     x0, S3_1_C15_C2_0
> >>         orr     x0, x0, #(1 << 50)
> >>         msr     S3_1_C15_C2_0, x0
> >>
> >> in __cpu_setup() would be worth a try.
> >
> > It won't boot.
> >
> > But if I write the same value that was read, it also won't boot.
> >
> > I created a simple kernel module that reads this register and it has bit
> > 32 set, all other bits clear. But when I write the same value into it, the
> > core that does the write is stuck in an infinite loop.
> >
> > So, it seems that we are writing this register from a wrong place.
> >
>
> Ah, my bad. I didn't look closely enough at the description:
>
> """
> The accessibility to the CPUACTLR_EL1 by Exception level is:
>
> EL0                -
> EL1(NS)            RW (a)
> EL1(S)             RW (a)
> EL2                RW (b)
> EL3(SCR.NS = 1)    RW
> EL3(SCR.NS = 0)    RW
>
> (a) Write access if ACTLR_EL3.CPUACTLR is 1 and ACTLR_EL2.CPUACTLR is
> 1, or ACTLR_EL3.CPUACTLR is 1 and SCR.NS is 0.
> """
>
> so you'll have to do this from ARM Trusted Firmware. If you're
> comfortable rebuilding that:
>
> diff --git a/include/lib/cpus/aarch64/cortex_a72.h
> b/include/lib/cpus/aarch64/cortex_a72.h
> index bfd64918625b..a7b8cf4be0c6 100644
> --- a/include/lib/cpus/aarch64/cortex_a72.h
> +++ b/include/lib/cpus/aarch64/cortex_a72.h
> @@ -31,6 +31,7 @@
> #define CORTEX_A72_ACTLR_EL1 S3_1_C15_C2_0
>
> #define CORTEX_A72_ACTLR_DISABLE_L1_DCACHE_HW_PFTCH (1 << 56)
> +#define CORTEX_A72_ACTLR_DIS_NC_GRE_STORE_STREAMING (1 << 50)
> #define CORTEX_A72_ACTLR_NO_ALLOC_WBWA (1 << 49)
> #define CORTEX_A72_ACTLR_DCC_AS_DCCI (1 << 44)
> #define CORTEX_A72_ACTLR_EL1_DIS_INSTR_PREFETCH (1 << 32)
> diff --git a/lib/cpus/aarch64/cortex_a72.S b/lib/cpus/aarch64/cortex_a72.S
> index 55e508678284..5914d6ee3ba6 100644
> --- a/lib/cpus/aarch64/cortex_a72.S
> +++ b/lib/cpus/aarch64/cortex_a72.S
> @@ -133,6 +133,15 @@ func cortex_a72_reset_func
> orr x0, x0, #CORTEX_A72_ECTLR_SMP_BIT
> msr CORTEX_A72_ECTLR_EL1, x0
> isb
> +
> + /* ---------------------------------------------
> + * Disables store streaming on NC/GRE memory type.
> + * ---------------------------------------------
> + */
> + mrs x0, CORTEX_A72_ACTLR_EL1
> + orr x0, x0, #CORTEX_A72_ACTLR_DIS_NC_GRE_STORE_STREAMING
> + msr CORTEX_A72_ACTLR_EL1, x0
> + isb
> ret x19
> endfunc cortex_a72_reset_func

Unfortunately, it doesn't work. I verified that the bit is set after
booting Linux, but the memcpy corruption was still present.

I also tried the other chicken bits; that slowed down the system
noticeably, but had no effect on the memcpy corruption.

Mikulas

^ permalink raw reply [flat|nested] 238+ messages in thread
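For readers following the thread, the "overlapping writes" behind the memcpy corruption can be illustrated in portable C. This is a minimal sketch of the general technique only, assuming a length between 16 and 32 bytes; it is not the arm64 memcpy from the kernel or glibc, and the function names are hypothetical. Two fixed-size 16-byte copies stand in for the two stp instructions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration: copy n bytes, 16 <= n <= 32, using exactly
 * two 16-byte stores and no branch on the exact length. The first store
 * covers dst[0..15], the second covers dst[n-16..n-1]. When n < 32 the
 * two stores overlap, and the overlapping bytes are written twice with
 * the same data.
 */
static void copy_16_to_32(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    unsigned char head[16];
    unsigned char tail[16];

    /* Load both chunks first, as a load-pair/store-pair sequence would. */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    /* Two stores; the second may overlap the first. */
    memcpy(dst, head, 16);
    memcpy(dst + n - 16, tail, 16);
}

/* Number of destination bytes written twice for a given length. */
static size_t overlapping_bytes(size_t n)
{
    return 32 - n;
}
```

On ordinary memory the result is the same whichever store lands first, since both carry identical data for the shared bytes. That is the point made earlier in the thread: reordering alone should not be able to corrupt the destination.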
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-06 19:54 ` Mikulas Patocka
  (?)
@ 2018-08-06 20:11 ` Ard Biesheuvel
  -1 siblings, 0 replies; 238+ messages in thread
From: Ard Biesheuvel @ 2018-08-06 20:11 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Robin Murphy, Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han,
      Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey,
      Catalin Marinas, linux-arm-kernel

On 6 August 2018 at 21:54, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Mon, 6 Aug 2018, Ard Biesheuvel wrote:
>
>> On 6 August 2018 at 19:09, Mikulas Patocka <mpatocka@redhat.com> wrote:
>> >
>> >
>> > On Mon, 6 Aug 2018, Ard Biesheuvel wrote:
>> >
>> >> On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote:
>> >> > On 06/08/18 11:25, Mikulas Patocka wrote:
>> >> > [...]
>> >> >>>
>> >> >>> None of this explains why some transactions fail to make it across
>> >> >>> entirely. The overlapping writes in question write the same data to
>> >> >>> the memory locations that are covered by both, and so the ordering in
>> >> >>> which the transactions are received should not affect the outcome.
>> >> >>
>> >> >>
>> >> >> You're right that the corruption couldn't be explained just by reordering
>> >> >> writes. My hypothesis is that the PCIe controller tries to disambiguate
>> >> >> the overlapping writes, but the disambiguation logic was not tested and it
>> >> >> is buggy. If there's a barrier between the overlapping writes, the PCIe
>> >> >> controller won't see any overlapping writes, so it won't trigger the
>> >> >> faulty disambiguation logic and it works.
>> >> >>
>> >> >> Could the ARM engineers look if there's some chicken bit in Cortex-A72
>> >> >> that could insert barriers between non-cached writes automatically?
>> >> >
>> >> >
>> >> > I don't think there is, and even if there was I imagine it would have a
>> >> > pretty hideous effect on non-coherent DMA buffers and the various other
>> >> > places in which we have Normal-NC mappings of actual system RAM.
>> >> >
>> >>
>> >> Looking at the A72 manual, there is one chicken bit that looks like it
>> >> may be related:
>> >>
>> >> CPUACTLR_EL1 bit #50:
>> >>
>> >> 0 Enables store streaming on NC/GRE memory type. This is the reset value.
>> >> 1 Disables store streaming on NC/GRE memory type.
>> >>
>> >> so putting something like
>> >>
>> >> mrs x0, S3_1_C15_C2_0
>> >> orr x0, x0, #(1 << 50)
>> >> msr S3_1_C15_C2_0, x0
>> >>
>> >> in __cpu_setup() would be worth a try.
>> >
>> > It won't boot.
>> >
>> > But if I write the same value that was read, it also won't boot.
>> >
>> > I created a simple kernel module that reads this register and it has bit
>> > 32 set, all other bits clear. But when I write the same value into it, the
>> > core that does the write is stuck in infinite loop.
>> >
>> > So, it seems that we are writing this register from a wrong place.
>> >
>>
>> Ah, my bad. I didn't look closely enough at the description:
>>
>> """
>> The accessibility to the CPUACTLR_EL1 by Exception level is:
>>
>> EL0              -
>> EL1(NS)          RW (a)
>> EL1(S)           RW (a)
>> EL2              RW (b)
>> EL3(SCR.NS = 1)  RW
>> EL3(SCR.NS = 0)  RW
>>
>> (a) Write access if ACTLR_EL3.CPUACTLR is 1 and ACTLR_EL2.CPUACTLR is
>> 1, or ACTLR_EL3.CPUACTLR is 1 and SCR.NS is 0.
>> """
>>
>> so you'll have to do this from ARM Trusted Firmware. If you're
>> comfortable rebuilding that:
>>
>> diff --git a/include/lib/cpus/aarch64/cortex_a72.h b/include/lib/cpus/aarch64/cortex_a72.h
>> index bfd64918625b..a7b8cf4be0c6 100644
>> --- a/include/lib/cpus/aarch64/cortex_a72.h
>> +++ b/include/lib/cpus/aarch64/cortex_a72.h
>> @@ -31,6 +31,7 @@
>>  #define CORTEX_A72_ACTLR_EL1                   S3_1_C15_C2_0
>>
>>  #define CORTEX_A72_ACTLR_DISABLE_L1_DCACHE_HW_PFTCH    (1 << 56)
>> +#define CORTEX_A72_ACTLR_DIS_NC_GRE_STORE_STREAMING    (1 << 50)
>>  #define CORTEX_A72_ACTLR_NO_ALLOC_WBWA         (1 << 49)
>>  #define CORTEX_A72_ACTLR_DCC_AS_DCCI           (1 << 44)
>>  #define CORTEX_A72_ACTLR_EL1_DIS_INSTR_PREFETCH        (1 << 32)
>> diff --git a/lib/cpus/aarch64/cortex_a72.S b/lib/cpus/aarch64/cortex_a72.S
>> index 55e508678284..5914d6ee3ba6 100644
>> --- a/lib/cpus/aarch64/cortex_a72.S
>> +++ b/lib/cpus/aarch64/cortex_a72.S
>> @@ -133,6 +133,15 @@ func cortex_a72_reset_func
>>         orr     x0, x0, #CORTEX_A72_ECTLR_SMP_BIT
>>         msr     CORTEX_A72_ECTLR_EL1, x0
>>         isb
>> +
>> +       /* ---------------------------------------------
>> +        * Disables store streaming on NC/GRE memory type.
>> +        * ---------------------------------------------
>> +        */
>> +       mrs     x0, CORTEX_A72_ACTLR_EL1
>> +       orr     x0, x0, #CORTEX_A72_ACTLR_DIS_NC_GRE_STORE_STREAMING
>> +       msr     CORTEX_A72_ACTLR_EL1, x0
>> +       isb
>>         ret x19
>>  endfunc cortex_a72_reset_func
>
> Unfortunately, it doesn't work. I verified that the bit is set after
> booting Linux, but the memcpy corruption was still present.
>
> I also tried the other chicken bits, it slowed down the system noticeably,
> but had no effect on the memcpy corruption.
>

OK, it was worth a shot.

Let's wait and see if Marcin has any results.
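Editorial sketch: the access pattern at the centre of this thread, two 16-byte stores to the same destination that overlap, can be illustrated in plain C. This is a hedged illustration under stated assumptions and not the arm64 memcpy source. The function names `copy_16_to_32_overlapping` and `overlapping_bytes` are hypothetical, and a C compiler is free to emit something other than `stp` for these copies, so the sketch shows only the addressing, not the hardware behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes (16 <= n <= 32) without branching on the exact length:
 * store the first 16 bytes, then store the last 16 bytes. For n < 32 the
 * second store rewrites part of what the first one wrote, with identical
 * data in the shared region.
 */
static inline void copy_16_to_32_overlapping(unsigned char *dst,
                                             const unsigned char *src,
                                             size_t n)
{
    unsigned char head[16];
    unsigned char tail[16];

    assert(n >= 16 && n <= 32);

    /* Load both chunks before storing anything. */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    memcpy(dst, head, 16);           /* bytes [0, 16) */
    memcpy(dst + n - 16, tail, 16);  /* bytes [n - 16, n) */
}

/* Number of destination bytes written by both stores. */
static inline size_t overlapping_bytes(size_t n)
{
    assert(n >= 16 && n <= 32);
    return 32 - n;
}
```

For a 24-byte copy the two stores share 8 bytes, and both stores carry the same data for those bytes. That is the property behind the quoted remark that the order in which the transactions are received should not affect the outcome.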
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-06 20:11 ` Ard Biesheuvel
  (?)
@ 2018-08-06 20:31 ` Mikulas Patocka
  -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-06 20:31 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Robin Murphy, Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han,
      Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey,
      Catalin Marinas, linux-arm-kernel

On Mon, 6 Aug 2018, Ard Biesheuvel wrote:

> > Unfortunately, it doesn't work. I verified that the bit is set after
> > booting Linux, but the memcpy corruption was still present.
> >
> > I also tried the other chicken bits, it slowed down the system noticeably,
> > but had no effect on the memcpy corruption.
> >
>
> OK, it was worth a shot.
>
> Let's wait and see if Marcin has any results.

BTW, is there documentation for that DesignWare PCIe controller somewhere?

Mikulas
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-06 20:11 ` Ard Biesheuvel
  (?)
@ 2018-08-07 16:40 ` Marcin Wojtas
  -1 siblings, 0 replies; 238+ messages in thread
From: Marcin Wojtas @ 2018-08-07 16:40 UTC (permalink / raw)
  To: Ard Biesheuvel, mpatocka
  Cc: Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci,
      Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List,
      Matt Sealey, Jingoo Han, Robin Murphy, linux-arm-kernel

Ard, Mikulas,

On Mon, 6 Aug 2018 at 22:11, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>
> On 6 August 2018 at 21:54, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> > On Mon, 6 Aug 2018, Ard Biesheuvel wrote:
> >
> >> On 6 August 2018 at 19:09, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >> >
> >> >
> >> > On Mon, 6 Aug 2018, Ard Biesheuvel wrote:
> >> >
> >> >> On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote:
> >> >> > On 06/08/18 11:25, Mikulas Patocka wrote:
> >> >> > [...]
> >> >> >>>
> >> >> >>> None of this explains why some transactions fail to make it across
> >> >> >>> entirely. The overlapping writes in question write the same data to
> >> >> >>> the memory locations that are covered by both, and so the ordering in
> >> >> >>> which the transactions are received should not affect the outcome.
> >> >> >>
> >> >> >>
> >> >> >> You're right that the corruption couldn't be explained just by reordering
> >> >> >> writes. My hypothesis is that the PCIe controller tries to disambiguate
> >> >> >> the overlapping writes, but the disambiguation logic was not tested and it
> >> >> >> is buggy. If there's a barrier between the overlapping writes, the PCIe
> >> >> >> controller won't see any overlapping writes, so it won't trigger the
> >> >> >> faulty disambiguation logic and it works.
> >> >> >>
> >> >> >> Could the ARM engineers look if there's some chicken bit in Cortex-A72
> >> >> >> that could insert barriers between non-cached writes automatically?
> >> >> >
> >> >> >
> >> >> > I don't think there is, and even if there was I imagine it would have a
> >> >> > pretty hideous effect on non-coherent DMA buffers and the various other
> >> >> > places in which we have Normal-NC mappings of actual system RAM.
> >> >> >
> >> >>
> >> >> Looking at the A72 manual, there is one chicken bit that looks like it
> >> >> may be related:
> >> >>
> >> >> CPUACTLR_EL1 bit #50:
> >> >>
> >> >> 0 Enables store streaming on NC/GRE memory type. This is the reset value.
> >> >> 1 Disables store streaming on NC/GRE memory type.
> >> >>
> >> >> so putting something like
> >> >>
> >> >> mrs x0, S3_1_C15_C2_0
> >> >> orr x0, x0, #(1 << 50)
> >> >> msr S3_1_C15_C2_0, x0
> >> >>
> >> >> in __cpu_setup() would be worth a try.
> >> >
> >> > It won't boot.
> >> >
> >> > But if I write the same value that was read, it also won't boot.
> >> >
> >> > I created a simple kernel module that reads this register and it has bit
> >> > 32 set, all other bits clear. But when I write the same value into it, the
> >> > core that does the write is stuck in infinite loop.
> >> >
> >> > So, it seems that we are writing this register from a wrong place.
> >> >
> >>
> >> Ah, my bad. I didn't look closely enough at the description:
> >>
> >> """
> >> The accessibility to the CPUACTLR_EL1 by Exception level is:
> >>
> >> EL0              -
> >> EL1(NS)          RW (a)
> >> EL1(S)           RW (a)
> >> EL2              RW (b)
> >> EL3(SCR.NS = 1)  RW
> >> EL3(SCR.NS = 0)  RW
> >>
> >> (a) Write access if ACTLR_EL3.CPUACTLR is 1 and ACTLR_EL2.CPUACTLR is
> >> 1, or ACTLR_EL3.CPUACTLR is 1 and SCR.NS is 0.
> >> """
> >>
> >> so you'll have to do this from ARM Trusted Firmware. If you're
> >> comfortable rebuilding that:
> >>
> >> diff --git a/include/lib/cpus/aarch64/cortex_a72.h b/include/lib/cpus/aarch64/cortex_a72.h
> >> index bfd64918625b..a7b8cf4be0c6 100644
> >> --- a/include/lib/cpus/aarch64/cortex_a72.h
> >> +++ b/include/lib/cpus/aarch64/cortex_a72.h
> >> @@ -31,6 +31,7 @@
> >>  #define CORTEX_A72_ACTLR_EL1                   S3_1_C15_C2_0
> >>
> >>  #define CORTEX_A72_ACTLR_DISABLE_L1_DCACHE_HW_PFTCH    (1 << 56)
> >> +#define CORTEX_A72_ACTLR_DIS_NC_GRE_STORE_STREAMING    (1 << 50)
> >>  #define CORTEX_A72_ACTLR_NO_ALLOC_WBWA         (1 << 49)
> >>  #define CORTEX_A72_ACTLR_DCC_AS_DCCI           (1 << 44)
> >>  #define CORTEX_A72_ACTLR_EL1_DIS_INSTR_PREFETCH        (1 << 32)
> >> diff --git a/lib/cpus/aarch64/cortex_a72.S b/lib/cpus/aarch64/cortex_a72.S
> >> index 55e508678284..5914d6ee3ba6 100644
> >> --- a/lib/cpus/aarch64/cortex_a72.S
> >> +++ b/lib/cpus/aarch64/cortex_a72.S
> >> @@ -133,6 +133,15 @@ func cortex_a72_reset_func
> >>         orr     x0, x0, #CORTEX_A72_ECTLR_SMP_BIT
> >>         msr     CORTEX_A72_ECTLR_EL1, x0
> >>         isb
> >> +
> >> +       /* ---------------------------------------------
> >> +        * Disables store streaming on NC/GRE memory type.
> >> +        * ---------------------------------------------
> >> +        */
> >> +       mrs     x0, CORTEX_A72_ACTLR_EL1
> >> +       orr     x0, x0, #CORTEX_A72_ACTLR_DIS_NC_GRE_STORE_STREAMING
> >> +       msr     CORTEX_A72_ACTLR_EL1, x0
> >> +       isb
> >>         ret x19
> >>  endfunc cortex_a72_reset_func
> >
> > Unfortunately, it doesn't work. I verified that the bit is set after
> > booting Linux, but the memcpy corruption was still present.
> >
> > I also tried the other chicken bits, it slowed down the system noticeably,
> > but had no effect on the memcpy corruption.
> >
>
> OK, it was worth a shot.
>
> Let's wait and see if Marcin has any results.
>

After some self-caused setup issues I was able to run the test on my
MacchiatoBin with the kernel v4.18-rc8. It has been running for over an
hour now, loading the CPU to 100%, without a single error event.

I built the binary file with:
gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -O2

Maybe it's an older firmware issue? Please send the full bootlog, starting
with the very first line after reset. My board rev is v1.3 and I use
mainline UEFI (newest edk2 + edk2-platforms) + newest publicly
available ARM-TF and earliest firmware for this board.

Best regards,
Marcin
bnV4LWdudS9iaW4vYWFyY2g2NC1saW51eC1nbnUtZ2NjIC1PMgoKTWF5YmUgaXQncyB0aGUgb2xk ZXIgZmlybXdhcmUgaXNzdWU/IFBsZWFzZSBzZW5kIHRoZSBmdWxsIGJvb3Rsb2cgd2l0aAp0aGUg dmVyeSBmaXJzdCBsaW5lIGFmdGVyIHJlc2V0LiBNeSBib2FyZCByZXYgaXMgdjEuMyBhbmQgSSB1 c2UKbWFpbmxpbmUgVUVGSSAobmV3ZXN0IGVkazIgKyBlZGsyLXBsYXRmb3JtcykgKyBuZXdlc3Qg cHVibGljbHkKYXZhaWxhYmxlIEFSTS1URiBhbmQgZWFybGllc3QgZmlybXdhcmUgZm9yIHRoaXMg Ym9hcmQuCgpCZXN0IHJlZ2FyZHMsCk1hcmNpbgoKX19fX19fX19fX19fX19fX19fX19fX19fX19f X19fX19fX19fX19fX19fX19fX18KbGludXgtYXJtLWtlcm5lbCBtYWlsaW5nIGxpc3QKbGludXgt YXJtLWtlcm5lbEBsaXN0cy5pbmZyYWRlYWQub3JnCmh0dHA6Ly9saXN0cy5pbmZyYWRlYWQub3Jn L21haWxtYW4vbGlzdGluZm8vbGludXgtYXJtLWtlcm5lbAo= ^ permalink raw reply [flat|nested] 238+ messages in thread
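The overlapping stores at the centre of this thread come from the way the arm64 memcpy handles medium-sized copies: instead of branching on the exact length, it loads the first and the last 16 bytes and stores both, so the two stores cover some of the same destination bytes whenever the length is below 32. The following is a minimal C sketch of that idea, not the actual assembly; the helper name `copy16_to_32` is hypothetical, and the inner `memcpy` calls on 16-byte chunks stand in for the `ldp`/`stp` pairs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes, 16 <= n <= 32, without branching on n.
 * Both 16-byte chunks are loaded before either is stored, so the result is
 * correct even though the two stores overlap by (32 - n) bytes.
 */
static void copy16_to_32(uint8_t *dst, const uint8_t *src, size_t n)
{
    uint8_t head[16], tail[16];

    memcpy(head, src, 16);           /* load the first 16 bytes */
    memcpy(tail, src + n - 16, 16);  /* load the last 16 bytes (may overlap head) */
    memcpy(dst, head, 16);           /* store 1 */
    memcpy(dst + n - 16, tail, 16);  /* store 2, overlaps store 1 when n < 32 */
}
```

Because the overlapped bytes receive the same data from both stores, the order in which the stores arrive should not matter on correctly working hardware, which is the point made in the quoted discussion.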
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-07 16:40 ` Marcin Wojtas (?) @ 2018-08-07 17:39 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-07 17:39 UTC (permalink / raw) To: Marcin Wojtas Cc: Ard Biesheuvel, Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, Robin Murphy, linux-arm-kernel

On Tue, 7 Aug 2018, Marcin Wojtas wrote:

> Ard, Mikulas,
>
> After some self-caused setup issues I was able to run the test on my
> MacchiatoBin with the kernel v4.18-rc8. It's been running for 1h+ now,
> loading the CPU to 100% and no single error event...
>
> I built the binary file with:
> gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -O2
>
> Maybe it's the older firmware issue?

I have downloaded and built the firmware recently (it has timestamp Jul 30
2018). Do you still have your firmware file "flash-image.bin" that you
used, so that I could try it?

> Please send the full bootlog with
> the very first line after reset. My board rev is v1.3 and I use
> mainline UEFI (newest edk2 + edk2-platforms) + newest publicly
> available ARM-TF and earliest firmware for this board.
>
> Best regards,
> Marcin

This is my bootlog:

BootROM - 2.03 Starting CP-0 IOROM 1.07 Booting from SD 0 (0x29) Found valid image at boot postion 0x002 lNOTICE: Starting binary extension NOTICE: SVC: SW Revision 0x0.
SVC is not supported mv_ddr: mv_ddr-devel-18.05.0-g84dd1d9 (Jul 30 2018 - 04:58:51 PM) mv_ddr: completed successfully NOTICE: Cold boot NOTICE: Booting Trusted Firmware NOTICE: BL1: v1.4(release):armada-18.05.2:80bbf686 NOTICE: BL1: Built : 17:00:18, Jul 30 2018 NOTICE: BL1: Booting BL2 lNOTICE: BL2: v1.4(release):armada-18.05.2:80bbf686 NOTICE: BL2: Built : 17:00:21, Jul 30 2018 BL2: Initiating SCP_BL2 transfer to SCP NOTICE: SCP_BL2 contains 2 concatenated images NOTICE: Load image to CP1 MSS AP0 NOTICE: Loading MSS image from address 0x4023020 Size 0x135c to MSS at 0xf4280000 NOTICE: Done NOTICE: Load image to AP0 MSS NOTICE: Loading MSS image from address 0x402437c Size 0x1f6c to MSS at 0xf0580000 N FreeRTOS 7.3.0 - Marvell cm3 - A8K release armada-18.05.1 OTICE: Done NOTICE: SCP Image doesn't contain PM firmware NOTICE: BL1: Booting BL31 lNOTICE: MSS PM is not supported in this build NOTICE: BL31: v1.4(release):armada-18.05.2:80bbf686 NOTICE: BL31: Built : 17:00:21, Jul 30 2018 lUEFI firmware (version MARVELL_EFI built at 16:50:27 on Jul 30 2018) Armada 8040 MachiatoBin Platform Init Comphy0-0: PCIE0 5 Gbps Comphy0-1: PCIE0 5 Gbps Comphy0-2: PCIE0 5 Gbps Comphy0-3: PCIE0 5 Gbps Comphy0-4: SFI 10.31 Gbps Comphy0-5: SATA1 5 Gbps Comphy1-0: SGMII1 1.25 Gbps Comphy1-1: SATA2 5 Gbps Comphy1-2: USB3_HOST0 5 Gbps Comphy1-3: SATA3 5 Gbps Comphy1-4: SFI 10.31 Gbps Comphy1-5: SGMII2 3.125 Gbps UTMI PHY 0 initialized to USB Host0 UTMI PHY 1 initialized to USB Host1 UTMI PHY 2 initialized to USB Host0 Succesfully installed protocol interfaces Error: Image at 000BF6F8000 start failed: 00000001 remove-symbol-file /usr/src/git/macchiato/edk2/Build/Armada80x0McBin-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Universal/Acpi/AcpiPlatformDxe/AcpiPlatformDxe/DEBUG/AcpiPlatform.dll 0xBF6F9000 Detected w25q32bv SPI flash with page size 256 B, erase size 4 KB, total 4 MB ramdisk:blckio install. 
Status=Success Connect: PcieRoot(0x0)/Pci(0x0,0x0): Not Found 3h3h3hTianocore/EDK2 firmware version MARVELL_EFI Press ESCAPE for boot options ...error: no suitable video mode found. error: no video mode activated. GNU GRUB version 2.02~beta3-5 /----------------------------------------------------------------------------\||||||||||||||||||||||||||\----------------------------------------------------------------------------/ Use the ^ and v keys to select which entry is highlighted. Press enter to boot the selected OS, `e' to edit the commands before booting or `c' for a command-line. *Debian GNU/Linux Advanced options for Debian GNU/Linux System setup The highlighted entry will be executed automatically in 5s. The highlighted entry will be executed automatically in 4s. Loading Linux 4.17.11 ... EFI stub: Booting Linux Kernel... EFI stub: Using DTB from configuration table EFI stub: Exiting boot services and installing virtual address map... [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd081] [ 0.000000] Linux version 4.17.11 (root@leontynka) (gcc version 8.2.0 (Debian 8.2.0-2)) #10 SMP PREEMPT Fri Aug 3 18:29:35 CEST 2018 [ 0.000000] Machine model: Marvell 8040 MACCHIATOBin [ 0.000000] efi: Getting EFI parameters from FDT: [ 0.000000] efi: EFI v2.70 by EDK II [ 0.000000] efi: SMBIOS 3.0=0xbfed0000 ACPI 2.0=0xb6760000 MEMATTR=0xb8c63518 RNG=0xbffdcf98 [ 0.000000] efi: seeding entropy pool [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv1.0 detected in firmware. [ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: MIGRATE_INFO_TYPE not supported. [ 0.000000] psci: SMC Calling Convention v1.1 [ 0.000000] percpu: Embedded 26 pages/cpu @ (ptrval) s67096 r8192 d31208 u106496 [ 0.000000] Detected PIPT I-cache on CPU0 [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 1031688 [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.17.11 root=/dev/mmcblk0p1 ro console=ttyS0,115200 [ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) [ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) [ 0.000000] software IO TLB [mem 0xbb810000-0xbf810000] (64MB) mapped at [ (ptrval)- (ptrval)] [ 0.000000] Memory: 4033692K/4192256K available (4860K kernel code, 376K rwdata, 2452K rodata, 384K init, 2178K bss, 158564K reserved, 0K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] Preemptible hierarchical RCU implementation. [ 0.000000] Tasks RCU enabled. [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f022f000 [ 0.000000] GIC: Using split EOI/Deactivate mode [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:160, num:32) [ 0.000000] GICv2m: range[mem 0xf0280000-0xf0280fff], SPI[160:191] [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:192, num:32) [ 0.000000] GICv2m: range[mem 0xf0290000-0xf0290fff], SPI[192:223] [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:224, num:32) [ 0.000000] GICv2m: range[mem 0xf02a0000-0xf02a0fff], SPI[224:255] [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:256, num:32) [ 0.000000] GICv2m: range[mem 0xf02b0000-0xf02b0fff], SPI[256:287] [ 0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys). [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns [ 0.000002] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns [ 0.000113] Console: colour dummy device 174x49 [ 0.000124] Calibrating delay loop (skipped), value calculated using timer frequency.. 50.08 BogoMIPS (lpj=83333) [ 0.000129] pid_max: default: 32768 minimum: 301 [ 0.000151] Security Framework initialized [ 0.000154] Yama: becoming mindful. 
[ 0.000183] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes) [ 0.000199] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes) [ 0.016676] ASID allocator initialised with 65536 entries [ 0.020006] Hierarchical SRCU implementation. [ 0.023435] Remapping and enabling EFI services. [ 0.026680] smp: Bringing up secondary CPUs ... [ 0.043500] Detected PIPT I-cache on CPU1 [ 0.043522] CPU1: Booted secondary processor 0x0000000001 [0x410fd081] [ 0.060176] Detected PIPT I-cache on CPU2 [ 0.060195] CPU2: Booted secondary processor 0x0000000100 [0x410fd081] [ 0.076859] Detected PIPT I-cache on CPU3 [ 0.076872] CPU3: Booted secondary processor 0x0000000101 [0x410fd081] [ 0.076901] smp: Brought up 1 node, 4 CPUs [ 0.076910] SMP: Total of 4 processors activated. [ 0.076913] CPU features: detected: 32-bit EL0 Support [ 0.077194] CPU: All CPU(s) started at EL2 [ 0.077205] alternatives: patching kernel code [ 0.077230] random: get_random_u64 called from compute_layout+0x94/0xe8 with crng_init=0 [ 0.077599] devtmpfs: initialized [ 0.078967] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns [ 0.078975] futex hash table entries: 1024 (order: 5, 131072 bytes) [ 0.079032] pinctrl core: initialized pinctrl subsystem [ 0.079183] SMBIOS 3.0.0 present. [ 0.079191] DMI: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018 [ 0.079264] NET: Registered protocol family 16 [ 0.079484] cpuidle: using governor ladder [ 0.079535] cpuidle: using governor menu [ 0.079559] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) [ 0.079562] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) [ 0.079569] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 0.079681] DMA: preallocated 256 KiB pool for atomic allocations [ 0.082434] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 0.082669] ACPI: Interpreter disabled. 
[ 0.082822] reg-fixed-voltage regulator-usb3-vbus0: could not find pctldev for node /cp0/config-space@f2000000/system-controller@440000/pinctrl/xhci0-vbus-pins, deferring probe [ 0.082929] SCSI subsystem initialized [ 0.083033] Registered efivars operations [ 0.083398] clocksource: Switched to clocksource arch_sys_counter [ 0.083517] pnp: PnP ACPI: disabled [ 0.085008] NET: Registered protocol family 2 [ 0.085146] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes) [ 0.085156] TCP established hash table entries: 32768 (order: 6, 262144 bytes) [ 0.085213] TCP bind hash table entries: 32768 (order: 7, 524288 bytes) [ 0.085441] TCP: Hash tables configured (established 32768 bind 32768) [ 0.085492] UDP hash table entries: 2048 (order: 4, 65536 bytes) [ 0.085507] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes) [ 0.085713] hw perfevents: unable to count PMU IRQs [ 0.085718] hw perfevents: /ap806/config-space@f0000000/pmu: failed to register PMU devices! [ 0.085823] kvm [1]: 8-bit VMID [ 0.086279] kvm [1]: vgic interrupt IRQ1 [ 0.086339] kvm [1]: Hyp mode initialized successfully [ 0.086649] workingset: timestamp_bits=62 max_order=20 bucket_order=0 [ 0.088566] io scheduler noop registered [ 0.088625] io scheduler cfq registered (default) [ 0.089467] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver [ 0.089690] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver [ 0.089843] armada-cp110-pinctrl f4440000.system-controller:pinctrl: registered pinctrl driver [ 0.091439] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver [ 0.091583] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver [ 0.091736] mv_xor_v2 f0440000.xor: Marvell Version 2 XOR driver [ 0.091897] mv_xor_v2 f0460000.xor: Marvell Version 2 XOR driver [ 0.092084] mv_xor_v2 f26a0000.xor: Marvell Version 2 XOR driver [ 0.092247] mv_xor_v2 f26c0000.xor: Marvell Version 2 XOR driver [ 0.092432] mv_xor_v2 f46a0000.xor: 
Marvell Version 2 XOR driver [ 0.092596] mv_xor_v2 f46c0000.xor: Marvell Version 2 XOR driver [ 0.092685] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 0.092996] console [ttyS0] disabled [ 0.113364] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 7, base_baud = 12500000) is a 16550A [ 0.777029] console [ttyS0] enabled [ 0.801032] f2702100.serial: ttyS1 at MMIO 0xf2702100 (irq = 26, base_baud = 15625000) is a 16550A [ 0.830524] f4702000.serial: ttyS2 at MMIO 0xf4702000 (irq = 27, base_baud = 15625000) is a 16550A [ 0.839705] cacheinfo: Unable to detect cache hierarchy for CPU 0 [ 0.846129] libphy: Fixed MDIO Bus: probed [ 0.850384] libphy: orion_mdio_bus: probed [ 0.854879] libphy: orion_mdio_bus: probed [ 0.862575] mousedev: PS/2 mouse device common for all mice [ 0.868314] rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0 [ 0.874446] i2c /dev entries driver [ 0.880187] sdhci: Secure Digital Host Controller Interface driver [ 0.886399] sdhci: Copyright(c) Pierre Ossman [ 0.890777] sdhci-pltfm: SDHCI platform and OF driver helper [ 0.896640] mmc0: Switching to 3.3V signalling voltage failed [ 0.927528] mmc0: SDHCI controller on f06e0000.sdhci [f06e0000.sdhci] using ADMA 64-bit [ 0.966330] mmc1: SDHCI controller on f2780000.sdhci [f2780000.sdhci] using ADMA 64-bit [ 0.976142] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available [ 0.984470] PCI: OF: host bridge /cp0/pcie@f2600000 ranges: [ 0.990120] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000 [ 0.996083] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000 [ 1.002050] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000 [ 1.008301] mmc0: new high speed MMC card at address 0001 [ 1.008337] armada8k-pcie f2600000.pcie: link up [ 1.013884] mmcblk0: mmc0:0001 8GME4R 7.28 GiB [ 1.018427] armada8k-pcie f2600000.pcie: PCI host bridge to bus 0000:00 [ 1.023036] mmcblk0boot0: mmc0:0001 8GME4R partition 1 4.00 MiB [ 1.029619] pci_bus 0000:00: root bus resource [bus 00-ff] [ 1.035652] 
mmcblk0boot1: mmc0:0001 8GME4R partition 2 4.00 MiB [ 1.041101] pci_bus 0000:00: root bus resource [io 0x0000-0xffff] [ 1.047085] mmcblk0rpmb: mmc0:0001 8GME4R partition 3 512 KiB, chardev (250:0) [ 1.053256] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff] [ 1.067452] pci_bus 0000:00: root bus resource [mem 0x800000000-0x8ffffffff] [ 1.074559] mmcblk0: p1 p2 p3 [ 1.077642] pci 0000:00:00.0: disabling Extended Tags (this device can't handle them) [ 1.097253] pci 0000:00:00.0: BAR 9: assigned [mem 0x800000000-0x80fffffff 64bit pref] [ 1.105218] pci 0000:00:00.0: BAR 0: assigned [mem 0x810000000-0x8100fffff 64bit] [ 1.112749] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff] [ 1.119578] pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff] [ 1.125712] pci 0000:01:00.0: BAR 0: assigned [mem 0x800000000-0x80fffffff 64bit pref] [ 1.133705] pci 0000:01:00.0: BAR 2: assigned [mem 0xc0000000-0xc001ffff 64bit] [ 1.141088] pci 0000:01:00.0: BAR 6: assigned [mem 0xc0020000-0xc003ffff pref] [ 1.148353] pci 0000:01:00.1: BAR 0: assigned [mem 0xc0040000-0xc0043fff 64bit] [ 1.155738] pci 0000:01:00.0: BAR 4: assigned [io 0x1000-0x10ff] [ 1.161879] pci 0000:00:00.0: PCI bridge to [bus 01-ff] [ 1.167139] pci 0000:00:00.0: bridge window [io 0x1000-0x1fff] [ 1.173274] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff] [ 1.180106] pci 0000:00:00.0: bridge window [mem 0x800000000-0x80fffffff 64bit pref] [ 1.188210] mmc1: new high speed SDHC card at address 1234 [ 1.193873] mmcblk1: mmc1:1234 SA08G 7.41 GiB [ 1.198461] pcieport 0000:00:00.0: AER enabled with IRQ 32 [ 1.204020] pci 0000:01:00.1: Linked as a consumer to 0000:01:00.0 [ 1.210396] rtc-efi rtc-efi: setting system clock to 2018-08-06 20:01:28 UTC (1533585688) [ 1.211676] mmcblk1: p1 p2 [ 1.220717] v_5v0_usb3_hst_vbus: disabling [ 1.231509] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null) [ 1.239654] VFS: Mounted root (ext4 filesystem) readonly on device 179:1. 
[ 1.248061] devtmpfs: mounted [ 1.251151] Freeing unused kernel memory: 384K [ 1.325623] random: fast init done INIT: version 2.88 booting [info] Using makefile-style concurrent boot in runlevel S. [ 1.488069] NET: Registered protocol family 1 ERROR: could not open /proc/stat: No such file or directory [....] Starting the hotplug events dispatcher: systemd-udevdstarting version 239 [ ok . [....] Synthesizing the initial hotplug events...[ ok done. [ 1.786418] EFI Variables Facility v0.08 2004-May-17 [....] Waiting for /dev to be fully populated...[ 1.804433] mvpp2 f2000000.ethernet eth0: Using random mac address fe:a5:21:f0:f8:7d [ 1.806861] usbcore: registered new interface driver usbfs [ 1.817792] usbcore: registered new interface driver hub [ 1.817835] mvpp2 f4000000.ethernet eth1: Using random mac address 86:5f:16:0c:f9:16 [ 1.823172] usbcore: registered new device driver usb [ 1.837065] mvpp2 f4000000.ethernet eth2: Using random mac address 8e:6e:60:9f:57:60 [ 1.849493] ahci f2540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode [ 1.851030] mvpp2 f4000000.ethernet eth3: Using random mac address c6:5e:07:9a:54:82 [ 1.859250] ahci f2540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs [ 1.874656] scsi host0: ahci [ 1.877789] scsi host1: ahci [ 1.880777] ata1: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x100 irq 57 [ 1.888777] ata2: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x180 irq 57 [ 1.897008] ahci f4540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode [ 1.905629] ahci f4540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs [ 1.914173] scsi host2: ahci [ 1.917252] scsi host3: ahci [ 1.920225] ata3: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x100 irq 58 [ 1.928198] ata4: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x180 irq 58 [ 1.928608] xhci-hcd f2500000.usb3: xHCI Host Controller [ 1.942154] xhci-hcd f2500000.usb3: new USB bus 
registered, assigned bus number 1 [ 1.951232] xhci-hcd f2500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 [ 1.959840] xhci-hcd f2500000.usb3: irq 59, io mem 0xf2500000 [ 1.966174] hub 1-0:1.0: USB hub found [ 1.970058] hub 1-0:1.0: 1 port detected [ 1.974150] xhci-hcd f2500000.usb3: xHCI Host Controller [ 1.979551] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 2 [ 1.979558] xhci-hcd f2500000.usb3: Host supports USB 3.0 SuperSpeed [ 1.993840] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. [ 2.000947] cryptd: max_cpu_qlen set to 1000 [ 2.002463] hub 2-0:1.0: USB hub found [ 2.010089] hub 2-0:1.0: 1 port detected [ 2.014291] xhci-hcd f2510000.usb3: xHCI Host Controller [ 2.019647] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 3 [ 2.027219] xhci-hcd f2510000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 [ 2.035823] xhci-hcd f2510000.usb3: irq 60, io mem 0xf2510000 [ 2.042445] hub 3-0:1.0: USB hub found [ 2.046236] hub 3-0:1.0: 1 port detected [ 2.050278] xhci-hcd f2510000.usb3: xHCI Host Controller [ 2.055768] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 4 [ 2.063314] xhci-hcd f2510000.usb3: Host supports USB 3.0 SuperSpeed [ 2.069818] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM. 
[ 2.078176] hub 4-0:1.0: USB hub found [ 2.081972] hub 4-0:1.0: 1 port detected [ 2.086215] xhci-hcd f4500000.usb3: xHCI Host Controller [ 2.091581] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 5 [ 2.099158] xhci-hcd f4500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 [ 2.107751] xhci-hcd f4500000.usb3: irq 61, io mem 0xf4500000 [ 2.113788] hub 5-0:1.0: USB hub found [ 2.117586] hub 5-0:1.0: 1 port detected [ 2.121642] xhci-hcd f4500000.usb3: xHCI Host Controller [ 2.126988] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 6 [ 2.134514] xhci-hcd f4500000.usb3: Host supports USB 3.0 SuperSpeed [ 2.141009] usb usb6: We don't know the algorithms for LPM for this host, disabling LPM. [ 2.149354] hub 6-0:1.0: USB hub found [ 2.153156] hub 6-0:1.0: 1 port detected [ 2.162109] [drm] radeon kernel modesetting enabled. [ 2.167730] radeon 0000:01:00.0: enabling device (0000 -> 0003) [ 2.174555] [drm] initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1787:0x3000 0x00). [ 2.223834] ata1: SATA link down (SStatus 0 SControl 300) [ 2.250489] ata3: SATA link down (SStatus 0 SControl 300) [ 2.256599] ata4: SATA link down (SStatus 0 SControl 300) [ 2.303985] ATOM BIOS: CEDAR [ 2.306958] [drm] GPU not posted. posting now... [ 2.314564] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used) [ 2.323482] radeon 0000:01:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF [ 2.331229] [drm] Detected VRAM RAM=1024M, BAR=256M [ 2.336194] [drm] RAM width 64bits DDR [ 2.343406] [TTM] Zone kernel: Available graphics memory: 2017038 kiB [ 2.349967] [TTM] Initializing pool allocator [ 2.354353] [TTM] Initializing DMA pool allocator [ 2.359105] [drm] radeon: 1024M of VRAM memory ready [ 2.364109] [drm] radeon: 1024M of GTT memory ready. 
[ 2.369260] [drm] Loading CEDAR Microcode [ 2.370083] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 2.380025] ata2.00: ATA-8: ST4000DM000-1F2168, CC52, max UDMA/133 [ 2.386240] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 2.393559] ata2.00: configured for UDMA/133 [ 2.397952] scsi 1:0:0:0: Direct-Access ATA ST4000DM000-1F21 CC52 PQ: 0 ANSI: 5 [ 2.418024] sd 1:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) [ 2.420812] [drm] Internal thermal controller with fan control [ 2.425823] sd 1:0:0:0: [sda] 4096-byte physical blocks [ 2.436947] sd 1:0:0:0: [sda] Write Protect is off [ 2.441160] [drm] radeon: dpm initialized [ 2.445807] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 2.459901] [drm] GART: num cpu pages 262144, num gpu pages 262144 [ 2.460082] usb 5-1: new high-speed USB device number 2 using xhci-hcd [ 2.466365] sd 1:0:0:0: [sda] Attached SCSI removable disk [ 2.467195] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0 [ 2.511728] NET: Registered protocol family 10 [ 2.516748] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 2.522778] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready [ 2.528771] Segment Routing with IPv6 [ 2.548836] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000). [ 2.556059] radeon 0000:01:00.0: WB enabled [ 2.560272] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x (ptrval) [ 2.571109] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x (ptrval) [ 2.585425] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0x (ptrval) [ 2.596259] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 2.602916] [drm] Driver supports precise vblank timestamp query. [ 2.609046] radeon 0000:01:00.0: radeon: MSI limited to 32-bit [ 2.615006] radeon 0000:01:00.0: radeon: using MSI. 
[ 2.619944] [drm] radeon: irq initialized. [ 2.635912] hub 5-1:1.0: USB hub found [ 2.639847] hub 5-1:1.0: 4 ports detected [ 2.644731] [drm] ring test on 0 succeeded in 0 usecs [ 2.649815] [drm] ring test on 3 succeeded in 3 usecs [ 2.797941] usb 6-1: new SuperSpeed USB device number 2 using xhci-hcd [ 2.833616] [drm] ring test on 5 succeeded in 1 usecs [ 2.838697] [drm] UVD initialized successfully. [ 2.843495] [drm] ib test on ring 0 succeeded in 0 usecs [ 2.848884] [drm] ib test on ring 3 succeeded in 0 usecs [ 2.905618] hub 6-1:1.0: USB hub found [ 2.909579] hub 6-1:1.0: 4 ports detected [ 2.986738] usb 5-1.4: new high-speed USB device number 3 using xhci-hcd [ 3.006863] [drm] ib test on ring 5 succeeded [ 3.011876] [drm] Radeon Display Connectors [ 3.016085] [drm] Connector 0: [ 3.019154] [drm] DP-1 [ 3.021698] [drm] HPD2 [ 3.024244] [drm] DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c [ 3.031672] [drm] Encoders: [ 3.034653] [drm] DFP1: INTERNAL_UNIPHY1 [ 3.038941] [drm] Connector 1: [ 3.042008] [drm] DVI-I-1 [ 3.044814] [drm] HPD4 [ 3.047358] [drm] DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 0x644c [ 3.054786] [drm] Encoders: [ 3.057767] [drm] DFP2: INTERNAL_UNIPHY [ 3.061968] [drm] CRT1: INTERNAL_KLDSCP_DAC1 [ 3.066605] [drm] Connector 2: [ 3.069672] [drm] DVI-I-2 [ 3.072478] [drm] HPD1 [ 3.075023] [drm] DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c [ 3.082450] [drm] Encoders: [ 3.085430] [drm] DFP3: INTERNAL_UNIPHY1 [ 3.089719] [drm] CRT2: INTERNAL_KLDSCP_DAC2 [ 3.095924] hub 5-1.4:1.0: USB hub found [ 3.100002] hub 5-1.4:1.0: 4 ports detected [ 3.110713] mvpp2 f2000000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 3.118686] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 3.206262] [drm] fb mappable at 0x80034D000 [ 3.210554] [drm] vram apper at 0x800000000 [ 3.214756] [drm] size 8294400 [ 3.217823] [drm] fb depth is 24 [ 3.221064] [drm] pitch is 7680 [ 3.264754] Console: switching to colour 
frame buffer device 240x67
[ 3.278056] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device
[ 3.299311] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0
[ ok done.
[....] Setting up keyboard layout...[ 3.351975] usb 6-1.4: new SuperSpeed USB device number 3 using xhci-hcd
[ ok done.
[ 3.458706] hub 6-1.4:1.0: USB hub found
[ 3.462996] hub 6-1.4:1.0: 4 ports detected
[ 3.540089] usb 5-1.4.1: new full-speed USB device number 4 using xhci-hcd
[ 3.551721] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null)
[....] Checking root file system...fsck from util-linux 2.32
/dev/mmcblk0p1: clean, 165507/475136 files, 4946295/7599104 blocks
[ ok done.
[ 3.603163] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=60
[ 3.723430] usb 5-1.4.1: new high-speed USB device number 5 using xhci-hcd
[....] Activating lvm and md swap...[ ok done.
[....] Checking file systems...fsck from util-linux 2.32
checking super block... filesystem is clean, no checking needed.
[ ok done.
[ 3.892330] usbcore: registered new interface driver usbhid
[ 3.898466] usbhid: USB HID core driver
[....] Cleaning up temporary files... /tmp[ ok .
[ 3.971996] usbcore: registered new interface driver snd-usb-audio
[ 3.981106] input: ASUS Xonar U7 MKII as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.1/5-1.4.1:1.4/0003:0B05:183C.0001/input/input0
[ 3.996748] usb 5-1.4.2: new low-speed USB device number 6 using xhci-hcd
[info] Loading kernel module nf_conntrack_ftp.
[info] Loading kernel module snd-usb-audio.
[info] Loading kernel module fbcon.
modprobe: FATAL: Module fbcon not found in directory /lib/modules/4.17.11
[info] Loading kernel module udl.
modprobe: FATAL: Module udl not found in directory /lib/modules/4.17.11
[ 4.050298] hid-generic 0003:0B05:183C.0001: input: USB HID v1.00 Device [ASUS Xonar U7 MKII] on usb-f4500000.usb3-1.4.1/input4
[ 4.143548] random: alsactl: uninitialized urandom read (4 bytes read)
[ 4.159127] input: Logitech USB-PS/2 Optical Mouse as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.2/5-1.4.2:1.0/0003:046D:C01E.0002/input/input1
[ 4.175703] hid-generic 0003:046D:C01E.0002: input: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-f4500000.usb3-1.4.2/input0
[ 4.252844] Adding 4194300k swap on /i/SWAP. Priority:-2 extents:1 across:4194300k
[ 4.270569] mvpp2 f4000000.ethernet eth2: Link is Up - 100Mbps/Full - flow control rx/tx
[ 4.278738] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
[ 4.316754] usb 5-1.4.3: new low-speed USB device number 7 using xhci-hcd
[....] Mounting local filesystems...[ ok done.
[....] Activating swapfile swap...[ ok done.
[....] Cleaning up temporary files...[ ok .
[ 4.366696] random: dd: uninitialized urandom read (512 bytes read)
[....] Starting Setting kernel variables: sysctl[ ok .
[ 4.468860] PPP generic driver version 2.4.2
[ 4.474859] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.0/0003:0E8F:0020.0003/input/input2
[ 4.477505] NET: Registered protocol family 17
[ 4.539749] NET: Registered protocol family 24
[ 4.550179] hid-generic 0003:0E8F:0020.0003: input: USB HID v1.10 Keyboard [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input0
[ 4.566423] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.1/0003:0E8F:0020.0004/input/input3
[....] Configuring network interfaces...Plugin rp-pppoe.so loaded.
ifup: interface eth0 already configured
ifup: interface eth2 already configured
[ ok done.
[....] Cleaning up temporary files...[ ok .
[ 4.636900] hid-generic 0003:0E8F:0020.0004: input: USB HID v1.10 Mouse [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input1
[ 4.660387] random: alsactl: uninitialized urandom read (4 bytes read)
[....] Setting up X socket directories... /tmp/.X11-unix /tmp/.ICE-unix[ ok .
[....] Setting sensors limits...[ ok done.
[....] Setting up ALSA...[ ok done.
[....] Loading netfilter rules...run-parts: executing /usr/share/netfilter-persistent/plugins.d/15-ip4tables start
run-parts: executing /usr/share/netfilter-persistent/plugins.d/25-ip6tables start
[ 4.743426] usb 5-1.4.4: new high-speed USB device number 8 using xhci-hcd
[ ok done.
INIT: Entering runlevel: 2
[info] Using makefile-style concurrent boot in runlevel 2.
[....] Enabling additional executable binary formats: binfmt-support[ ok .
[....] Setting up console font and keymap...[ ok done.
[ 4.878142] udlfb 5-1.4.4:1.0: vendor descriptor length: 34 data: 22 5f 01 00 20 05 00 01 03 00 04
[ 4.887190] udlfb 5-1.4.4:1.0: DL chip limited to 2360000 pixel modes
[....] Starting enhanced syslogd: rsyslogd[ ok .
[ 5.046171] usb 5-1.4.4: Unable to get valid EDID from device/display
[ 5.055288] usb 5-1.4.4: fb1 is DisplayLink USB device (800x600, 1880K framebuffer memory)
[ 5.063655] usbcore: registered new interface driver udlfb
[....] Starting system message bus: dbus[ ok .
[....] Loading cpufreq kernel modules...[ ok done (none).
[....] Starting mouse interface server: gpm[ ok .
[....] Starting NTP server: ntpd[ ok .
[ 5.167095] urandom_read: 3 callbacks suppressed
[ 5.167098] random: automount: uninitialized urandom read (4 bytes read)
[ 5.195518] random: isc-worker0000: uninitialized urandom read (10 bytes read)
[ 5.202800] random: isc-worker0000: uninitialized urandom read (40 bytes read)
[....] Starting automount...[ ok .
[....] Starting domain name service...: bind9[ ok .
[....] Starting virtual private network daemon:[ ok .
[....] CPUFreq Utilities: Setting ondemand CPUFreq governor...disabled, governor not available...[ ok done.
Starting very small Busybox based DHCP server: Starting /usr/sbin/udhcpd... udhcpd.
Starting radvd: radvd.
[....] Starting periodic command scheduler: cron[ ok .
[....] Starting OpenBSD Secure Shell server: sshd[ ok .
[....] Starting WIDE DHCPv6 client: dhcp6c[ ok .

Debian GNU/Linux buster/sid leontynka ttyS0

leontynka login: [ 10.373259] random: crng init done
[ 10.376676] random: 1 urandom warning(s) missed due to ratelimiting
[ 23.931568] tun: Universal TUN/TAP device driver, 1.6

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 @ 2018-08-07 17:39 ` Mikulas Patocka 0 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-07 17:39 UTC (permalink / raw) To: Marcin Wojtas Cc: Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, Catalin Marinas, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, linux-pci, Jingoo Han, Robin Murphy, linux-arm-kernel

On Tue, 7 Aug 2018, Marcin Wojtas wrote:

> Ard, Mikulas,
>
> After some self-caused setup issues I was able to run the test on my
> MacchiatoBin with the kernel v4.18-rc8. It's been running for 1h+ now,
> loading the CPU to 100% and no single error event...
>
> I built the binary file with:
> gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -O2
>
> Maybe it's the older firmware issue?

I have downloaded and built the firmware recently (it has timestamp Jul 30 2018).

Do you still have your firmware file "flash-image.bin" that you used, so that I could try it?

> Please send the full bootlog with
> the very first line after reset. My board rev is v1.3 and I use
> mainline UEFI (newest edk2 + edk2-platforms) + newest publicly
> available ARM-TF and earliest firmware for this board.
>
> Best regards,
> Marcin

This is my bootlog:

BootROM - 2.03
Starting CP-0 IOROM 1.07
Booting from SD 0 (0x29)
Found valid image at boot postion 0x002
lNOTICE: Starting binary extension
NOTICE: SVC: SW Revision 0x0.
SVC is not supported mv_ddr: mv_ddr-devel-18.05.0-g84dd1d9 (Jul 30 2018 - 04:58:51 PM) mv_ddr: completed successfully NOTICE: Cold boot NOTICE: Booting Trusted Firmware NOTICE: BL1: v1.4(release):armada-18.05.2:80bbf686 NOTICE: BL1: Built : 17:00:18, Jul 30 2018 NOTICE: BL1: Booting BL2 lNOTICE: BL2: v1.4(release):armada-18.05.2:80bbf686 NOTICE: BL2: Built : 17:00:21, Jul 30 2018 BL2: Initiating SCP_BL2 transfer to SCP NOTICE: SCP_BL2 contains 2 concatenated images NOTICE: Load image to CP1 MSS AP0 NOTICE: Loading MSS image from address 0x4023020 Size 0x135c to MSS at 0xf4280000 NOTICE: Done NOTICE: Load image to AP0 MSS NOTICE: Loading MSS image from address 0x402437c Size 0x1f6c to MSS at 0xf0580000 N FreeRTOS 7.3.0 - Marvell cm3 - A8K release armada-18.05.1 OTICE: Done NOTICE: SCP Image doesn't contain PM firmware NOTICE: BL1: Booting BL31 lNOTICE: MSS PM is not supported in this build NOTICE: BL31: v1.4(release):armada-18.05.2:80bbf686 NOTICE: BL31: Built : 17:00:21, Jul 30 2018 lUEFI firmware (version MARVELL_EFI built at 16:50:27 on Jul 30 2018) Armada 8040 MachiatoBin Platform Init Comphy0-0: PCIE0 5 Gbps Comphy0-1: PCIE0 5 Gbps Comphy0-2: PCIE0 5 Gbps Comphy0-3: PCIE0 5 Gbps Comphy0-4: SFI 10.31 Gbps Comphy0-5: SATA1 5 Gbps Comphy1-0: SGMII1 1.25 Gbps Comphy1-1: SATA2 5 Gbps Comphy1-2: USB3_HOST0 5 Gbps Comphy1-3: SATA3 5 Gbps Comphy1-4: SFI 10.31 Gbps Comphy1-5: SGMII2 3.125 Gbps UTMI PHY 0 initialized to USB Host0 UTMI PHY 1 initialized to USB Host1 UTMI PHY 2 initialized to USB Host0 Succesfully installed protocol interfaces Error: Image at 000BF6F8000 start failed: 00000001 remove-symbol-file /usr/src/git/macchiato/edk2/Build/Armada80x0McBin-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Universal/Acpi/AcpiPlatformDxe/AcpiPlatformDxe/DEBUG/AcpiPlatform.dll 0xBF6F9000 Detected w25q32bv SPI flash with page size 256 B, erase size 4 KB, total 4 MB ramdisk:blckio install. 
Status=Success Connect: PcieRoot(0x0)/Pci(0x0,0x0): Not Found 3h3h3hTianocore/EDK2 firmware version MARVELL_EFI Press ESCAPE for boot options ...error: no suitable video mode found. error: no video mode activated. GNU GRUB version 2.02~beta3-5 /----------------------------------------------------------------------------\||||||||||||||||||||||||||\----------------------------------------------------------------------------/ Use the ^ and v keys to select which entry is highlighted. Press enter to boot the selected OS, `e' to edit the commands before booting or `c' for a command-line. *Debian GNU/Linux Advanced options for Debian GNU/Linux System setup The highlighted entry will be executed automatically in 5s. The highlighted entry will be executed automatically in 4s. Loading Linux 4.17.11 ... EFI stub: Booting Linux Kernel... EFI stub: Using DTB from configuration table EFI stub: Exiting boot services and installing virtual address map... [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd081] [ 0.000000] Linux version 4.17.11 (root@leontynka) (gcc version 8.2.0 (Debian 8.2.0-2)) #10 SMP PREEMPT Fri Aug 3 18:29:35 CEST 2018 [ 0.000000] Machine model: Marvell 8040 MACCHIATOBin [ 0.000000] efi: Getting EFI parameters from FDT: [ 0.000000] efi: EFI v2.70 by EDK II [ 0.000000] efi: SMBIOS 3.0=0xbfed0000 ACPI 2.0=0xb6760000 MEMATTR=0xb8c63518 RNG=0xbffdcf98 [ 0.000000] efi: seeding entropy pool [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv1.0 detected in firmware. [ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: MIGRATE_INFO_TYPE not supported. [ 0.000000] psci: SMC Calling Convention v1.1 [ 0.000000] percpu: Embedded 26 pages/cpu @ (ptrval) s67096 r8192 d31208 u106496 [ 0.000000] Detected PIPT I-cache on CPU0 [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 1031688 [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.17.11 root=/dev/mmcblk0p1 ro console=ttyS0,115200 [ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) [ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) [ 0.000000] software IO TLB [mem 0xbb810000-0xbf810000] (64MB) mapped at [ (ptrval)- (ptrval)] [ 0.000000] Memory: 4033692K/4192256K available (4860K kernel code, 376K rwdata, 2452K rodata, 384K init, 2178K bss, 158564K reserved, 0K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] Preemptible hierarchical RCU implementation. [ 0.000000] Tasks RCU enabled. [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f022f000 [ 0.000000] GIC: Using split EOI/Deactivate mode [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:160, num:32) [ 0.000000] GICv2m: range[mem 0xf0280000-0xf0280fff], SPI[160:191] [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:192, num:32) [ 0.000000] GICv2m: range[mem 0xf0290000-0xf0290fff], SPI[192:223] [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:224, num:32) [ 0.000000] GICv2m: range[mem 0xf02a0000-0xf02a0fff], SPI[224:255] [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:256, num:32) [ 0.000000] GICv2m: range[mem 0xf02b0000-0xf02b0fff], SPI[256:287] [ 0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys). [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns [ 0.000002] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns [ 0.000113] Console: colour dummy device 174x49 [ 0.000124] Calibrating delay loop (skipped), value calculated using timer frequency.. 50.08 BogoMIPS (lpj=83333) [ 0.000129] pid_max: default: 32768 minimum: 301 [ 0.000151] Security Framework initialized [ 0.000154] Yama: becoming mindful. 
[ 0.000183] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes) [ 0.000199] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes) [ 0.016676] ASID allocator initialised with 65536 entries [ 0.020006] Hierarchical SRCU implementation. [ 0.023435] Remapping and enabling EFI services. [ 0.026680] smp: Bringing up secondary CPUs ... [ 0.043500] Detected PIPT I-cache on CPU1 [ 0.043522] CPU1: Booted secondary processor 0x0000000001 [0x410fd081] [ 0.060176] Detected PIPT I-cache on CPU2 [ 0.060195] CPU2: Booted secondary processor 0x0000000100 [0x410fd081] [ 0.076859] Detected PIPT I-cache on CPU3 [ 0.076872] CPU3: Booted secondary processor 0x0000000101 [0x410fd081] [ 0.076901] smp: Brought up 1 node, 4 CPUs [ 0.076910] SMP: Total of 4 processors activated. [ 0.076913] CPU features: detected: 32-bit EL0 Support [ 0.077194] CPU: All CPU(s) started at EL2 [ 0.077205] alternatives: patching kernel code [ 0.077230] random: get_random_u64 called from compute_layout+0x94/0xe8 with crng_init=0 [ 0.077599] devtmpfs: initialized [ 0.078967] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns [ 0.078975] futex hash table entries: 1024 (order: 5, 131072 bytes) [ 0.079032] pinctrl core: initialized pinctrl subsystem [ 0.079183] SMBIOS 3.0.0 present. [ 0.079191] DMI: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018 [ 0.079264] NET: Registered protocol family 16 [ 0.079484] cpuidle: using governor ladder [ 0.079535] cpuidle: using governor menu [ 0.079559] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) [ 0.079562] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) [ 0.079569] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. [ 0.079681] DMA: preallocated 256 KiB pool for atomic allocations [ 0.082434] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 0.082669] ACPI: Interpreter disabled. 
[ 0.082822] reg-fixed-voltage regulator-usb3-vbus0: could not find pctldev for node /cp0/config-space@f2000000/system-controller@440000/pinctrl/xhci0-vbus-pins, deferring probe [ 0.082929] SCSI subsystem initialized [ 0.083033] Registered efivars operations [ 0.083398] clocksource: Switched to clocksource arch_sys_counter [ 0.083517] pnp: PnP ACPI: disabled [ 0.085008] NET: Registered protocol family 2 [ 0.085146] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes) [ 0.085156] TCP established hash table entries: 32768 (order: 6, 262144 bytes) [ 0.085213] TCP bind hash table entries: 32768 (order: 7, 524288 bytes) [ 0.085441] TCP: Hash tables configured (established 32768 bind 32768) [ 0.085492] UDP hash table entries: 2048 (order: 4, 65536 bytes) [ 0.085507] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes) [ 0.085713] hw perfevents: unable to count PMU IRQs [ 0.085718] hw perfevents: /ap806/config-space@f0000000/pmu: failed to register PMU devices! [ 0.085823] kvm [1]: 8-bit VMID [ 0.086279] kvm [1]: vgic interrupt IRQ1 [ 0.086339] kvm [1]: Hyp mode initialized successfully [ 0.086649] workingset: timestamp_bits=62 max_order=20 bucket_order=0 [ 0.088566] io scheduler noop registered [ 0.088625] io scheduler cfq registered (default) [ 0.089467] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver [ 0.089690] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver [ 0.089843] armada-cp110-pinctrl f4440000.system-controller:pinctrl: registered pinctrl driver [ 0.091439] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver [ 0.091583] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver [ 0.091736] mv_xor_v2 f0440000.xor: Marvell Version 2 XOR driver [ 0.091897] mv_xor_v2 f0460000.xor: Marvell Version 2 XOR driver [ 0.092084] mv_xor_v2 f26a0000.xor: Marvell Version 2 XOR driver [ 0.092247] mv_xor_v2 f26c0000.xor: Marvell Version 2 XOR driver [ 0.092432] mv_xor_v2 f46a0000.xor: 
Marvell Version 2 XOR driver [ 0.092596] mv_xor_v2 f46c0000.xor: Marvell Version 2 XOR driver [ 0.092685] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled [ 0.092996] console [ttyS0] disabled [ 0.113364] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 7, base_baud = 12500000) is a 16550A [ 0.777029] console [ttyS0] enabled [ 0.801032] f2702100.serial: ttyS1 at MMIO 0xf2702100 (irq = 26, base_baud = 15625000) is a 16550A [ 0.830524] f4702000.serial: ttyS2 at MMIO 0xf4702000 (irq = 27, base_baud = 15625000) is a 16550A [ 0.839705] cacheinfo: Unable to detect cache hierarchy for CPU 0 [ 0.846129] libphy: Fixed MDIO Bus: probed [ 0.850384] libphy: orion_mdio_bus: probed [ 0.854879] libphy: orion_mdio_bus: probed [ 0.862575] mousedev: PS/2 mouse device common for all mice [ 0.868314] rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0 [ 0.874446] i2c /dev entries driver [ 0.880187] sdhci: Secure Digital Host Controller Interface driver [ 0.886399] sdhci: Copyright(c) Pierre Ossman [ 0.890777] sdhci-pltfm: SDHCI platform and OF driver helper [ 0.896640] mmc0: Switching to 3.3V signalling voltage failed [ 0.927528] mmc0: SDHCI controller on f06e0000.sdhci [f06e0000.sdhci] using ADMA 64-bit [ 0.966330] mmc1: SDHCI controller on f2780000.sdhci [f2780000.sdhci] using ADMA 64-bit [ 0.976142] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available [ 0.984470] PCI: OF: host bridge /cp0/pcie@f2600000 ranges: [ 0.990120] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000 [ 0.996083] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000 [ 1.002050] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000 [ 1.008301] mmc0: new high speed MMC card at address 0001 [ 1.008337] armada8k-pcie f2600000.pcie: link up [ 1.013884] mmcblk0: mmc0:0001 8GME4R 7.28 GiB [ 1.018427] armada8k-pcie f2600000.pcie: PCI host bridge to bus 0000:00 [ 1.023036] mmcblk0boot0: mmc0:0001 8GME4R partition 1 4.00 MiB [ 1.029619] pci_bus 0000:00: root bus resource [bus 00-ff] [ 1.035652] 
mmcblk0boot1: mmc0:0001 8GME4R partition 2 4.00 MiB [ 1.041101] pci_bus 0000:00: root bus resource [io 0x0000-0xffff] [ 1.047085] mmcblk0rpmb: mmc0:0001 8GME4R partition 3 512 KiB, chardev (250:0) [ 1.053256] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff] [ 1.067452] pci_bus 0000:00: root bus resource [mem 0x800000000-0x8ffffffff] [ 1.074559] mmcblk0: p1 p2 p3 [ 1.077642] pci 0000:00:00.0: disabling Extended Tags (this device can't handle them) [ 1.097253] pci 0000:00:00.0: BAR 9: assigned [mem 0x800000000-0x80fffffff 64bit pref] [ 1.105218] pci 0000:00:00.0: BAR 0: assigned [mem 0x810000000-0x8100fffff 64bit] [ 1.112749] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff] [ 1.119578] pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff] [ 1.125712] pci 0000:01:00.0: BAR 0: assigned [mem 0x800000000-0x80fffffff 64bit pref] [ 1.133705] pci 0000:01:00.0: BAR 2: assigned [mem 0xc0000000-0xc001ffff 64bit] [ 1.141088] pci 0000:01:00.0: BAR 6: assigned [mem 0xc0020000-0xc003ffff pref] [ 1.148353] pci 0000:01:00.1: BAR 0: assigned [mem 0xc0040000-0xc0043fff 64bit] [ 1.155738] pci 0000:01:00.0: BAR 4: assigned [io 0x1000-0x10ff] [ 1.161879] pci 0000:00:00.0: PCI bridge to [bus 01-ff] [ 1.167139] pci 0000:00:00.0: bridge window [io 0x1000-0x1fff] [ 1.173274] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff] [ 1.180106] pci 0000:00:00.0: bridge window [mem 0x800000000-0x80fffffff 64bit pref] [ 1.188210] mmc1: new high speed SDHC card at address 1234 [ 1.193873] mmcblk1: mmc1:1234 SA08G 7.41 GiB [ 1.198461] pcieport 0000:00:00.0: AER enabled with IRQ 32 [ 1.204020] pci 0000:01:00.1: Linked as a consumer to 0000:01:00.0 [ 1.210396] rtc-efi rtc-efi: setting system clock to 2018-08-06 20:01:28 UTC (1533585688) [ 1.211676] mmcblk1: p1 p2 [ 1.220717] v_5v0_usb3_hst_vbus: disabling [ 1.231509] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null) [ 1.239654] VFS: Mounted root (ext4 filesystem) readonly on device 179:1. 
[ 1.248061] devtmpfs: mounted [ 1.251151] Freeing unused kernel memory: 384K [ 1.325623] random: fast init done INIT: version 2.88 booting [info] Using makefile-style concurrent boot in runlevel S. [ 1.488069] NET: Registered protocol family 1 ERROR: could not open /proc/stat: No such file or directory [....] Starting the hotplug events dispatcher: systemd-udevdstarting version 239 [ ok . [....] Synthesizing the initial hotplug events...[ ok done. [ 1.786418] EFI Variables Facility v0.08 2004-May-17 [....] Waiting for /dev to be fully populated...[ 1.804433] mvpp2 f2000000.ethernet eth0: Using random mac address fe:a5:21:f0:f8:7d [ 1.806861] usbcore: registered new interface driver usbfs [ 1.817792] usbcore: registered new interface driver hub [ 1.817835] mvpp2 f4000000.ethernet eth1: Using random mac address 86:5f:16:0c:f9:16 [ 1.823172] usbcore: registered new device driver usb [ 1.837065] mvpp2 f4000000.ethernet eth2: Using random mac address 8e:6e:60:9f:57:60 [ 1.849493] ahci f2540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode [ 1.851030] mvpp2 f4000000.ethernet eth3: Using random mac address c6:5e:07:9a:54:82 [ 1.859250] ahci f2540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs [ 1.874656] scsi host0: ahci [ 1.877789] scsi host1: ahci [ 1.880777] ata1: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x100 irq 57 [ 1.888777] ata2: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x180 irq 57 [ 1.897008] ahci f4540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode [ 1.905629] ahci f4540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs [ 1.914173] scsi host2: ahci [ 1.917252] scsi host3: ahci [ 1.920225] ata3: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x100 irq 58 [ 1.928198] ata4: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x180 irq 58 [ 1.928608] xhci-hcd f2500000.usb3: xHCI Host Controller [ 1.942154] xhci-hcd f2500000.usb3: new USB bus 
registered, assigned bus number 1 [ 1.951232] xhci-hcd f2500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 [ 1.959840] xhci-hcd f2500000.usb3: irq 59, io mem 0xf2500000 [ 1.966174] hub 1-0:1.0: USB hub found [ 1.970058] hub 1-0:1.0: 1 port detected [ 1.974150] xhci-hcd f2500000.usb3: xHCI Host Controller [ 1.979551] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 2 [ 1.979558] xhci-hcd f2500000.usb3: Host supports USB 3.0 SuperSpeed [ 1.993840] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. [ 2.000947] cryptd: max_cpu_qlen set to 1000 [ 2.002463] hub 2-0:1.0: USB hub found [ 2.010089] hub 2-0:1.0: 1 port detected [ 2.014291] xhci-hcd f2510000.usb3: xHCI Host Controller [ 2.019647] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 3 [ 2.027219] xhci-hcd f2510000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 [ 2.035823] xhci-hcd f2510000.usb3: irq 60, io mem 0xf2510000 [ 2.042445] hub 3-0:1.0: USB hub found [ 2.046236] hub 3-0:1.0: 1 port detected [ 2.050278] xhci-hcd f2510000.usb3: xHCI Host Controller [ 2.055768] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 4 [ 2.063314] xhci-hcd f2510000.usb3: Host supports USB 3.0 SuperSpeed [ 2.069818] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM. 
[ 2.078176] hub 4-0:1.0: USB hub found [ 2.081972] hub 4-0:1.0: 1 port detected [ 2.086215] xhci-hcd f4500000.usb3: xHCI Host Controller [ 2.091581] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 5 [ 2.099158] xhci-hcd f4500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 [ 2.107751] xhci-hcd f4500000.usb3: irq 61, io mem 0xf4500000 [ 2.113788] hub 5-0:1.0: USB hub found [ 2.117586] hub 5-0:1.0: 1 port detected [ 2.121642] xhci-hcd f4500000.usb3: xHCI Host Controller [ 2.126988] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 6 [ 2.134514] xhci-hcd f4500000.usb3: Host supports USB 3.0 SuperSpeed [ 2.141009] usb usb6: We don't know the algorithms for LPM for this host, disabling LPM. [ 2.149354] hub 6-0:1.0: USB hub found [ 2.153156] hub 6-0:1.0: 1 port detected [ 2.162109] [drm] radeon kernel modesetting enabled. [ 2.167730] radeon 0000:01:00.0: enabling device (0000 -> 0003) [ 2.174555] [drm] initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1787:0x3000 0x00). [ 2.223834] ata1: SATA link down (SStatus 0 SControl 300) [ 2.250489] ata3: SATA link down (SStatus 0 SControl 300) [ 2.256599] ata4: SATA link down (SStatus 0 SControl 300) [ 2.303985] ATOM BIOS: CEDAR [ 2.306958] [drm] GPU not posted. posting now... [ 2.314564] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used) [ 2.323482] radeon 0000:01:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF [ 2.331229] [drm] Detected VRAM RAM=1024M, BAR=256M [ 2.336194] [drm] RAM width 64bits DDR [ 2.343406] [TTM] Zone kernel: Available graphics memory: 2017038 kiB [ 2.349967] [TTM] Initializing pool allocator [ 2.354353] [TTM] Initializing DMA pool allocator [ 2.359105] [drm] radeon: 1024M of VRAM memory ready [ 2.364109] [drm] radeon: 1024M of GTT memory ready. 
[ 2.369260] [drm] Loading CEDAR Microcode [ 2.370083] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 2.380025] ata2.00: ATA-8: ST4000DM000-1F2168, CC52, max UDMA/133 [ 2.386240] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 2.393559] ata2.00: configured for UDMA/133 [ 2.397952] scsi 1:0:0:0: Direct-Access ATA ST4000DM000-1F21 CC52 PQ: 0 ANSI: 5 [ 2.418024] sd 1:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) [ 2.420812] [drm] Internal thermal controller with fan control [ 2.425823] sd 1:0:0:0: [sda] 4096-byte physical blocks [ 2.436947] sd 1:0:0:0: [sda] Write Protect is off [ 2.441160] [drm] radeon: dpm initialized [ 2.445807] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 2.459901] [drm] GART: num cpu pages 262144, num gpu pages 262144 [ 2.460082] usb 5-1: new high-speed USB device number 2 using xhci-hcd [ 2.466365] sd 1:0:0:0: [sda] Attached SCSI removable disk [ 2.467195] [drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0 [ 2.511728] NET: Registered protocol family 10 [ 2.516748] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready [ 2.522778] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready [ 2.528771] Segment Routing with IPv6 [ 2.548836] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000). [ 2.556059] radeon 0000:01:00.0: WB enabled [ 2.560272] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x (ptrval) [ 2.571109] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x (ptrval) [ 2.585425] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0x (ptrval) [ 2.596259] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 2.602916] [drm] Driver supports precise vblank timestamp query. [ 2.609046] radeon 0000:01:00.0: radeon: MSI limited to 32-bit [ 2.615006] radeon 0000:01:00.0: radeon: using MSI. 
[ 2.619944] [drm] radeon: irq initialized. [ 2.635912] hub 5-1:1.0: USB hub found [ 2.639847] hub 5-1:1.0: 4 ports detected [ 2.644731] [drm] ring test on 0 succeeded in 0 usecs [ 2.649815] [drm] ring test on 3 succeeded in 3 usecs [ 2.797941] usb 6-1: new SuperSpeed USB device number 2 using xhci-hcd [ 2.833616] [drm] ring test on 5 succeeded in 1 usecs [ 2.838697] [drm] UVD initialized successfully. [ 2.843495] [drm] ib test on ring 0 succeeded in 0 usecs [ 2.848884] [drm] ib test on ring 3 succeeded in 0 usecs [ 2.905618] hub 6-1:1.0: USB hub found [ 2.909579] hub 6-1:1.0: 4 ports detected [ 2.986738] usb 5-1.4: new high-speed USB device number 3 using xhci-hcd [ 3.006863] [drm] ib test on ring 5 succeeded [ 3.011876] [drm] Radeon Display Connectors [ 3.016085] [drm] Connector 0: [ 3.019154] [drm] DP-1 [ 3.021698] [drm] HPD2 [ 3.024244] [drm] DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c [ 3.031672] [drm] Encoders: [ 3.034653] [drm] DFP1: INTERNAL_UNIPHY1 [ 3.038941] [drm] Connector 1: [ 3.042008] [drm] DVI-I-1 [ 3.044814] [drm] HPD4 [ 3.047358] [drm] DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 0x644c [ 3.054786] [drm] Encoders: [ 3.057767] [drm] DFP2: INTERNAL_UNIPHY [ 3.061968] [drm] CRT1: INTERNAL_KLDSCP_DAC1 [ 3.066605] [drm] Connector 2: [ 3.069672] [drm] DVI-I-2 [ 3.072478] [drm] HPD1 [ 3.075023] [drm] DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c [ 3.082450] [drm] Encoders: [ 3.085430] [drm] DFP3: INTERNAL_UNIPHY1 [ 3.089719] [drm] CRT2: INTERNAL_KLDSCP_DAC2 [ 3.095924] hub 5-1.4:1.0: USB hub found [ 3.100002] hub 5-1.4:1.0: 4 ports detected [ 3.110713] mvpp2 f2000000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 3.118686] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 3.206262] [drm] fb mappable at 0x80034D000 [ 3.210554] [drm] vram apper at 0x800000000 [ 3.214756] [drm] size 8294400 [ 3.217823] [drm] fb depth is 24 [ 3.221064] [drm] pitch is 7680 [ 3.264754] Console: switching to colour 
frame buffer device 240x67 [ 3.278056] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device [ 3.299311] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0 [ ok done. [....] Setting up keyboard layout...[ 3.351975] usb 6-1.4: new SuperSpeed USB device number 3 using xhci-hcd [ ok done. [ 3.458706] hub 6-1.4:1.0: USB hub found [ 3.462996] hub 6-1.4:1.0: 4 ports detected [ 3.540089] usb 5-1.4.1: new full-speed USB device number 4 using xhci-hcd [ 3.551721] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null) [....] Checking root file system...fsck from util-linux 2.32 /dev/mmcblk0p1: clean, 165507/475136 files, 4946295/7599104 blocks [ ok done. [ 3.603163] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=60 [ 3.723430] usb 5-1.4.1: new high-speed USB device number 5 using xhci-hcd [....] Activating lvm and md swap...[ ok done. [....] Checking file systems...fsck from util-linux 2.32 checking super block... filesystem is clean, no checking needed. [ ok done. [ 3.892330] usbcore: registered new interface driver usbhid [ 3.898466] usbhid: USB HID core driver [....] Cleaning up temporary files... /tmp[ ok . [ 3.971996] usbcore: registered new interface driver snd-usb-audio [ 3.981106] input: ASUS Xonar U7 MKII as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.1/5-1.4.1:1.4/0003:0B05:183C.0001/input/input0 [ 3.996748] usb 5-1.4.2: new low-speed USB device number 6 using xhci-hcd [info] Loading kernel module nf_conntrack_ftp. [info] Loading kernel module snd-usb-audio. [info] Loading kernel module fbcon. modprobe: FATAL: Module fbcon not found in directory /lib/modules/4.17.11 [info] Loading kernel module udl. 
modprobe: FATAL: Module udl not found in directory /lib/modules/4.17.11 [ 4.050298] hid-generic 0003:0B05:183C.0001: input: USB HID v1.00 Device [ASUS Xonar U7 MKII] on usb-f4500000.usb3-1.4.1/input4 [ 4.143548] random: alsactl: uninitialized urandom read (4 bytes read) [ 4.159127] input: Logitech USB-PS/2 Optical Mouse as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.2/5-1.4.2:1.0/0003:046D:C01E.0002/input/input1 [ 4.175703] hid-generic 0003:046D:C01E.0002: input: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-f4500000.usb3-1.4.2/input0 [ 4.252844] Adding 4194300k swap on /i/SWAP. Priority:-2 extents:1 across:4194300k [ 4.270569] mvpp2 f4000000.ethernet eth2: Link is Up - 100Mbps/Full - flow control rx/tx [ 4.278738] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready [ 4.316754] usb 5-1.4.3: new low-speed USB device number 7 using xhci-hcd [....] Mounting local filesystems...[ ok done. [....] Activating swapfile swap...[ ok done. [....] Cleaning up temporary files...[ ok . [ 4.366696] random: dd: uninitialized urandom read (512 bytes read) [....] Starting Setting kernel variables: sysctl[ ok . [ 4.468860] PPP generic driver version 2.4.2 [ 4.474859] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.0/0003:0E8F:0020.0003/input/input2 [ 4.477505] NET: Registered protocol family 17 [ 4.539749] NET: Registered protocol family 24 [ 4.550179] hid-generic 0003:0E8F:0020.0003: input: USB HID v1.10 Keyboard [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input0 [ 4.566423] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.1/0003:0E8F:0020.0004/input/input3 [....] Configuring network interfaces...Plugin rp-pppoe.so loaded. ifup: interface eth0 already configured ifup: interface eth2 already configured [ ok done. [....] Cleaning up temporary files...[ ok . 
[ 4.636900] hid-generic 0003:0E8F:0020.0004: input: USB HID v1.10 Mouse [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input1 [ 4.660387] random: alsactl: uninitialized urandom read (4 bytes read) [....] Setting up X socket directories... /tmp/.X11-unix /tmp/.ICE-unix[ ok . [....] Setting sensors limits...[ ok done. [....] Setting up ALSA...[ ok done. [....] Loading netfilter rules...run-parts: executing /usr/share/netfilter-persistent/plugins.d/15-ip4tables start run-parts: executing /usr/share/netfilter-persistent/plugins.d/25-ip6tables start [ 4.743426] usb 5-1.4.4: new high-speed USB device number 8 using xhci-hcd [ ok done. INIT: Entering runlevel: 2 [info] Using makefile-style concurrent boot in runlevel 2. [....] Enabling additional executable binary formats: binfmt-support[ ok . [....] Setting up console font and keymap...[ ok done. [ 4.878142] udlfb 5-1.4.4:1.0: vendor descriptor length: 34 data: 22 5f 01 00 20 05 00 01 03 00 04 [ 4.887190] udlfb 5-1.4.4:1.0: DL chip limited to 2360000 pixel modes [....] Starting enhanced syslogd: rsyslogd[ ok . [ 5.046171] usb 5-1.4.4: Unable to get valid EDID from device/display [ 5.055288] usb 5-1.4.4: fb1 is DisplayLink USB device (800x600, 1880K framebuffer memory) [ 5.063655] usbcore: registered new interface driver udlfb [....] Starting system message bus: dbus[ ok . [....] Loading cpufreq kernel modules...[ ok done (none). [....] Starting mouse interface server: gpm[ ok . [....] Starting NTP server: ntpd[ ok . [ 5.167095] urandom_read: 3 callbacks suppressed [ 5.167098] random: automount: uninitialized urandom read (4 bytes read) [ 5.195518] random: isc-worker0000: uninitialized urandom read (10 bytes read) [ 5.202800] random: isc-worker0000: uninitialized urandom read (40 bytes read) [....] Starting automount...[ ok . [....] Starting domain name service...: bind9[ ok . [....] Starting virtual private network daemon:[ ok . [....] 
CPUFreq Utilities: Setting ondemand CPUFreq governor...disabled, governor not available...[ ok done. Starting very small Busybox based DHCP server: Starting /usr/sbin/udhcpd... udhcpd. Starting radvd: radvd. [....] Starting periodic command scheduler: cron[ ok . [....] Starting OpenBSD Secure Shell server: sshd[ ok . [....] Starting WIDE DHCPv6 client: dhcp6c[ ok . Debian GNU/Linux buster/sid leontynka ttyS0 leontynka login: [ 10.373259] random: crng init done [ 10.376676] random: 1 urandom warning(s) missed due to ratelimiting [ 23.931568] tun: Universal TUN/TAP device driver, 1.6 ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-07 17:39 ` Mikulas Patocka (?) @ 2018-08-07 18:07 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-07 18:07 UTC (permalink / raw) To: Mikulas Patocka Cc: Marcin Wojtas, Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, Robin Murphy, linux-arm-kernel

On 7 August 2018 at 19:39, Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
> On Tue, 7 Aug 2018, Marcin Wojtas wrote:
>
>> Ard, Mikulas,
>>
>> After some self-caused setup issues I was able to run the test on my
>> MacchiatoBin with the kernel v4.18-rc8. It's been running for 1h+ now,
>> loading the CPU to 100% and no single error event...
>>
>> I built the binary file with:
>> gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -O2
>>
>> Maybe it's the older firmware issue?
>
> I have downloaded and built the firmware recently (it has timestamp Jul 30
> 2018).
>
> Do you still have your firmware file "flash-image.bin" that you used, so
> that I could try it?
>
>> Please send the full bootlog with
>> the very first line after reset. My board rev is v1.3 and I use
>> mainline UEFI (newest edk2 + edk2-platforms) + newest publicly
>> available ARM-TF and earliest firmware for this board.
>>
>> Best regards,
>> Marcin
>

Mikulas,

Is the issue reproducible with an NVIDIA card + nouveau driver as well? Given the screen corruption I see with radeon even on other arm systems, I'd like to ensure that this is a platform bug, not a driver bug.

> This is my bootlog:
>
> BootROM - 2.03
> Starting CP-0 IOROM 1.07
> Booting from SD 0 (0x29)
> Found valid image at boot postion 0x002
> lNOTICE: Starting binary extension
> NOTICE: SVC: SW Revision 0x0.
SVC is not supported > mv_ddr: mv_ddr-devel-18.05.0-g84dd1d9 (Jul 30 2018 - 04:58:51 PM) > mv_ddr: completed successfully > NOTICE: Cold boot > NOTICE: Booting Trusted Firmware > NOTICE: BL1: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL1: Built : 17:00:18, Jul 30 2018 > NOTICE: BL1: Booting BL2 > lNOTICE: BL2: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL2: Built : 17:00:21, Jul 30 2018 > BL2: Initiating SCP_BL2 transfer to SCP > NOTICE: SCP_BL2 contains 2 concatenated images > NOTICE: Load image to CP1 MSS AP0 > NOTICE: Loading MSS image from address 0x4023020 Size 0x135c to MSS at 0xf4280000 > NOTICE: Done > NOTICE: Load image to AP0 MSS > NOTICE: Loading MSS image from address 0x402437c Size 0x1f6c to MSS at 0xf0580000 > N > > FreeRTOS 7.3.0 - Marvell cm3 - A8K release armada-18.05.1 > > OTICE: Done > NOTICE: SCP Image doesn't contain PM firmware > NOTICE: BL1: Booting BL31 > lNOTICE: MSS PM is not supported in this build > NOTICE: BL31: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL31: Built : 17:00:21, Jul 30 2018 > lUEFI firmware (version MARVELL_EFI built at 16:50:27 on Jul 30 2018) > > Armada 8040 MachiatoBin Platform Init > > Comphy0-0: PCIE0 5 Gbps > Comphy0-1: PCIE0 5 Gbps > Comphy0-2: PCIE0 5 Gbps > Comphy0-3: PCIE0 5 Gbps > Comphy0-4: SFI 10.31 Gbps > Comphy0-5: SATA1 5 Gbps > > Comphy1-0: SGMII1 1.25 Gbps > Comphy1-1: SATA2 5 Gbps > Comphy1-2: USB3_HOST0 5 Gbps > Comphy1-3: SATA3 5 Gbps > Comphy1-4: SFI 10.31 Gbps > Comphy1-5: SGMII2 3.125 Gbps > > UTMI PHY 0 initialized to USB Host0 > UTMI PHY 1 initialized to USB Host1 > UTMI PHY 2 initialized to USB Host0 > Succesfully installed protocol interfaces > Error: Image at 000BF6F8000 start failed: 00000001 > remove-symbol-file /usr/src/git/macchiato/edk2/Build/Armada80x0McBin-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Universal/Acpi/AcpiPlatformDxe/AcpiPlatformDxe/DEBUG/AcpiPlatform.dll 0xBF6F9000 > Detected w25q32bv SPI flash with page size 256 B, erase size 4 KB, total 4 MB > 
ramdisk:blckio install. Status=Success > Connect: PcieRoot(0x0)/Pci(0x0,0x0): Not Found > 3h3h3hTianocore/EDK2 firmware version MARVELL_EFI > Press ESCAPE for boot options ...error: no suitable video mode found. > error: no video mode activated. > GNU GRUB version 2.02~beta3-5 > > /----------------------------------------------------------------------------\||||||||||||||||||||||||||\----------------------------------------------------------------------------/ Use the ^ and v keys to select which entry is highlighted. > Press enter to boot the selected OS, `e' to edit the commands > before booting or `c' for a command-line. > *Debian GNU/Linux Advanced options for Debian GNU/Linux System setup > The highlighted entry will be executed automatically in 5s. The highlighted entry will be executed automatically in 4s. Loading Linux 4.17.11 ... > EFI stub: Booting Linux Kernel... > EFI stub: Using DTB from configuration table > EFI stub: Exiting boot services and installing virtual address map... > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd081] > [ 0.000000] Linux version 4.17.11 (root@leontynka) (gcc version 8.2.0 (Debian 8.2.0-2)) #10 SMP PREEMPT Fri Aug 3 18:29:35 CEST 2018 > [ 0.000000] Machine model: Marvell 8040 MACCHIATOBin > [ 0.000000] efi: Getting EFI parameters from FDT: > [ 0.000000] efi: EFI v2.70 by EDK II > [ 0.000000] efi: SMBIOS 3.0=0xbfed0000 ACPI 2.0=0xb6760000 MEMATTR=0xb8c63518 RNG=0xbffdcf98 > [ 0.000000] efi: seeding entropy pool > [ 0.000000] psci: probing for conduit method from DT. > [ 0.000000] psci: PSCIv1.0 detected in firmware. > [ 0.000000] psci: Using standard PSCI v0.2 function IDs > [ 0.000000] psci: MIGRATE_INFO_TYPE not supported. > [ 0.000000] psci: SMC Calling Convention v1.1 > [ 0.000000] percpu: Embedded 26 pages/cpu @ (ptrval) s67096 r8192 d31208 u106496 > [ 0.000000] Detected PIPT I-cache on CPU0 > [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 1031688 > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.17.11 root=/dev/mmcblk0p1 ro console=ttyS0,115200 > [ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) > [ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) > [ 0.000000] software IO TLB [mem 0xbb810000-0xbf810000] (64MB) mapped at [ (ptrval)- (ptrval)] > [ 0.000000] Memory: 4033692K/4192256K available (4860K kernel code, 376K rwdata, 2452K rodata, 384K init, 2178K bss, 158564K reserved, 0K cma-reserved) > [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 > [ 0.000000] Preemptible hierarchical RCU implementation. > [ 0.000000] Tasks RCU enabled. > [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 > [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f022f000 > [ 0.000000] GIC: Using split EOI/Deactivate mode > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:160, num:32) > [ 0.000000] GICv2m: range[mem 0xf0280000-0xf0280fff], SPI[160:191] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:192, num:32) > [ 0.000000] GICv2m: range[mem 0xf0290000-0xf0290fff], SPI[192:223] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:224, num:32) > [ 0.000000] GICv2m: range[mem 0xf02a0000-0xf02a0fff], SPI[224:255] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:256, num:32) > [ 0.000000] GICv2m: range[mem 0xf02b0000-0xf02b0fff], SPI[256:287] > [ 0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys). > [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns > [ 0.000002] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns > [ 0.000113] Console: colour dummy device 174x49 > [ 0.000124] Calibrating delay loop (skipped), value calculated using timer frequency.. 
50.08 BogoMIPS (lpj=83333) > [ 0.000129] pid_max: default: 32768 minimum: 301 > [ 0.000151] Security Framework initialized > [ 0.000154] Yama: becoming mindful. > [ 0.000183] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes) > [ 0.000199] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes) > [ 0.016676] ASID allocator initialised with 65536 entries > [ 0.020006] Hierarchical SRCU implementation. > [ 0.023435] Remapping and enabling EFI services. > [ 0.026680] smp: Bringing up secondary CPUs ... > [ 0.043500] Detected PIPT I-cache on CPU1 > [ 0.043522] CPU1: Booted secondary processor 0x0000000001 [0x410fd081] > [ 0.060176] Detected PIPT I-cache on CPU2 > [ 0.060195] CPU2: Booted secondary processor 0x0000000100 [0x410fd081] > [ 0.076859] Detected PIPT I-cache on CPU3 > [ 0.076872] CPU3: Booted secondary processor 0x0000000101 [0x410fd081] > [ 0.076901] smp: Brought up 1 node, 4 CPUs > [ 0.076910] SMP: Total of 4 processors activated. > [ 0.076913] CPU features: detected: 32-bit EL0 Support > [ 0.077194] CPU: All CPU(s) started at EL2 > [ 0.077205] alternatives: patching kernel code > [ 0.077230] random: get_random_u64 called from compute_layout+0x94/0xe8 with crng_init=0 > [ 0.077599] devtmpfs: initialized > [ 0.078967] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns > [ 0.078975] futex hash table entries: 1024 (order: 5, 131072 bytes) > [ 0.079032] pinctrl core: initialized pinctrl subsystem > [ 0.079183] SMBIOS 3.0.0 present. > [ 0.079191] DMI: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018 > [ 0.079264] NET: Registered protocol family 16 > [ 0.079484] cpuidle: using governor ladder > [ 0.079535] cpuidle: using governor menu > [ 0.079559] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) > [ 0.079562] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) > [ 0.079569] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. 
> [ 0.079681] DMA: preallocated 256 KiB pool for atomic allocations > [ 0.082434] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages > [ 0.082669] ACPI: Interpreter disabled. > [ 0.082822] reg-fixed-voltage regulator-usb3-vbus0: could not find pctldev for node /cp0/config-space@f2000000/system-controller@440000/pinctrl/xhci0-vbus-pins, deferring probe > [ 0.082929] SCSI subsystem initialized > [ 0.083033] Registered efivars operations > [ 0.083398] clocksource: Switched to clocksource arch_sys_counter > [ 0.083517] pnp: PnP ACPI: disabled > [ 0.085008] NET: Registered protocol family 2 > [ 0.085146] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes) > [ 0.085156] TCP established hash table entries: 32768 (order: 6, 262144 bytes) > [ 0.085213] TCP bind hash table entries: 32768 (order: 7, 524288 bytes) > [ 0.085441] TCP: Hash tables configured (established 32768 bind 32768) > [ 0.085492] UDP hash table entries: 2048 (order: 4, 65536 bytes) > [ 0.085507] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes) > [ 0.085713] hw perfevents: unable to count PMU IRQs > [ 0.085718] hw perfevents: /ap806/config-space@f0000000/pmu: failed to register PMU devices! 
> [ 0.085823] kvm [1]: 8-bit VMID > [ 0.086279] kvm [1]: vgic interrupt IRQ1 > [ 0.086339] kvm [1]: Hyp mode initialized successfully > [ 0.086649] workingset: timestamp_bits=62 max_order=20 bucket_order=0 > [ 0.088566] io scheduler noop registered > [ 0.088625] io scheduler cfq registered (default) > [ 0.089467] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver > [ 0.089690] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver > [ 0.089843] armada-cp110-pinctrl f4440000.system-controller:pinctrl: registered pinctrl driver > [ 0.091439] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver > [ 0.091583] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver > [ 0.091736] mv_xor_v2 f0440000.xor: Marvell Version 2 XOR driver > [ 0.091897] mv_xor_v2 f0460000.xor: Marvell Version 2 XOR driver > [ 0.092084] mv_xor_v2 f26a0000.xor: Marvell Version 2 XOR driver > [ 0.092247] mv_xor_v2 f26c0000.xor: Marvell Version 2 XOR driver > [ 0.092432] mv_xor_v2 f46a0000.xor: Marvell Version 2 XOR driver > [ 0.092596] mv_xor_v2 f46c0000.xor: Marvell Version 2 XOR driver > [ 0.092685] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled > [ 0.092996] console [ttyS0] disabled > [ 0.113364] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 7, base_baud = 12500000) is a 16550A > [ 0.777029] console [ttyS0] enabled > [ 0.801032] f2702100.serial: ttyS1 at MMIO 0xf2702100 (irq = 26, base_baud = 15625000) is a 16550A > [ 0.830524] f4702000.serial: ttyS2 at MMIO 0xf4702000 (irq = 27, base_baud = 15625000) is a 16550A > [ 0.839705] cacheinfo: Unable to detect cache hierarchy for CPU 0 > [ 0.846129] libphy: Fixed MDIO Bus: probed > [ 0.850384] libphy: orion_mdio_bus: probed > [ 0.854879] libphy: orion_mdio_bus: probed > [ 0.862575] mousedev: PS/2 mouse device common for all mice > [ 0.868314] rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0 > [ 0.874446] i2c /dev entries driver > [ 0.880187] sdhci: Secure Digital Host 
Controller Interface driver > [ 0.886399] sdhci: Copyright(c) Pierre Ossman > [ 0.890777] sdhci-pltfm: SDHCI platform and OF driver helper > [ 0.896640] mmc0: Switching to 3.3V signalling voltage failed > [ 0.927528] mmc0: SDHCI controller on f06e0000.sdhci [f06e0000.sdhci] using ADMA 64-bit > [ 0.966330] mmc1: SDHCI controller on f2780000.sdhci [f2780000.sdhci] using ADMA 64-bit > [ 0.976142] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available > [ 0.984470] PCI: OF: host bridge /cp0/pcie@f2600000 ranges: > [ 0.990120] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000 > [ 0.996083] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000 > [ 1.002050] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000 > [ 1.008301] mmc0: new high speed MMC card at address 0001 > [ 1.008337] armada8k-pcie f2600000.pcie: link up > [ 1.013884] mmcblk0: mmc0:0001 8GME4R 7.28 GiB > [ 1.018427] armada8k-pcie f2600000.pcie: PCI host bridge to bus 0000:00 > [ 1.023036] mmcblk0boot0: mmc0:0001 8GME4R partition 1 4.00 MiB > [ 1.029619] pci_bus 0000:00: root bus resource [bus 00-ff] > [ 1.035652] mmcblk0boot1: mmc0:0001 8GME4R partition 2 4.00 MiB > [ 1.041101] pci_bus 0000:00: root bus resource [io 0x0000-0xffff] > [ 1.047085] mmcblk0rpmb: mmc0:0001 8GME4R partition 3 512 KiB, chardev (250:0) > [ 1.053256] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff] > [ 1.067452] pci_bus 0000:00: root bus resource [mem 0x800000000-0x8ffffffff] > [ 1.074559] mmcblk0: p1 p2 p3 > [ 1.077642] pci 0000:00:00.0: disabling Extended Tags (this device can't handle them) > [ 1.097253] pci 0000:00:00.0: BAR 9: assigned [mem 0x800000000-0x80fffffff 64bit pref] > [ 1.105218] pci 0000:00:00.0: BAR 0: assigned [mem 0x810000000-0x8100fffff 64bit] > [ 1.112749] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff] > [ 1.119578] pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff] > [ 1.125712] pci 0000:01:00.0: BAR 0: assigned [mem 0x800000000-0x80fffffff 64bit pref] > [ 
1.133705] pci 0000:01:00.0: BAR 2: assigned [mem 0xc0000000-0xc001ffff 64bit] > [ 1.141088] pci 0000:01:00.0: BAR 6: assigned [mem 0xc0020000-0xc003ffff pref] > [ 1.148353] pci 0000:01:00.1: BAR 0: assigned [mem 0xc0040000-0xc0043fff 64bit] > [ 1.155738] pci 0000:01:00.0: BAR 4: assigned [io 0x1000-0x10ff] > [ 1.161879] pci 0000:00:00.0: PCI bridge to [bus 01-ff] > [ 1.167139] pci 0000:00:00.0: bridge window [io 0x1000-0x1fff] > [ 1.173274] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff] > [ 1.180106] pci 0000:00:00.0: bridge window [mem 0x800000000-0x80fffffff 64bit pref] > [ 1.188210] mmc1: new high speed SDHC card at address 1234 > [ 1.193873] mmcblk1: mmc1:1234 SA08G 7.41 GiB > [ 1.198461] pcieport 0000:00:00.0: AER enabled with IRQ 32 > [ 1.204020] pci 0000:01:00.1: Linked as a consumer to 0000:01:00.0 > [ 1.210396] rtc-efi rtc-efi: setting system clock to 2018-08-06 20:01:28 UTC (1533585688) > [ 1.211676] mmcblk1: p1 p2 > [ 1.220717] v_5v0_usb3_hst_vbus: disabling > [ 1.231509] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null) > [ 1.239654] VFS: Mounted root (ext4 filesystem) readonly on device 179:1. > [ 1.248061] devtmpfs: mounted > [ 1.251151] Freeing unused kernel memory: 384K > [ 1.325623] random: fast init done > INIT: version 2.88 booting > [info] Using makefile-style concurrent boot in runlevel S. > [ 1.488069] NET: Registered protocol family 1 > ERROR: could not open /proc/stat: No such file or directory > [....] Starting the hotplug events dispatcher: systemd-udevdstarting version 239 > [ ok . > [....] Synthesizing the initial hotplug events...[ ok done. > [ 1.786418] EFI Variables Facility v0.08 2004-May-17 > [....] 
Waiting for /dev to be fully populated...[ 1.804433] mvpp2 f2000000.ethernet eth0: Using random mac address fe:a5:21:f0:f8:7d > [ 1.806861] usbcore: registered new interface driver usbfs > [ 1.817792] usbcore: registered new interface driver hub > [ 1.817835] mvpp2 f4000000.ethernet eth1: Using random mac address 86:5f:16:0c:f9:16 > [ 1.823172] usbcore: registered new device driver usb > [ 1.837065] mvpp2 f4000000.ethernet eth2: Using random mac address 8e:6e:60:9f:57:60 > [ 1.849493] ahci f2540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode > [ 1.851030] mvpp2 f4000000.ethernet eth3: Using random mac address c6:5e:07:9a:54:82 > [ 1.859250] ahci f2540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs > [ 1.874656] scsi host0: ahci > [ 1.877789] scsi host1: ahci > [ 1.880777] ata1: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x100 irq 57 > [ 1.888777] ata2: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x180 irq 57 > [ 1.897008] ahci f4540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode > [ 1.905629] ahci f4540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs > [ 1.914173] scsi host2: ahci > [ 1.917252] scsi host3: ahci > [ 1.920225] ata3: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x100 irq 58 > [ 1.928198] ata4: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x180 irq 58 > [ 1.928608] xhci-hcd f2500000.usb3: xHCI Host Controller > [ 1.942154] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 1 > [ 1.951232] xhci-hcd f2500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 1.959840] xhci-hcd f2500000.usb3: irq 59, io mem 0xf2500000 > [ 1.966174] hub 1-0:1.0: USB hub found > [ 1.970058] hub 1-0:1.0: 1 port detected > [ 1.974150] xhci-hcd f2500000.usb3: xHCI Host Controller > [ 1.979551] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 2 > [ 1.979558] xhci-hcd f2500000.usb3: Host supports 
USB 3.0 SuperSpeed > [ 1.993840] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.000947] cryptd: max_cpu_qlen set to 1000 > [ 2.002463] hub 2-0:1.0: USB hub found > [ 2.010089] hub 2-0:1.0: 1 port detected > [ 2.014291] xhci-hcd f2510000.usb3: xHCI Host Controller > [ 2.019647] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 3 > [ 2.027219] xhci-hcd f2510000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 2.035823] xhci-hcd f2510000.usb3: irq 60, io mem 0xf2510000 > [ 2.042445] hub 3-0:1.0: USB hub found > [ 2.046236] hub 3-0:1.0: 1 port detected > [ 2.050278] xhci-hcd f2510000.usb3: xHCI Host Controller > [ 2.055768] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 4 > [ 2.063314] xhci-hcd f2510000.usb3: Host supports USB 3.0 SuperSpeed > [ 2.069818] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.078176] hub 4-0:1.0: USB hub found > [ 2.081972] hub 4-0:1.0: 1 port detected > [ 2.086215] xhci-hcd f4500000.usb3: xHCI Host Controller > [ 2.091581] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 5 > [ 2.099158] xhci-hcd f4500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 2.107751] xhci-hcd f4500000.usb3: irq 61, io mem 0xf4500000 > [ 2.113788] hub 5-0:1.0: USB hub found > [ 2.117586] hub 5-0:1.0: 1 port detected > [ 2.121642] xhci-hcd f4500000.usb3: xHCI Host Controller > [ 2.126988] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 6 > [ 2.134514] xhci-hcd f4500000.usb3: Host supports USB 3.0 SuperSpeed > [ 2.141009] usb usb6: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.149354] hub 6-0:1.0: USB hub found > [ 2.153156] hub 6-0:1.0: 1 port detected > [ 2.162109] [drm] radeon kernel modesetting enabled. 
> [ 2.167730] radeon 0000:01:00.0: enabling device (0000 -> 0003) > [ 2.174555] [drm] initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1787:0x3000 0x00). > [ 2.223834] ata1: SATA link down (SStatus 0 SControl 300) > [ 2.250489] ata3: SATA link down (SStatus 0 SControl 300) > [ 2.256599] ata4: SATA link down (SStatus 0 SControl 300) > [ 2.303985] ATOM BIOS: CEDAR > [ 2.306958] [drm] GPU not posted. posting now... > [ 2.314564] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used) > [ 2.323482] radeon 0000:01:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF > [ 2.331229] [drm] Detected VRAM RAM=1024M, BAR=256M > [ 2.336194] [drm] RAM width 64bits DDR > [ 2.343406] [TTM] Zone kernel: Available graphics memory: 2017038 kiB > [ 2.349967] [TTM] Initializing pool allocator > [ 2.354353] [TTM] Initializing DMA pool allocator > [ 2.359105] [drm] radeon: 1024M of VRAM memory ready > [ 2.364109] [drm] radeon: 1024M of GTT memory ready. > [ 2.369260] [drm] Loading CEDAR Microcode > [ 2.370083] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 2.380025] ata2.00: ATA-8: ST4000DM000-1F2168, CC52, max UDMA/133 > [ 2.386240] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32) > [ 2.393559] ata2.00: configured for UDMA/133 > [ 2.397952] scsi 1:0:0:0: Direct-Access ATA ST4000DM000-1F21 CC52 PQ: 0 ANSI: 5 > [ 2.418024] sd 1:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) > [ 2.420812] [drm] Internal thermal controller with fan control > [ 2.425823] sd 1:0:0:0: [sda] 4096-byte physical blocks > [ 2.436947] sd 1:0:0:0: [sda] Write Protect is off > [ 2.441160] [drm] radeon: dpm initialized > [ 2.445807] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > [ 2.459901] [drm] GART: num cpu pages 262144, num gpu pages 262144 > [ 2.460082] usb 5-1: new high-speed USB device number 2 using xhci-hcd > [ 2.466365] sd 1:0:0:0: [sda] Attached SCSI removable disk > [ 2.467195] 
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0 > [ 2.511728] NET: Registered protocol family 10 > [ 2.516748] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 2.522778] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready > [ 2.528771] Segment Routing with IPv6 > [ 2.548836] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000). > [ 2.556059] radeon 0000:01:00.0: WB enabled > [ 2.560272] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x (ptrval) > [ 2.571109] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x (ptrval) > [ 2.585425] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0x (ptrval) > [ 2.596259] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). > [ 2.602916] [drm] Driver supports precise vblank timestamp query. > [ 2.609046] radeon 0000:01:00.0: radeon: MSI limited to 32-bit > [ 2.615006] radeon 0000:01:00.0: radeon: using MSI. > [ 2.619944] [drm] radeon: irq initialized. > [ 2.635912] hub 5-1:1.0: USB hub found > [ 2.639847] hub 5-1:1.0: 4 ports detected > [ 2.644731] [drm] ring test on 0 succeeded in 0 usecs > [ 2.649815] [drm] ring test on 3 succeeded in 3 usecs > [ 2.797941] usb 6-1: new SuperSpeed USB device number 2 using xhci-hcd > [ 2.833616] [drm] ring test on 5 succeeded in 1 usecs > [ 2.838697] [drm] UVD initialized successfully. 
> [ 2.843495] [drm] ib test on ring 0 succeeded in 0 usecs > [ 2.848884] [drm] ib test on ring 3 succeeded in 0 usecs > [ 2.905618] hub 6-1:1.0: USB hub found > [ 2.909579] hub 6-1:1.0: 4 ports detected > [ 2.986738] usb 5-1.4: new high-speed USB device number 3 using xhci-hcd > [ 3.006863] [drm] ib test on ring 5 succeeded > [ 3.011876] [drm] Radeon Display Connectors > [ 3.016085] [drm] Connector 0: > [ 3.019154] [drm] DP-1 > [ 3.021698] [drm] HPD2 > [ 3.024244] [drm] DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c > [ 3.031672] [drm] Encoders: > [ 3.034653] [drm] DFP1: INTERNAL_UNIPHY1 > [ 3.038941] [drm] Connector 1: > [ 3.042008] [drm] DVI-I-1 > [ 3.044814] [drm] HPD4 > [ 3.047358] [drm] DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 0x644c > [ 3.054786] [drm] Encoders: > [ 3.057767] [drm] DFP2: INTERNAL_UNIPHY > [ 3.061968] [drm] CRT1: INTERNAL_KLDSCP_DAC1 > [ 3.066605] [drm] Connector 2: > [ 3.069672] [drm] DVI-I-2 > [ 3.072478] [drm] HPD1 > [ 3.075023] [drm] DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c > [ 3.082450] [drm] Encoders: > [ 3.085430] [drm] DFP3: INTERNAL_UNIPHY1 > [ 3.089719] [drm] CRT2: INTERNAL_KLDSCP_DAC2 > [ 3.095924] hub 5-1.4:1.0: USB hub found > [ 3.100002] hub 5-1.4:1.0: 4 ports detected > [ 3.110713] mvpp2 f2000000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx > [ 3.118686] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > [ 3.206262] [drm] fb mappable at 0x80034D000 > [ 3.210554] [drm] vram apper at 0x800000000 > [ 3.214756] [drm] size 8294400 > [ 3.217823] [drm] fb depth is 24 > [ 3.221064] [drm] pitch is 7680 > [ 3.264754] Console: switching to colour frame buffer device 240x67 > [ 3.278056] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device > [ 3.299311] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0 > [ ok done. > [....] Setting up keyboard layout...[ 3.351975] usb 6-1.4: new SuperSpeed USB device number 3 using xhci-hcd > [ ok done. 
> [ 3.458706] hub 6-1.4:1.0: USB hub found > [ 3.462996] hub 6-1.4:1.0: 4 ports detected > [ 3.540089] usb 5-1.4.1: new full-speed USB device number 4 using xhci-hcd > [ 3.551721] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null) > [....] Checking root file system...fsck from util-linux 2.32 > /dev/mmcblk0p1: clean, 165507/475136 files, 4946295/7599104 blocks > [ ok done. > [ 3.603163] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=60 > [ 3.723430] usb 5-1.4.1: new high-speed USB device number 5 using xhci-hcd > [....] Activating lvm and md swap...[ ok done. > [....] Checking file systems...fsck from util-linux 2.32 > checking super block... > filesystem is clean, no checking needed. > [ ok done. > [ 3.892330] usbcore: registered new interface driver usbhid > [ 3.898466] usbhid: USB HID core driver > [....] Cleaning up temporary files... /tmp[ ok . > [ 3.971996] usbcore: registered new interface driver snd-usb-audio > [ 3.981106] input: ASUS Xonar U7 MKII as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.1/5-1.4.1:1.4/0003:0B05:183C.0001/input/input0 > [ 3.996748] usb 5-1.4.2: new low-speed USB device number 6 using xhci-hcd > [info] Loading kernel module nf_conntrack_ftp. > [info] Loading kernel module snd-usb-audio. > [info] Loading kernel module fbcon. > modprobe: FATAL: Module fbcon not found in directory /lib/modules/4.17.11 > [info] Loading kernel module udl. 
> modprobe: FATAL: Module udl not found in directory /lib/modules/4.17.11 > [ 4.050298] hid-generic 0003:0B05:183C.0001: input: USB HID v1.00 Device [ASUS Xonar U7 MKII] on usb-f4500000.usb3-1.4.1/input4 > [ 4.143548] random: alsactl: uninitialized urandom read (4 bytes read) > [ 4.159127] input: Logitech USB-PS/2 Optical Mouse as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.2/5-1.4.2:1.0/0003:046D:C01E.0002/input/input1 > [ 4.175703] hid-generic 0003:046D:C01E.0002: input: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-f4500000.usb3-1.4.2/input0 > [ 4.252844] Adding 4194300k swap on /i/SWAP. Priority:-2 extents:1 across:4194300k > [ 4.270569] mvpp2 f4000000.ethernet eth2: Link is Up - 100Mbps/Full - flow control rx/tx > [ 4.278738] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready > [ 4.316754] usb 5-1.4.3: new low-speed USB device number 7 using xhci-hcd > [....] Mounting local filesystems...[ ok done. > [....] Activating swapfile swap...[ ok done. > [....] Cleaning up temporary files...[ ok . > [ 4.366696] random: dd: uninitialized urandom read (512 bytes read) > [....] Starting Setting kernel variables: sysctl[ ok . > [ 4.468860] PPP generic driver version 2.4.2 > [ 4.474859] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.0/0003:0E8F:0020.0003/input/input2 > [ 4.477505] NET: Registered protocol family 17 > [ 4.539749] NET: Registered protocol family 24 > [ 4.550179] hid-generic 0003:0E8F:0020.0003: input: USB HID v1.10 Keyboard [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input0 > [ 4.566423] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.1/0003:0E8F:0020.0004/input/input3 > [....] Configuring network interfaces...Plugin rp-pppoe.so loaded. > ifup: interface eth0 already configured > ifup: interface eth2 already configured > [ ok done. 
> [....] Cleaning up temporary files...[ ok . > [ 4.636900] hid-generic 0003:0E8F:0020.0004: input: USB HID v1.10 Mouse [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input1 > [ 4.660387] random: alsactl: uninitialized urandom read (4 bytes read) > [....] Setting up X socket directories... /tmp/.X11-unix /tmp/.ICE-unix[ ok . > [....] Setting sensors limits...[ ok done. > [....] Setting up ALSA...[ ok done. > [....] Loading netfilter rules...run-parts: executing /usr/share/netfilter-persistent/plugins.d/15-ip4tables start > run-parts: executing /usr/share/netfilter-persistent/plugins.d/25-ip6tables start > [ 4.743426] usb 5-1.4.4: new high-speed USB device number 8 using xhci-hcd > [ ok done. > INIT: Entering runlevel: 2 > [info] Using makefile-style concurrent boot in runlevel 2. > [....] Enabling additional executable binary formats: binfmt-support[ ok . > [....] Setting up console font and keymap...[ ok done. > [ 4.878142] udlfb 5-1.4.4:1.0: vendor descriptor length: 34 data: 22 5f 01 00 20 05 00 01 03 00 04 > [ 4.887190] udlfb 5-1.4.4:1.0: DL chip limited to 2360000 pixel modes > [....] Starting enhanced syslogd: rsyslogd[ ok . > [ 5.046171] usb 5-1.4.4: Unable to get valid EDID from device/display > [ 5.055288] usb 5-1.4.4: fb1 is DisplayLink USB device (800x600, 1880K framebuffer memory) > [ 5.063655] usbcore: registered new interface driver udlfb > [....] Starting system message bus: dbus[ ok . > [....] Loading cpufreq kernel modules...[ ok done (none). > [....] Starting mouse interface server: gpm[ ok . > [....] Starting NTP server: ntpd[ ok . > [ 5.167095] urandom_read: 3 callbacks suppressed > [ 5.167098] random: automount: uninitialized urandom read (4 bytes read) > [ 5.195518] random: isc-worker0000: uninitialized urandom read (10 bytes read) > [ 5.202800] random: isc-worker0000: uninitialized urandom read (40 bytes read) > [....] Starting automount...[ ok . > [....] Starting domain name service...: bind9[ ok . > [....] 
Starting virtual private network daemon:[ ok . > [....] CPUFreq Utilities: Setting ondemand CPUFreq governor...disabled, governor not available...[ ok done. > Starting very small Busybox based DHCP server: Starting /usr/sbin/udhcpd... > udhcpd. > Starting radvd: radvd. > [....] Starting periodic command scheduler: cron[ ok . > [....] Starting OpenBSD Secure Shell server: sshd[ ok . > [....] Starting WIDE DHCPv6 client: dhcp6c[ ok . > > Debian GNU/Linux buster/sid leontynka ttyS0 > > leontynka login: [ 10.373259] random: crng init done > [ 10.376676] random: 1 urandom warning(s) missed due to ratelimiting > [ 23.931568] tun: Universal TUN/TAP device driver, 1.6 ^ permalink raw reply [flat|nested] 238+ messages in thread
* framebuffer corruption due to overlapping stp instructions on arm64 @ 2018-08-07 18:07 ` Ard Biesheuvel 0 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-07 18:07 UTC (permalink / raw) To: linux-arm-kernel On 7 August 2018 at 19:39, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Tue, 7 Aug 2018, Marcin Wojtas wrote: > >> Ard, Mikulas, >> >> After some self-caused setup issues I was able to run the test on my >> MacchiatoBin with the kernel v4.18-rc8. It's been running for 1h+ now, >> loading the CPU to 100% and no single error event... >> >> I built the binary file with: >> gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -O2 >> >> Maybe it's the older firmware issue? > > I have downloaded and built the firmware recently (it has timestamp Jul 30 > 2018). > > Do you still have your firmware file "flash-image.bin" that you used, so > that I could try it? > >> Please send the full bootlog with >> the very first line after reset. My board rev is v1.3 and I use >> mainline UEFI (newest edk2 + edk2-platforms) + newest publicly >> available ARM-TF and earliest firmware for this board. >> >> Best regards, >> Marcin > Mikulas, Is the issue reproducible with an nvidia card + nouveau driver as well? Given the screen corruption I see with radeon even on other arm systems, I'd like to ensure that this is a platform bug, not a driver bug. > This is my bootlog: > > BootROM - 2.03 > Starting CP-0 IOROM 1.07 > Booting from SD 0 (0x29) > Found valid image at boot postion 0x002 > lNOTICE: Starting binary extension > NOTICE: SVC: SW Revision 0x0.
SVC is not supported > mv_ddr: mv_ddr-devel-18.05.0-g84dd1d9 (Jul 30 2018 - 04:58:51 PM) > mv_ddr: completed successfully > NOTICE: Cold boot > NOTICE: Booting Trusted Firmware > NOTICE: BL1: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL1: Built : 17:00:18, Jul 30 2018 > NOTICE: BL1: Booting BL2 > lNOTICE: BL2: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL2: Built : 17:00:21, Jul 30 2018 > BL2: Initiating SCP_BL2 transfer to SCP > NOTICE: SCP_BL2 contains 2 concatenated images > NOTICE: Load image to CP1 MSS AP0 > NOTICE: Loading MSS image from address 0x4023020 Size 0x135c to MSS at 0xf4280000 > NOTICE: Done > NOTICE: Load image to AP0 MSS > NOTICE: Loading MSS image from address 0x402437c Size 0x1f6c to MSS at 0xf0580000 > N > > FreeRTOS 7.3.0 - Marvell cm3 - A8K release armada-18.05.1 > > OTICE: Done > NOTICE: SCP Image doesn't contain PM firmware > NOTICE: BL1: Booting BL31 > lNOTICE: MSS PM is not supported in this build > NOTICE: BL31: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL31: Built : 17:00:21, Jul 30 2018 > lUEFI firmware (version MARVELL_EFI built at 16:50:27 on Jul 30 2018) > > Armada 8040 MachiatoBin Platform Init > > Comphy0-0: PCIE0 5 Gbps > Comphy0-1: PCIE0 5 Gbps > Comphy0-2: PCIE0 5 Gbps > Comphy0-3: PCIE0 5 Gbps > Comphy0-4: SFI 10.31 Gbps > Comphy0-5: SATA1 5 Gbps > > Comphy1-0: SGMII1 1.25 Gbps > Comphy1-1: SATA2 5 Gbps > Comphy1-2: USB3_HOST0 5 Gbps > Comphy1-3: SATA3 5 Gbps > Comphy1-4: SFI 10.31 Gbps > Comphy1-5: SGMII2 3.125 Gbps > > UTMI PHY 0 initialized to USB Host0 > UTMI PHY 1 initialized to USB Host1 > UTMI PHY 2 initialized to USB Host0 > Succesfully installed protocol interfaces > Error: Image at 000BF6F8000 start failed: 00000001 > remove-symbol-file /usr/src/git/macchiato/edk2/Build/Armada80x0McBin-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Universal/Acpi/AcpiPlatformDxe/AcpiPlatformDxe/DEBUG/AcpiPlatform.dll 0xBF6F9000 > Detected w25q32bv SPI flash with page size 256 B, erase size 4 KB, total 4 MB > 
ramdisk:blckio install. Status=Success > Connect: PcieRoot(0x0)/Pci(0x0,0x0): Not Found > 3h3h3hTianocore/EDK2 firmware version MARVELL_EFI > Press ESCAPE for boot options ...error: no suitable video mode found. > error: no video mode activated. > GNU GRUB version 2.02~beta3-5 > > /----------------------------------------------------------------------------\||||||||||||||||||||||||||\----------------------------------------------------------------------------/ Use the ^ and v keys to select which entry is highlighted. > Press enter to boot the selected OS, `e' to edit the commands > before booting or `c' for a command-line. > *Debian GNU/Linux Advanced options for Debian GNU/Linux System setup > The highlighted entry will be executed automatically in 5s. The highlighted entry will be executed automatically in 4s. Loading Linux 4.17.11 ... > EFI stub: Booting Linux Kernel... > EFI stub: Using DTB from configuration table > EFI stub: Exiting boot services and installing virtual address map... > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd081] > [ 0.000000] Linux version 4.17.11 (root@leontynka) (gcc version 8.2.0 (Debian 8.2.0-2)) #10 SMP PREEMPT Fri Aug 3 18:29:35 CEST 2018 > [ 0.000000] Machine model: Marvell 8040 MACCHIATOBin > [ 0.000000] efi: Getting EFI parameters from FDT: > [ 0.000000] efi: EFI v2.70 by EDK II > [ 0.000000] efi: SMBIOS 3.0=0xbfed0000 ACPI 2.0=0xb6760000 MEMATTR=0xb8c63518 RNG=0xbffdcf98 > [ 0.000000] efi: seeding entropy pool > [ 0.000000] psci: probing for conduit method from DT. > [ 0.000000] psci: PSCIv1.0 detected in firmware. > [ 0.000000] psci: Using standard PSCI v0.2 function IDs > [ 0.000000] psci: MIGRATE_INFO_TYPE not supported. > [ 0.000000] psci: SMC Calling Convention v1.1 > [ 0.000000] percpu: Embedded 26 pages/cpu @ (ptrval) s67096 r8192 d31208 u106496 > [ 0.000000] Detected PIPT I-cache on CPU0 > [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 1031688 > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.17.11 root=/dev/mmcblk0p1 ro console=ttyS0,115200 > [ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) > [ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) > [ 0.000000] software IO TLB [mem 0xbb810000-0xbf810000] (64MB) mapped at [ (ptrval)- (ptrval)] > [ 0.000000] Memory: 4033692K/4192256K available (4860K kernel code, 376K rwdata, 2452K rodata, 384K init, 2178K bss, 158564K reserved, 0K cma-reserved) > [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 > [ 0.000000] Preemptible hierarchical RCU implementation. > [ 0.000000] Tasks RCU enabled. > [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 > [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f022f000 > [ 0.000000] GIC: Using split EOI/Deactivate mode > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:160, num:32) > [ 0.000000] GICv2m: range[mem 0xf0280000-0xf0280fff], SPI[160:191] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:192, num:32) > [ 0.000000] GICv2m: range[mem 0xf0290000-0xf0290fff], SPI[192:223] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:224, num:32) > [ 0.000000] GICv2m: range[mem 0xf02a0000-0xf02a0fff], SPI[224:255] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:256, num:32) > [ 0.000000] GICv2m: range[mem 0xf02b0000-0xf02b0fff], SPI[256:287] > [ 0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys). > [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns > [ 0.000002] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns > [ 0.000113] Console: colour dummy device 174x49 > [ 0.000124] Calibrating delay loop (skipped), value calculated using timer frequency.. 
50.08 BogoMIPS (lpj=83333) > [ 0.000129] pid_max: default: 32768 minimum: 301 > [ 0.000151] Security Framework initialized > [ 0.000154] Yama: becoming mindful. > [ 0.000183] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes) > [ 0.000199] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes) > [ 0.016676] ASID allocator initialised with 65536 entries > [ 0.020006] Hierarchical SRCU implementation. > [ 0.023435] Remapping and enabling EFI services. > [ 0.026680] smp: Bringing up secondary CPUs ... > [ 0.043500] Detected PIPT I-cache on CPU1 > [ 0.043522] CPU1: Booted secondary processor 0x0000000001 [0x410fd081] > [ 0.060176] Detected PIPT I-cache on CPU2 > [ 0.060195] CPU2: Booted secondary processor 0x0000000100 [0x410fd081] > [ 0.076859] Detected PIPT I-cache on CPU3 > [ 0.076872] CPU3: Booted secondary processor 0x0000000101 [0x410fd081] > [ 0.076901] smp: Brought up 1 node, 4 CPUs > [ 0.076910] SMP: Total of 4 processors activated. > [ 0.076913] CPU features: detected: 32-bit EL0 Support > [ 0.077194] CPU: All CPU(s) started at EL2 > [ 0.077205] alternatives: patching kernel code > [ 0.077230] random: get_random_u64 called from compute_layout+0x94/0xe8 with crng_init=0 > [ 0.077599] devtmpfs: initialized > [ 0.078967] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns > [ 0.078975] futex hash table entries: 1024 (order: 5, 131072 bytes) > [ 0.079032] pinctrl core: initialized pinctrl subsystem > [ 0.079183] SMBIOS 3.0.0 present. > [ 0.079191] DMI: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018 > [ 0.079264] NET: Registered protocol family 16 > [ 0.079484] cpuidle: using governor ladder > [ 0.079535] cpuidle: using governor menu > [ 0.079559] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) > [ 0.079562] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) > [ 0.079569] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. 
> [ 0.079681] DMA: preallocated 256 KiB pool for atomic allocations > [ 0.082434] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages > [ 0.082669] ACPI: Interpreter disabled. > [ 0.082822] reg-fixed-voltage regulator-usb3-vbus0: could not find pctldev for node /cp0/config-space@f2000000/system-controller@440000/pinctrl/xhci0-vbus-pins, deferring probe > [ 0.082929] SCSI subsystem initialized > [ 0.083033] Registered efivars operations > [ 0.083398] clocksource: Switched to clocksource arch_sys_counter > [ 0.083517] pnp: PnP ACPI: disabled > [ 0.085008] NET: Registered protocol family 2 > [ 0.085146] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes) > [ 0.085156] TCP established hash table entries: 32768 (order: 6, 262144 bytes) > [ 0.085213] TCP bind hash table entries: 32768 (order: 7, 524288 bytes) > [ 0.085441] TCP: Hash tables configured (established 32768 bind 32768) > [ 0.085492] UDP hash table entries: 2048 (order: 4, 65536 bytes) > [ 0.085507] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes) > [ 0.085713] hw perfevents: unable to count PMU IRQs > [ 0.085718] hw perfevents: /ap806/config-space@f0000000/pmu: failed to register PMU devices! 
> [ 0.085823] kvm [1]: 8-bit VMID > [ 0.086279] kvm [1]: vgic interrupt IRQ1 > [ 0.086339] kvm [1]: Hyp mode initialized successfully > [ 0.086649] workingset: timestamp_bits=62 max_order=20 bucket_order=0 > [ 0.088566] io scheduler noop registered > [ 0.088625] io scheduler cfq registered (default) > [ 0.089467] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver > [ 0.089690] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver > [ 0.089843] armada-cp110-pinctrl f4440000.system-controller:pinctrl: registered pinctrl driver > [ 0.091439] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver > [ 0.091583] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver > [ 0.091736] mv_xor_v2 f0440000.xor: Marvell Version 2 XOR driver > [ 0.091897] mv_xor_v2 f0460000.xor: Marvell Version 2 XOR driver > [ 0.092084] mv_xor_v2 f26a0000.xor: Marvell Version 2 XOR driver > [ 0.092247] mv_xor_v2 f26c0000.xor: Marvell Version 2 XOR driver > [ 0.092432] mv_xor_v2 f46a0000.xor: Marvell Version 2 XOR driver > [ 0.092596] mv_xor_v2 f46c0000.xor: Marvell Version 2 XOR driver > [ 0.092685] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled > [ 0.092996] console [ttyS0] disabled > [ 0.113364] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 7, base_baud = 12500000) is a 16550A > [ 0.777029] console [ttyS0] enabled > [ 0.801032] f2702100.serial: ttyS1 at MMIO 0xf2702100 (irq = 26, base_baud = 15625000) is a 16550A > [ 0.830524] f4702000.serial: ttyS2 at MMIO 0xf4702000 (irq = 27, base_baud = 15625000) is a 16550A > [ 0.839705] cacheinfo: Unable to detect cache hierarchy for CPU 0 > [ 0.846129] libphy: Fixed MDIO Bus: probed > [ 0.850384] libphy: orion_mdio_bus: probed > [ 0.854879] libphy: orion_mdio_bus: probed > [ 0.862575] mousedev: PS/2 mouse device common for all mice > [ 0.868314] rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0 > [ 0.874446] i2c /dev entries driver > [ 0.880187] sdhci: Secure Digital Host 
Controller Interface driver > [ 0.886399] sdhci: Copyright(c) Pierre Ossman > [ 0.890777] sdhci-pltfm: SDHCI platform and OF driver helper > [ 0.896640] mmc0: Switching to 3.3V signalling voltage failed > [ 0.927528] mmc0: SDHCI controller on f06e0000.sdhci [f06e0000.sdhci] using ADMA 64-bit > [ 0.966330] mmc1: SDHCI controller on f2780000.sdhci [f2780000.sdhci] using ADMA 64-bit > [ 0.976142] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available > [ 0.984470] PCI: OF: host bridge /cp0/pcie@f2600000 ranges: > [ 0.990120] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000 > [ 0.996083] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000 > [ 1.002050] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000 > [ 1.008301] mmc0: new high speed MMC card at address 0001 > [ 1.008337] armada8k-pcie f2600000.pcie: link up > [ 1.013884] mmcblk0: mmc0:0001 8GME4R 7.28 GiB > [ 1.018427] armada8k-pcie f2600000.pcie: PCI host bridge to bus 0000:00 > [ 1.023036] mmcblk0boot0: mmc0:0001 8GME4R partition 1 4.00 MiB > [ 1.029619] pci_bus 0000:00: root bus resource [bus 00-ff] > [ 1.035652] mmcblk0boot1: mmc0:0001 8GME4R partition 2 4.00 MiB > [ 1.041101] pci_bus 0000:00: root bus resource [io 0x0000-0xffff] > [ 1.047085] mmcblk0rpmb: mmc0:0001 8GME4R partition 3 512 KiB, chardev (250:0) > [ 1.053256] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff] > [ 1.067452] pci_bus 0000:00: root bus resource [mem 0x800000000-0x8ffffffff] > [ 1.074559] mmcblk0: p1 p2 p3 > [ 1.077642] pci 0000:00:00.0: disabling Extended Tags (this device can't handle them) > [ 1.097253] pci 0000:00:00.0: BAR 9: assigned [mem 0x800000000-0x80fffffff 64bit pref] > [ 1.105218] pci 0000:00:00.0: BAR 0: assigned [mem 0x810000000-0x8100fffff 64bit] > [ 1.112749] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff] > [ 1.119578] pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff] > [ 1.125712] pci 0000:01:00.0: BAR 0: assigned [mem 0x800000000-0x80fffffff 64bit pref] > [ 
1.133705] pci 0000:01:00.0: BAR 2: assigned [mem 0xc0000000-0xc001ffff 64bit] > [ 1.141088] pci 0000:01:00.0: BAR 6: assigned [mem 0xc0020000-0xc003ffff pref] > [ 1.148353] pci 0000:01:00.1: BAR 0: assigned [mem 0xc0040000-0xc0043fff 64bit] > [ 1.155738] pci 0000:01:00.0: BAR 4: assigned [io 0x1000-0x10ff] > [ 1.161879] pci 0000:00:00.0: PCI bridge to [bus 01-ff] > [ 1.167139] pci 0000:00:00.0: bridge window [io 0x1000-0x1fff] > [ 1.173274] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff] > [ 1.180106] pci 0000:00:00.0: bridge window [mem 0x800000000-0x80fffffff 64bit pref] > [ 1.188210] mmc1: new high speed SDHC card at address 1234 > [ 1.193873] mmcblk1: mmc1:1234 SA08G 7.41 GiB > [ 1.198461] pcieport 0000:00:00.0: AER enabled with IRQ 32 > [ 1.204020] pci 0000:01:00.1: Linked as a consumer to 0000:01:00.0 > [ 1.210396] rtc-efi rtc-efi: setting system clock to 2018-08-06 20:01:28 UTC (1533585688) > [ 1.211676] mmcblk1: p1 p2 > [ 1.220717] v_5v0_usb3_hst_vbus: disabling > [ 1.231509] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null) > [ 1.239654] VFS: Mounted root (ext4 filesystem) readonly on device 179:1. > [ 1.248061] devtmpfs: mounted > [ 1.251151] Freeing unused kernel memory: 384K > [ 1.325623] random: fast init done > INIT: version 2.88 booting > [info] Using makefile-style concurrent boot in runlevel S. > [ 1.488069] NET: Registered protocol family 1 > ERROR: could not open /proc/stat: No such file or directory > [....] Starting the hotplug events dispatcher: systemd-udevdstarting version 239 > [ ok . > [....] Synthesizing the initial hotplug events...[ ok done. > [ 1.786418] EFI Variables Facility v0.08 2004-May-17 > [....] 
Waiting for /dev to be fully populated...[ 1.804433] mvpp2 f2000000.ethernet eth0: Using random mac address fe:a5:21:f0:f8:7d > [ 1.806861] usbcore: registered new interface driver usbfs > [ 1.817792] usbcore: registered new interface driver hub > [ 1.817835] mvpp2 f4000000.ethernet eth1: Using random mac address 86:5f:16:0c:f9:16 > [ 1.823172] usbcore: registered new device driver usb > [ 1.837065] mvpp2 f4000000.ethernet eth2: Using random mac address 8e:6e:60:9f:57:60 > [ 1.849493] ahci f2540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode > [ 1.851030] mvpp2 f4000000.ethernet eth3: Using random mac address c6:5e:07:9a:54:82 > [ 1.859250] ahci f2540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs > [ 1.874656] scsi host0: ahci > [ 1.877789] scsi host1: ahci > [ 1.880777] ata1: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x100 irq 57 > [ 1.888777] ata2: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x180 irq 57 > [ 1.897008] ahci f4540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode > [ 1.905629] ahci f4540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs > [ 1.914173] scsi host2: ahci > [ 1.917252] scsi host3: ahci > [ 1.920225] ata3: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x100 irq 58 > [ 1.928198] ata4: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x180 irq 58 > [ 1.928608] xhci-hcd f2500000.usb3: xHCI Host Controller > [ 1.942154] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 1 > [ 1.951232] xhci-hcd f2500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 1.959840] xhci-hcd f2500000.usb3: irq 59, io mem 0xf2500000 > [ 1.966174] hub 1-0:1.0: USB hub found > [ 1.970058] hub 1-0:1.0: 1 port detected > [ 1.974150] xhci-hcd f2500000.usb3: xHCI Host Controller > [ 1.979551] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 2 > [ 1.979558] xhci-hcd f2500000.usb3: Host supports 
USB 3.0 SuperSpeed > [ 1.993840] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.000947] cryptd: max_cpu_qlen set to 1000 > [ 2.002463] hub 2-0:1.0: USB hub found > [ 2.010089] hub 2-0:1.0: 1 port detected > [ 2.014291] xhci-hcd f2510000.usb3: xHCI Host Controller > [ 2.019647] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 3 > [ 2.027219] xhci-hcd f2510000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 2.035823] xhci-hcd f2510000.usb3: irq 60, io mem 0xf2510000 > [ 2.042445] hub 3-0:1.0: USB hub found > [ 2.046236] hub 3-0:1.0: 1 port detected > [ 2.050278] xhci-hcd f2510000.usb3: xHCI Host Controller > [ 2.055768] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 4 > [ 2.063314] xhci-hcd f2510000.usb3: Host supports USB 3.0 SuperSpeed > [ 2.069818] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.078176] hub 4-0:1.0: USB hub found > [ 2.081972] hub 4-0:1.0: 1 port detected > [ 2.086215] xhci-hcd f4500000.usb3: xHCI Host Controller > [ 2.091581] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 5 > [ 2.099158] xhci-hcd f4500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 2.107751] xhci-hcd f4500000.usb3: irq 61, io mem 0xf4500000 > [ 2.113788] hub 5-0:1.0: USB hub found > [ 2.117586] hub 5-0:1.0: 1 port detected > [ 2.121642] xhci-hcd f4500000.usb3: xHCI Host Controller > [ 2.126988] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 6 > [ 2.134514] xhci-hcd f4500000.usb3: Host supports USB 3.0 SuperSpeed > [ 2.141009] usb usb6: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.149354] hub 6-0:1.0: USB hub found > [ 2.153156] hub 6-0:1.0: 1 port detected > [ 2.162109] [drm] radeon kernel modesetting enabled. 
> [ 2.167730] radeon 0000:01:00.0: enabling device (0000 -> 0003) > [ 2.174555] [drm] initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1787:0x3000 0x00). > [ 2.223834] ata1: SATA link down (SStatus 0 SControl 300) > [ 2.250489] ata3: SATA link down (SStatus 0 SControl 300) > [ 2.256599] ata4: SATA link down (SStatus 0 SControl 300) > [ 2.303985] ATOM BIOS: CEDAR > [ 2.306958] [drm] GPU not posted. posting now... > [ 2.314564] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used) > [ 2.323482] radeon 0000:01:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF > [ 2.331229] [drm] Detected VRAM RAM=1024M, BAR=256M > [ 2.336194] [drm] RAM width 64bits DDR > [ 2.343406] [TTM] Zone kernel: Available graphics memory: 2017038 kiB > [ 2.349967] [TTM] Initializing pool allocator > [ 2.354353] [TTM] Initializing DMA pool allocator > [ 2.359105] [drm] radeon: 1024M of VRAM memory ready > [ 2.364109] [drm] radeon: 1024M of GTT memory ready. > [ 2.369260] [drm] Loading CEDAR Microcode > [ 2.370083] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 2.380025] ata2.00: ATA-8: ST4000DM000-1F2168, CC52, max UDMA/133 > [ 2.386240] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32) > [ 2.393559] ata2.00: configured for UDMA/133 > [ 2.397952] scsi 1:0:0:0: Direct-Access ATA ST4000DM000-1F21 CC52 PQ: 0 ANSI: 5 > [ 2.418024] sd 1:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) > [ 2.420812] [drm] Internal thermal controller with fan control > [ 2.425823] sd 1:0:0:0: [sda] 4096-byte physical blocks > [ 2.436947] sd 1:0:0:0: [sda] Write Protect is off > [ 2.441160] [drm] radeon: dpm initialized > [ 2.445807] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > [ 2.459901] [drm] GART: num cpu pages 262144, num gpu pages 262144 > [ 2.460082] usb 5-1: new high-speed USB device number 2 using xhci-hcd > [ 2.466365] sd 1:0:0:0: [sda] Attached SCSI removable disk > [ 2.467195] 
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0 > [ 2.511728] NET: Registered protocol family 10 > [ 2.516748] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 2.522778] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready > [ 2.528771] Segment Routing with IPv6 > [ 2.548836] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000). > [ 2.556059] radeon 0000:01:00.0: WB enabled > [ 2.560272] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x (ptrval) > [ 2.571109] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x (ptrval) > [ 2.585425] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0x (ptrval) > [ 2.596259] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). > [ 2.602916] [drm] Driver supports precise vblank timestamp query. > [ 2.609046] radeon 0000:01:00.0: radeon: MSI limited to 32-bit > [ 2.615006] radeon 0000:01:00.0: radeon: using MSI. > [ 2.619944] [drm] radeon: irq initialized. > [ 2.635912] hub 5-1:1.0: USB hub found > [ 2.639847] hub 5-1:1.0: 4 ports detected > [ 2.644731] [drm] ring test on 0 succeeded in 0 usecs > [ 2.649815] [drm] ring test on 3 succeeded in 3 usecs > [ 2.797941] usb 6-1: new SuperSpeed USB device number 2 using xhci-hcd > [ 2.833616] [drm] ring test on 5 succeeded in 1 usecs > [ 2.838697] [drm] UVD initialized successfully. 
> [ 2.843495] [drm] ib test on ring 0 succeeded in 0 usecs > [ 2.848884] [drm] ib test on ring 3 succeeded in 0 usecs > [ 2.905618] hub 6-1:1.0: USB hub found > [ 2.909579] hub 6-1:1.0: 4 ports detected > [ 2.986738] usb 5-1.4: new high-speed USB device number 3 using xhci-hcd > [ 3.006863] [drm] ib test on ring 5 succeeded > [ 3.011876] [drm] Radeon Display Connectors > [ 3.016085] [drm] Connector 0: > [ 3.019154] [drm] DP-1 > [ 3.021698] [drm] HPD2 > [ 3.024244] [drm] DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c > [ 3.031672] [drm] Encoders: > [ 3.034653] [drm] DFP1: INTERNAL_UNIPHY1 > [ 3.038941] [drm] Connector 1: > [ 3.042008] [drm] DVI-I-1 > [ 3.044814] [drm] HPD4 > [ 3.047358] [drm] DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 0x644c > [ 3.054786] [drm] Encoders: > [ 3.057767] [drm] DFP2: INTERNAL_UNIPHY > [ 3.061968] [drm] CRT1: INTERNAL_KLDSCP_DAC1 > [ 3.066605] [drm] Connector 2: > [ 3.069672] [drm] DVI-I-2 > [ 3.072478] [drm] HPD1 > [ 3.075023] [drm] DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c > [ 3.082450] [drm] Encoders: > [ 3.085430] [drm] DFP3: INTERNAL_UNIPHY1 > [ 3.089719] [drm] CRT2: INTERNAL_KLDSCP_DAC2 > [ 3.095924] hub 5-1.4:1.0: USB hub found > [ 3.100002] hub 5-1.4:1.0: 4 ports detected > [ 3.110713] mvpp2 f2000000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx > [ 3.118686] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > [ 3.206262] [drm] fb mappable at 0x80034D000 > [ 3.210554] [drm] vram apper at 0x800000000 > [ 3.214756] [drm] size 8294400 > [ 3.217823] [drm] fb depth is 24 > [ 3.221064] [drm] pitch is 7680 > [ 3.264754] Console: switching to colour frame buffer device 240x67 > [ 3.278056] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device > [ 3.299311] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0 > [ ok done. > [....] Setting up keyboard layout...[ 3.351975] usb 6-1.4: new SuperSpeed USB device number 3 using xhci-hcd > [ ok done. 
> [ 3.458706] hub 6-1.4:1.0: USB hub found > [ 3.462996] hub 6-1.4:1.0: 4 ports detected > [ 3.540089] usb 5-1.4.1: new full-speed USB device number 4 using xhci-hcd > [ 3.551721] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null) > [....] Checking root file system...fsck from util-linux 2.32 > /dev/mmcblk0p1: clean, 165507/475136 files, 4946295/7599104 blocks > [ ok done. > [ 3.603163] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=60 > [ 3.723430] usb 5-1.4.1: new high-speed USB device number 5 using xhci-hcd > [....] Activating lvm and md swap...[ ok done. > [....] Checking file systems...fsck from util-linux 2.32 > checking super block... > filesystem is clean, no checking needed. > [ ok done. > [ 3.892330] usbcore: registered new interface driver usbhid > [ 3.898466] usbhid: USB HID core driver > [....] Cleaning up temporary files... /tmp[ ok . > [ 3.971996] usbcore: registered new interface driver snd-usb-audio > [ 3.981106] input: ASUS Xonar U7 MKII as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.1/5-1.4.1:1.4/0003:0B05:183C.0001/input/input0 > [ 3.996748] usb 5-1.4.2: new low-speed USB device number 6 using xhci-hcd > [info] Loading kernel module nf_conntrack_ftp. > [info] Loading kernel module snd-usb-audio. > [info] Loading kernel module fbcon. > modprobe: FATAL: Module fbcon not found in directory /lib/modules/4.17.11 > [info] Loading kernel module udl. 
> modprobe: FATAL: Module udl not found in directory /lib/modules/4.17.11 > [ 4.050298] hid-generic 0003:0B05:183C.0001: input: USB HID v1.00 Device [ASUS Xonar U7 MKII] on usb-f4500000.usb3-1.4.1/input4 > [ 4.143548] random: alsactl: uninitialized urandom read (4 bytes read) > [ 4.159127] input: Logitech USB-PS/2 Optical Mouse as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.2/5-1.4.2:1.0/0003:046D:C01E.0002/input/input1 > [ 4.175703] hid-generic 0003:046D:C01E.0002: input: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-f4500000.usb3-1.4.2/input0 > [ 4.252844] Adding 4194300k swap on /i/SWAP. Priority:-2 extents:1 across:4194300k > [ 4.270569] mvpp2 f4000000.ethernet eth2: Link is Up - 100Mbps/Full - flow control rx/tx > [ 4.278738] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready > [ 4.316754] usb 5-1.4.3: new low-speed USB device number 7 using xhci-hcd > [....] Mounting local filesystems...[ ok done. > [....] Activating swapfile swap...[ ok done. > [....] Cleaning up temporary files...[ ok . > [ 4.366696] random: dd: uninitialized urandom read (512 bytes read) > [....] Starting Setting kernel variables: sysctl[ ok . > [ 4.468860] PPP generic driver version 2.4.2 > [ 4.474859] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.0/0003:0E8F:0020.0003/input/input2 > [ 4.477505] NET: Registered protocol family 17 > [ 4.539749] NET: Registered protocol family 24 > [ 4.550179] hid-generic 0003:0E8F:0020.0003: input: USB HID v1.10 Keyboard [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input0 > [ 4.566423] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.1/0003:0E8F:0020.0004/input/input3 > [....] Configuring network interfaces...Plugin rp-pppoe.so loaded. 
> ifup: interface eth0 already configured > ifup: interface eth2 already configured > [ ok done. > [....] Cleaning up temporary files...[ ok . > [ 4.636900] hid-generic 0003:0E8F:0020.0004: input: USB HID v1.10 Mouse [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input1 > [ 4.660387] random: alsactl: uninitialized urandom read (4 bytes read) > [....] Setting up X socket directories... /tmp/.X11-unix /tmp/.ICE-unix[ ok . > [....] Setting sensors limits...[ ok done. > [....] Setting up ALSA...[ ok done. > [....] Loading netfilter rules...run-parts: executing /usr/share/netfilter-persistent/plugins.d/15-ip4tables start > run-parts: executing /usr/share/netfilter-persistent/plugins.d/25-ip6tables start > [ 4.743426] usb 5-1.4.4: new high-speed USB device number 8 using xhci-hcd > [ ok done. > INIT: Entering runlevel: 2 > [info] Using makefile-style concurrent boot in runlevel 2. > [....] Enabling additional executable binary formats: binfmt-support[ ok . > [....] Setting up console font and keymap...[ ok done. > [ 4.878142] udlfb 5-1.4.4:1.0: vendor descriptor length: 34 data: 22 5f 01 00 20 05 00 01 03 00 04 > [ 4.887190] udlfb 5-1.4.4:1.0: DL chip limited to 2360000 pixel modes > [....] Starting enhanced syslogd: rsyslogd[ ok . > [ 5.046171] usb 5-1.4.4: Unable to get valid EDID from device/display > [ 5.055288] usb 5-1.4.4: fb1 is DisplayLink USB device (800x600, 1880K framebuffer memory) > [ 5.063655] usbcore: registered new interface driver udlfb > [....] Starting system message bus: dbus[ ok . > [....] Loading cpufreq kernel modules...[ ok done (none). > [....] Starting mouse interface server: gpm[ ok . > [....] Starting NTP server: ntpd[ ok . > [ 5.167095] urandom_read: 3 callbacks suppressed > [ 5.167098] random: automount: uninitialized urandom read (4 bytes read) > [ 5.195518] random: isc-worker0000: uninitialized urandom read (10 bytes read) > [ 5.202800] random: isc-worker0000: uninitialized urandom read (40 bytes read) > [....] 
Starting automount...[ ok . > [....] Starting domain name service...: bind9[ ok . > [....] Starting virtual private network daemon:[ ok . > [....] CPUFreq Utilities: Setting ondemand CPUFreq governor...disabled, governor not available...[ ok done. > Starting very small Busybox based DHCP server: Starting /usr/sbin/udhcpd... > udhcpd. > Starting radvd: radvd. > [....] Starting periodic command scheduler: cron[ ok . > [....] Starting OpenBSD Secure Shell server: sshd[ ok . > [....] Starting WIDE DHCPv6 client: dhcp6c[ ok . > > Debian GNU/Linux buster/sid leontynka ttyS0 > > leontynka login: [ 10.373259] random: crng init done > [ 10.376676] random: 1 urandom warning(s) missed due to ratelimiting > [ 23.931568] tun: Universal TUN/TAP device driver, 1.6 ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 @ 2018-08-07 18:07 ` Ard Biesheuvel 0 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-07 18:07 UTC (permalink / raw) To: Mikulas Patocka Cc: Thomas Petazzoni, Joao Pinto, Jingoo Han, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Catalin Marinas, Marcin Wojtas, Robin Murphy, linux-arm-kernel On 7 August 2018 at 19:39, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Tue, 7 Aug 2018, Marcin Wojtas wrote: > >> Ard, Mikulas, >> >> After some self-caused setup issues I was able to run the test on my >> MacchiatoBin with the kernel v4.18-rc8. It's been running for 1h+ now, >> loading the CPU to 100% and no single error event... >> >> I built the binary file with: >> gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -O2 >> >> Maybe it's the older firmware issue? > > I have downloaded and built the firmware recently (it has timestamp Jul 30 > 2018). > > Do you still have your firmware file "flash-image.bin" that you used, so > that I could try it? > >> Please send the full bootlog with >> the very first line after reset. My board rev is v1.3 and I use >> mainline UEFI (newest edk2 + edk2-platforms) + newest publicly >> available ARM-TF and earliest firmware for this board. >> >> Best regards, >> Marcin > Mikulas, Is the issue reproducible with an nvidia card + nouveau driver as well? Given the screen corruption I see with radeon even on other arm systems, I'd like to ensure that this is a platform bug, not a driver bug. > This is my bootlog: > > BootROM - 2.03 > Starting CP-0 IOROM 1.07 > Booting from SD 0 (0x29) > Found valid image at boot postion 0x002 > lNOTICE: Starting binary extension > NOTICE: SVC: SW Revision 0x0. 
SVC is not supported > mv_ddr: mv_ddr-devel-18.05.0-g84dd1d9 (Jul 30 2018 - 04:58:51 PM) > mv_ddr: completed successfully > NOTICE: Cold boot > NOTICE: Booting Trusted Firmware > NOTICE: BL1: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL1: Built : 17:00:18, Jul 30 2018 > NOTICE: BL1: Booting BL2 > lNOTICE: BL2: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL2: Built : 17:00:21, Jul 30 2018 > BL2: Initiating SCP_BL2 transfer to SCP > NOTICE: SCP_BL2 contains 2 concatenated images > NOTICE: Load image to CP1 MSS AP0 > NOTICE: Loading MSS image from address 0x4023020 Size 0x135c to MSS at 0xf4280000 > NOTICE: Done > NOTICE: Load image to AP0 MSS > NOTICE: Loading MSS image from address 0x402437c Size 0x1f6c to MSS at 0xf0580000 > N > > FreeRTOS 7.3.0 - Marvell cm3 - A8K release armada-18.05.1 > > OTICE: Done > NOTICE: SCP Image doesn't contain PM firmware > NOTICE: BL1: Booting BL31 > lNOTICE: MSS PM is not supported in this build > NOTICE: BL31: v1.4(release):armada-18.05.2:80bbf686 > NOTICE: BL31: Built : 17:00:21, Jul 30 2018 > lUEFI firmware (version MARVELL_EFI built at 16:50:27 on Jul 30 2018) > > Armada 8040 MachiatoBin Platform Init > > Comphy0-0: PCIE0 5 Gbps > Comphy0-1: PCIE0 5 Gbps > Comphy0-2: PCIE0 5 Gbps > Comphy0-3: PCIE0 5 Gbps > Comphy0-4: SFI 10.31 Gbps > Comphy0-5: SATA1 5 Gbps > > Comphy1-0: SGMII1 1.25 Gbps > Comphy1-1: SATA2 5 Gbps > Comphy1-2: USB3_HOST0 5 Gbps > Comphy1-3: SATA3 5 Gbps > Comphy1-4: SFI 10.31 Gbps > Comphy1-5: SGMII2 3.125 Gbps > > UTMI PHY 0 initialized to USB Host0 > UTMI PHY 1 initialized to USB Host1 > UTMI PHY 2 initialized to USB Host0 > Succesfully installed protocol interfaces > Error: Image at 000BF6F8000 start failed: 00000001 > remove-symbol-file /usr/src/git/macchiato/edk2/Build/Armada80x0McBin-AARCH64/RELEASE_GCC5/AARCH64/MdeModulePkg/Universal/Acpi/AcpiPlatformDxe/AcpiPlatformDxe/DEBUG/AcpiPlatform.dll 0xBF6F9000 > Detected w25q32bv SPI flash with page size 256 B, erase size 4 KB, total 4 MB > 
ramdisk:blckio install. Status=Success > Connect: PcieRoot(0x0)/Pci(0x0,0x0): Not Found > 3h3h3hTianocore/EDK2 firmware version MARVELL_EFI > Press ESCAPE for boot options ...error: no suitable video mode found. > error: no video mode activated. > GNU GRUB version 2.02~beta3-5 > > /----------------------------------------------------------------------------\||||||||||||||||||||||||||\----------------------------------------------------------------------------/ Use the ^ and v keys to select which entry is highlighted. > Press enter to boot the selected OS, `e' to edit the commands > before booting or `c' for a command-line. > *Debian GNU/Linux Advanced options for Debian GNU/Linux System setup > The highlighted entry will be executed automatically in 5s. The highlighted entry will be executed automatically in 4s. Loading Linux 4.17.11 ... > EFI stub: Booting Linux Kernel... > EFI stub: Using DTB from configuration table > EFI stub: Exiting boot services and installing virtual address map... > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd081] > [ 0.000000] Linux version 4.17.11 (root@leontynka) (gcc version 8.2.0 (Debian 8.2.0-2)) #10 SMP PREEMPT Fri Aug 3 18:29:35 CEST 2018 > [ 0.000000] Machine model: Marvell 8040 MACCHIATOBin > [ 0.000000] efi: Getting EFI parameters from FDT: > [ 0.000000] efi: EFI v2.70 by EDK II > [ 0.000000] efi: SMBIOS 3.0=0xbfed0000 ACPI 2.0=0xb6760000 MEMATTR=0xb8c63518 RNG=0xbffdcf98 > [ 0.000000] efi: seeding entropy pool > [ 0.000000] psci: probing for conduit method from DT. > [ 0.000000] psci: PSCIv1.0 detected in firmware. > [ 0.000000] psci: Using standard PSCI v0.2 function IDs > [ 0.000000] psci: MIGRATE_INFO_TYPE not supported. > [ 0.000000] psci: SMC Calling Convention v1.1 > [ 0.000000] percpu: Embedded 26 pages/cpu @ (ptrval) s67096 r8192 d31208 u106496 > [ 0.000000] Detected PIPT I-cache on CPU0 > [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 1031688 > [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.17.11 root=/dev/mmcblk0p1 ro console=ttyS0,115200 > [ 0.000000] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes) > [ 0.000000] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes) > [ 0.000000] software IO TLB [mem 0xbb810000-0xbf810000] (64MB) mapped at [ (ptrval)- (ptrval)] > [ 0.000000] Memory: 4033692K/4192256K available (4860K kernel code, 376K rwdata, 2452K rodata, 384K init, 2178K bss, 158564K reserved, 0K cma-reserved) > [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 > [ 0.000000] Preemptible hierarchical RCU implementation. > [ 0.000000] Tasks RCU enabled. > [ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0 > [ 0.000000] GIC: Adjusting CPU interface base to 0x00000000f022f000 > [ 0.000000] GIC: Using split EOI/Deactivate mode > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:160, num:32) > [ 0.000000] GICv2m: range[mem 0xf0280000-0xf0280fff], SPI[160:191] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:192, num:32) > [ 0.000000] GICv2m: range[mem 0xf0290000-0xf0290fff], SPI[192:223] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:224, num:32) > [ 0.000000] GICv2m: range[mem 0xf02a0000-0xf02a0fff], SPI[224:255] > [ 0.000000] GICv2m: DT overriding V2M MSI_TYPER (base:256, num:32) > [ 0.000000] GICv2m: range[mem 0xf02b0000-0xf02b0fff], SPI[256:287] > [ 0.000000] arch_timer: cp15 timer(s) running at 25.00MHz (phys). > [ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x5c40939b5, max_idle_ns: 440795202646 ns > [ 0.000002] sched_clock: 56 bits at 25MHz, resolution 40ns, wraps every 4398046511100ns > [ 0.000113] Console: colour dummy device 174x49 > [ 0.000124] Calibrating delay loop (skipped), value calculated using timer frequency.. 
50.08 BogoMIPS (lpj=83333) > [ 0.000129] pid_max: default: 32768 minimum: 301 > [ 0.000151] Security Framework initialized > [ 0.000154] Yama: becoming mindful. > [ 0.000183] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes) > [ 0.000199] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes) > [ 0.016676] ASID allocator initialised with 65536 entries > [ 0.020006] Hierarchical SRCU implementation. > [ 0.023435] Remapping and enabling EFI services. > [ 0.026680] smp: Bringing up secondary CPUs ... > [ 0.043500] Detected PIPT I-cache on CPU1 > [ 0.043522] CPU1: Booted secondary processor 0x0000000001 [0x410fd081] > [ 0.060176] Detected PIPT I-cache on CPU2 > [ 0.060195] CPU2: Booted secondary processor 0x0000000100 [0x410fd081] > [ 0.076859] Detected PIPT I-cache on CPU3 > [ 0.076872] CPU3: Booted secondary processor 0x0000000101 [0x410fd081] > [ 0.076901] smp: Brought up 1 node, 4 CPUs > [ 0.076910] SMP: Total of 4 processors activated. > [ 0.076913] CPU features: detected: 32-bit EL0 Support > [ 0.077194] CPU: All CPU(s) started at EL2 > [ 0.077205] alternatives: patching kernel code > [ 0.077230] random: get_random_u64 called from compute_layout+0x94/0xe8 with crng_init=0 > [ 0.077599] devtmpfs: initialized > [ 0.078967] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6370867519511994 ns > [ 0.078975] futex hash table entries: 1024 (order: 5, 131072 bytes) > [ 0.079032] pinctrl core: initialized pinctrl subsystem > [ 0.079183] SMBIOS 3.0.0 present. > [ 0.079191] DMI: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II Jul 30 2018 > [ 0.079264] NET: Registered protocol family 16 > [ 0.079484] cpuidle: using governor ladder > [ 0.079535] cpuidle: using governor menu > [ 0.079559] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) > [ 0.079562] vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval)) > [ 0.079569] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers. 
> [ 0.079681] DMA: preallocated 256 KiB pool for atomic allocations > [ 0.082434] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages > [ 0.082669] ACPI: Interpreter disabled. > [ 0.082822] reg-fixed-voltage regulator-usb3-vbus0: could not find pctldev for node /cp0/config-space@f2000000/system-controller@440000/pinctrl/xhci0-vbus-pins, deferring probe > [ 0.082929] SCSI subsystem initialized > [ 0.083033] Registered efivars operations > [ 0.083398] clocksource: Switched to clocksource arch_sys_counter > [ 0.083517] pnp: PnP ACPI: disabled > [ 0.085008] NET: Registered protocol family 2 > [ 0.085146] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes) > [ 0.085156] TCP established hash table entries: 32768 (order: 6, 262144 bytes) > [ 0.085213] TCP bind hash table entries: 32768 (order: 7, 524288 bytes) > [ 0.085441] TCP: Hash tables configured (established 32768 bind 32768) > [ 0.085492] UDP hash table entries: 2048 (order: 4, 65536 bytes) > [ 0.085507] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes) > [ 0.085713] hw perfevents: unable to count PMU IRQs > [ 0.085718] hw perfevents: /ap806/config-space@f0000000/pmu: failed to register PMU devices! 
> [ 0.085823] kvm [1]: 8-bit VMID > [ 0.086279] kvm [1]: vgic interrupt IRQ1 > [ 0.086339] kvm [1]: Hyp mode initialized successfully > [ 0.086649] workingset: timestamp_bits=62 max_order=20 bucket_order=0 > [ 0.088566] io scheduler noop registered > [ 0.088625] io scheduler cfq registered (default) > [ 0.089467] armada-ap806-pinctrl f06f4000.system-controller:pinctrl: registered pinctrl driver > [ 0.089690] armada-cp110-pinctrl f2440000.system-controller:pinctrl: registered pinctrl driver > [ 0.089843] armada-cp110-pinctrl f4440000.system-controller:pinctrl: registered pinctrl driver > [ 0.091439] mv_xor_v2 f0400000.xor: Marvell Version 2 XOR driver > [ 0.091583] mv_xor_v2 f0420000.xor: Marvell Version 2 XOR driver > [ 0.091736] mv_xor_v2 f0440000.xor: Marvell Version 2 XOR driver > [ 0.091897] mv_xor_v2 f0460000.xor: Marvell Version 2 XOR driver > [ 0.092084] mv_xor_v2 f26a0000.xor: Marvell Version 2 XOR driver > [ 0.092247] mv_xor_v2 f26c0000.xor: Marvell Version 2 XOR driver > [ 0.092432] mv_xor_v2 f46a0000.xor: Marvell Version 2 XOR driver > [ 0.092596] mv_xor_v2 f46c0000.xor: Marvell Version 2 XOR driver > [ 0.092685] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled > [ 0.092996] console [ttyS0] disabled > [ 0.113364] f0512000.serial: ttyS0 at MMIO 0xf0512000 (irq = 7, base_baud = 12500000) is a 16550A > [ 0.777029] console [ttyS0] enabled > [ 0.801032] f2702100.serial: ttyS1 at MMIO 0xf2702100 (irq = 26, base_baud = 15625000) is a 16550A > [ 0.830524] f4702000.serial: ttyS2 at MMIO 0xf4702000 (irq = 27, base_baud = 15625000) is a 16550A > [ 0.839705] cacheinfo: Unable to detect cache hierarchy for CPU 0 > [ 0.846129] libphy: Fixed MDIO Bus: probed > [ 0.850384] libphy: orion_mdio_bus: probed > [ 0.854879] libphy: orion_mdio_bus: probed > [ 0.862575] mousedev: PS/2 mouse device common for all mice > [ 0.868314] rtc-efi rtc-efi: rtc core: registered rtc-efi as rtc0 > [ 0.874446] i2c /dev entries driver > [ 0.880187] sdhci: Secure Digital Host 
Controller Interface driver > [ 0.886399] sdhci: Copyright(c) Pierre Ossman > [ 0.890777] sdhci-pltfm: SDHCI platform and OF driver helper > [ 0.896640] mmc0: Switching to 3.3V signalling voltage failed > [ 0.927528] mmc0: SDHCI controller on f06e0000.sdhci [f06e0000.sdhci] using ADMA 64-bit > [ 0.966330] mmc1: SDHCI controller on f2780000.sdhci [f2780000.sdhci] using ADMA 64-bit > [ 0.976142] hw perfevents: enabled with armv8_cortex_a72 PMU driver, 7 counters available > [ 0.984470] PCI: OF: host bridge /cp0/pcie@f2600000 ranges: > [ 0.990120] PCI: OF: IO 0xeff00000..0xeff0ffff -> 0x00000000 > [ 0.996083] PCI: OF: MEM 0xc0000000..0xdfffffff -> 0xc0000000 > [ 1.002050] PCI: OF: MEM 0x800000000..0x8ffffffff -> 0x800000000 > [ 1.008301] mmc0: new high speed MMC card at address 0001 > [ 1.008337] armada8k-pcie f2600000.pcie: link up > [ 1.013884] mmcblk0: mmc0:0001 8GME4R 7.28 GiB > [ 1.018427] armada8k-pcie f2600000.pcie: PCI host bridge to bus 0000:00 > [ 1.023036] mmcblk0boot0: mmc0:0001 8GME4R partition 1 4.00 MiB > [ 1.029619] pci_bus 0000:00: root bus resource [bus 00-ff] > [ 1.035652] mmcblk0boot1: mmc0:0001 8GME4R partition 2 4.00 MiB > [ 1.041101] pci_bus 0000:00: root bus resource [io 0x0000-0xffff] > [ 1.047085] mmcblk0rpmb: mmc0:0001 8GME4R partition 3 512 KiB, chardev (250:0) > [ 1.053256] pci_bus 0000:00: root bus resource [mem 0xc0000000-0xdfffffff] > [ 1.067452] pci_bus 0000:00: root bus resource [mem 0x800000000-0x8ffffffff] > [ 1.074559] mmcblk0: p1 p2 p3 > [ 1.077642] pci 0000:00:00.0: disabling Extended Tags (this device can't handle them) > [ 1.097253] pci 0000:00:00.0: BAR 9: assigned [mem 0x800000000-0x80fffffff 64bit pref] > [ 1.105218] pci 0000:00:00.0: BAR 0: assigned [mem 0x810000000-0x8100fffff 64bit] > [ 1.112749] pci 0000:00:00.0: BAR 8: assigned [mem 0xc0000000-0xc00fffff] > [ 1.119578] pci 0000:00:00.0: BAR 7: assigned [io 0x1000-0x1fff] > [ 1.125712] pci 0000:01:00.0: BAR 0: assigned [mem 0x800000000-0x80fffffff 64bit pref] > [ 
1.133705] pci 0000:01:00.0: BAR 2: assigned [mem 0xc0000000-0xc001ffff 64bit] > [ 1.141088] pci 0000:01:00.0: BAR 6: assigned [mem 0xc0020000-0xc003ffff pref] > [ 1.148353] pci 0000:01:00.1: BAR 0: assigned [mem 0xc0040000-0xc0043fff 64bit] > [ 1.155738] pci 0000:01:00.0: BAR 4: assigned [io 0x1000-0x10ff] > [ 1.161879] pci 0000:00:00.0: PCI bridge to [bus 01-ff] > [ 1.167139] pci 0000:00:00.0: bridge window [io 0x1000-0x1fff] > [ 1.173274] pci 0000:00:00.0: bridge window [mem 0xc0000000-0xc00fffff] > [ 1.180106] pci 0000:00:00.0: bridge window [mem 0x800000000-0x80fffffff 64bit pref] > [ 1.188210] mmc1: new high speed SDHC card at address 1234 > [ 1.193873] mmcblk1: mmc1:1234 SA08G 7.41 GiB > [ 1.198461] pcieport 0000:00:00.0: AER enabled with IRQ 32 > [ 1.204020] pci 0000:01:00.1: Linked as a consumer to 0000:01:00.0 > [ 1.210396] rtc-efi rtc-efi: setting system clock to 2018-08-06 20:01:28 UTC (1533585688) > [ 1.211676] mmcblk1: p1 p2 > [ 1.220717] v_5v0_usb3_hst_vbus: disabling > [ 1.231509] EXT4-fs (mmcblk0p1): mounted filesystem with ordered data mode. Opts: (null) > [ 1.239654] VFS: Mounted root (ext4 filesystem) readonly on device 179:1. > [ 1.248061] devtmpfs: mounted > [ 1.251151] Freeing unused kernel memory: 384K > [ 1.325623] random: fast init done > INIT: version 2.88 booting > [info] Using makefile-style concurrent boot in runlevel S. > [ 1.488069] NET: Registered protocol family 1 > ERROR: could not open /proc/stat: No such file or directory > [....] Starting the hotplug events dispatcher: systemd-udevdstarting version 239 > [ ok . > [....] Synthesizing the initial hotplug events...[ ok done. > [ 1.786418] EFI Variables Facility v0.08 2004-May-17 > [....] 
Waiting for /dev to be fully populated...[ 1.804433] mvpp2 f2000000.ethernet eth0: Using random mac address fe:a5:21:f0:f8:7d > [ 1.806861] usbcore: registered new interface driver usbfs > [ 1.817792] usbcore: registered new interface driver hub > [ 1.817835] mvpp2 f4000000.ethernet eth1: Using random mac address 86:5f:16:0c:f9:16 > [ 1.823172] usbcore: registered new device driver usb > [ 1.837065] mvpp2 f4000000.ethernet eth2: Using random mac address 8e:6e:60:9f:57:60 > [ 1.849493] ahci f2540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode > [ 1.851030] mvpp2 f4000000.ethernet eth3: Using random mac address c6:5e:07:9a:54:82 > [ 1.859250] ahci f2540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs > [ 1.874656] scsi host0: ahci > [ 1.877789] scsi host1: ahci > [ 1.880777] ata1: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x100 irq 57 > [ 1.888777] ata2: SATA max UDMA/133 mmio [mem 0xf2540000-0xf256ffff] port 0x180 irq 57 > [ 1.897008] ahci f4540000.sata: AHCI 0001.0000 32 slots 2 ports 6 Gbps 0x3 impl platform mode > [ 1.905629] ahci f4540000.sata: flags: 64bit ncq sntf led only pmp fbs pio slum part sxs > [ 1.914173] scsi host2: ahci > [ 1.917252] scsi host3: ahci > [ 1.920225] ata3: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x100 irq 58 > [ 1.928198] ata4: SATA max UDMA/133 mmio [mem 0xf4540000-0xf456ffff] port 0x180 irq 58 > [ 1.928608] xhci-hcd f2500000.usb3: xHCI Host Controller > [ 1.942154] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 1 > [ 1.951232] xhci-hcd f2500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 1.959840] xhci-hcd f2500000.usb3: irq 59, io mem 0xf2500000 > [ 1.966174] hub 1-0:1.0: USB hub found > [ 1.970058] hub 1-0:1.0: 1 port detected > [ 1.974150] xhci-hcd f2500000.usb3: xHCI Host Controller > [ 1.979551] xhci-hcd f2500000.usb3: new USB bus registered, assigned bus number 2 > [ 1.979558] xhci-hcd f2500000.usb3: Host supports 
USB 3.0 SuperSpeed > [ 1.993840] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.000947] cryptd: max_cpu_qlen set to 1000 > [ 2.002463] hub 2-0:1.0: USB hub found > [ 2.010089] hub 2-0:1.0: 1 port detected > [ 2.014291] xhci-hcd f2510000.usb3: xHCI Host Controller > [ 2.019647] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 3 > [ 2.027219] xhci-hcd f2510000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 2.035823] xhci-hcd f2510000.usb3: irq 60, io mem 0xf2510000 > [ 2.042445] hub 3-0:1.0: USB hub found > [ 2.046236] hub 3-0:1.0: 1 port detected > [ 2.050278] xhci-hcd f2510000.usb3: xHCI Host Controller > [ 2.055768] xhci-hcd f2510000.usb3: new USB bus registered, assigned bus number 4 > [ 2.063314] xhci-hcd f2510000.usb3: Host supports USB 3.0 SuperSpeed > [ 2.069818] usb usb4: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.078176] hub 4-0:1.0: USB hub found > [ 2.081972] hub 4-0:1.0: 1 port detected > [ 2.086215] xhci-hcd f4500000.usb3: xHCI Host Controller > [ 2.091581] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 5 > [ 2.099158] xhci-hcd f4500000.usb3: hcc params 0x0a000990 hci version 0x100 quirks 0x00010010 > [ 2.107751] xhci-hcd f4500000.usb3: irq 61, io mem 0xf4500000 > [ 2.113788] hub 5-0:1.0: USB hub found > [ 2.117586] hub 5-0:1.0: 1 port detected > [ 2.121642] xhci-hcd f4500000.usb3: xHCI Host Controller > [ 2.126988] xhci-hcd f4500000.usb3: new USB bus registered, assigned bus number 6 > [ 2.134514] xhci-hcd f4500000.usb3: Host supports USB 3.0 SuperSpeed > [ 2.141009] usb usb6: We don't know the algorithms for LPM for this host, disabling LPM. > [ 2.149354] hub 6-0:1.0: USB hub found > [ 2.153156] hub 6-0:1.0: 1 port detected > [ 2.162109] [drm] radeon kernel modesetting enabled. 
> [ 2.167730] radeon 0000:01:00.0: enabling device (0000 -> 0003) > [ 2.174555] [drm] initializing kernel modesetting (CEDAR 0x1002:0x68F9 0x1787:0x3000 0x00). > [ 2.223834] ata1: SATA link down (SStatus 0 SControl 300) > [ 2.250489] ata3: SATA link down (SStatus 0 SControl 300) > [ 2.256599] ata4: SATA link down (SStatus 0 SControl 300) > [ 2.303985] ATOM BIOS: CEDAR > [ 2.306958] [drm] GPU not posted. posting now... > [ 2.314564] radeon 0000:01:00.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used) > [ 2.323482] radeon 0000:01:00.0: GTT: 1024M 0x0000000040000000 - 0x000000007FFFFFFF > [ 2.331229] [drm] Detected VRAM RAM=1024M, BAR=256M > [ 2.336194] [drm] RAM width 64bits DDR > [ 2.343406] [TTM] Zone kernel: Available graphics memory: 2017038 kiB > [ 2.349967] [TTM] Initializing pool allocator > [ 2.354353] [TTM] Initializing DMA pool allocator > [ 2.359105] [drm] radeon: 1024M of VRAM memory ready > [ 2.364109] [drm] radeon: 1024M of GTT memory ready. > [ 2.369260] [drm] Loading CEDAR Microcode > [ 2.370083] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) > [ 2.380025] ata2.00: ATA-8: ST4000DM000-1F2168, CC52, max UDMA/133 > [ 2.386240] ata2.00: 7814037168 sectors, multi 0: LBA48 NCQ (depth 31/32) > [ 2.393559] ata2.00: configured for UDMA/133 > [ 2.397952] scsi 1:0:0:0: Direct-Access ATA ST4000DM000-1F21 CC52 PQ: 0 ANSI: 5 > [ 2.418024] sd 1:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) > [ 2.420812] [drm] Internal thermal controller with fan control > [ 2.425823] sd 1:0:0:0: [sda] 4096-byte physical blocks > [ 2.436947] sd 1:0:0:0: [sda] Write Protect is off > [ 2.441160] [drm] radeon: dpm initialized > [ 2.445807] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA > [ 2.459901] [drm] GART: num cpu pages 262144, num gpu pages 262144 > [ 2.460082] usb 5-1: new high-speed USB device number 2 using xhci-hcd > [ 2.466365] sd 1:0:0:0: [sda] Attached SCSI removable disk > [ 2.467195] 
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0 > [ 2.511728] NET: Registered protocol family 10 > [ 2.516748] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready > [ 2.522778] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready > [ 2.528771] Segment Routing with IPv6 > [ 2.548836] [drm] PCIE GART of 1024M enabled (table at 0x000000000014C000). > [ 2.556059] radeon 0000:01:00.0: WB enabled > [ 2.560272] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0x (ptrval) > [ 2.571109] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0x (ptrval) > [ 2.585425] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x000000000005c418 and cpu addr 0x (ptrval) > [ 2.596259] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). > [ 2.602916] [drm] Driver supports precise vblank timestamp query. > [ 2.609046] radeon 0000:01:00.0: radeon: MSI limited to 32-bit > [ 2.615006] radeon 0000:01:00.0: radeon: using MSI. > [ 2.619944] [drm] radeon: irq initialized. > [ 2.635912] hub 5-1:1.0: USB hub found > [ 2.639847] hub 5-1:1.0: 4 ports detected > [ 2.644731] [drm] ring test on 0 succeeded in 0 usecs > [ 2.649815] [drm] ring test on 3 succeeded in 3 usecs > [ 2.797941] usb 6-1: new SuperSpeed USB device number 2 using xhci-hcd > [ 2.833616] [drm] ring test on 5 succeeded in 1 usecs > [ 2.838697] [drm] UVD initialized successfully. 
> [ 2.843495] [drm] ib test on ring 0 succeeded in 0 usecs > [ 2.848884] [drm] ib test on ring 3 succeeded in 0 usecs > [ 2.905618] hub 6-1:1.0: USB hub found > [ 2.909579] hub 6-1:1.0: 4 ports detected > [ 2.986738] usb 5-1.4: new high-speed USB device number 3 using xhci-hcd > [ 3.006863] [drm] ib test on ring 5 succeeded > [ 3.011876] [drm] Radeon Display Connectors > [ 3.016085] [drm] Connector 0: > [ 3.019154] [drm] DP-1 > [ 3.021698] [drm] HPD2 > [ 3.024244] [drm] DDC: 0x6460 0x6460 0x6464 0x6464 0x6468 0x6468 0x646c 0x646c > [ 3.031672] [drm] Encoders: > [ 3.034653] [drm] DFP1: INTERNAL_UNIPHY1 > [ 3.038941] [drm] Connector 1: > [ 3.042008] [drm] DVI-I-1 > [ 3.044814] [drm] HPD4 > [ 3.047358] [drm] DDC: 0x6440 0x6440 0x6444 0x6444 0x6448 0x6448 0x644c 0x644c > [ 3.054786] [drm] Encoders: > [ 3.057767] [drm] DFP2: INTERNAL_UNIPHY > [ 3.061968] [drm] CRT1: INTERNAL_KLDSCP_DAC1 > [ 3.066605] [drm] Connector 2: > [ 3.069672] [drm] DVI-I-2 > [ 3.072478] [drm] HPD1 > [ 3.075023] [drm] DDC: 0x6430 0x6430 0x6434 0x6434 0x6438 0x6438 0x643c 0x643c > [ 3.082450] [drm] Encoders: > [ 3.085430] [drm] DFP3: INTERNAL_UNIPHY1 > [ 3.089719] [drm] CRT2: INTERNAL_KLDSCP_DAC2 > [ 3.095924] hub 5-1.4:1.0: USB hub found > [ 3.100002] hub 5-1.4:1.0: 4 ports detected > [ 3.110713] mvpp2 f2000000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx > [ 3.118686] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > [ 3.206262] [drm] fb mappable at 0x80034D000 > [ 3.210554] [drm] vram apper at 0x800000000 > [ 3.214756] [drm] size 8294400 > [ 3.217823] [drm] fb depth is 24 > [ 3.221064] [drm] pitch is 7680 > [ 3.264754] Console: switching to colour frame buffer device 240x67 > [ 3.278056] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device > [ 3.299311] [drm] Initialized radeon 2.50.0 20080528 for 0000:01:00.0 on minor 0 > [ ok done. > [....] Setting up keyboard layout...[ 3.351975] usb 6-1.4: new SuperSpeed USB device number 3 using xhci-hcd > [ ok done. 
> [ 3.458706] hub 6-1.4:1.0: USB hub found > [ 3.462996] hub 6-1.4:1.0: 4 ports detected > [ 3.540089] usb 5-1.4.1: new full-speed USB device number 4 using xhci-hcd > [ 3.551721] EXT4-fs (mmcblk0p1): re-mounted. Opts: (null) > [....] Checking root file system...fsck from util-linux 2.32 > /dev/mmcblk0p1: clean, 165507/475136 files, 4946295/7599104 blocks > [ ok done. > [ 3.603163] EXT4-fs (mmcblk0p1): re-mounted. Opts: commit=60 > [ 3.723430] usb 5-1.4.1: new high-speed USB device number 5 using xhci-hcd > [....] Activating lvm and md swap...[ ok done. > [....] Checking file systems...fsck from util-linux 2.32 > checking super block... > filesystem is clean, no checking needed. > [ ok done. > [ 3.892330] usbcore: registered new interface driver usbhid > [ 3.898466] usbhid: USB HID core driver > [....] Cleaning up temporary files... /tmp[ ok . > [ 3.971996] usbcore: registered new interface driver snd-usb-audio > [ 3.981106] input: ASUS Xonar U7 MKII as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.1/5-1.4.1:1.4/0003:0B05:183C.0001/input/input0 > [ 3.996748] usb 5-1.4.2: new low-speed USB device number 6 using xhci-hcd > [info] Loading kernel module nf_conntrack_ftp. > [info] Loading kernel module snd-usb-audio. > [info] Loading kernel module fbcon. > modprobe: FATAL: Module fbcon not found in directory /lib/modules/4.17.11 > [info] Loading kernel module udl. 
> modprobe: FATAL: Module udl not found in directory /lib/modules/4.17.11 > [ 4.050298] hid-generic 0003:0B05:183C.0001: input: USB HID v1.00 Device [ASUS Xonar U7 MKII] on usb-f4500000.usb3-1.4.1/input4 > [ 4.143548] random: alsactl: uninitialized urandom read (4 bytes read) > [ 4.159127] input: Logitech USB-PS/2 Optical Mouse as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.2/5-1.4.2:1.0/0003:046D:C01E.0002/input/input1 > [ 4.175703] hid-generic 0003:046D:C01E.0002: input: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-f4500000.usb3-1.4.2/input0 > [ 4.252844] Adding 4194300k swap on /i/SWAP. Priority:-2 extents:1 across:4194300k > [ 4.270569] mvpp2 f4000000.ethernet eth2: Link is Up - 100Mbps/Full - flow control rx/tx > [ 4.278738] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready > [ 4.316754] usb 5-1.4.3: new low-speed USB device number 7 using xhci-hcd > [....] Mounting local filesystems...[ ok done. > [....] Activating swapfile swap...[ ok done. > [....] Cleaning up temporary files...[ ok . > [ 4.366696] random: dd: uninitialized urandom read (512 bytes read) > [....] Starting Setting kernel variables: sysctl[ ok . > [ 4.468860] PPP generic driver version 2.4.2 > [ 4.474859] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.0/0003:0E8F:0020.0003/input/input2 > [ 4.477505] NET: Registered protocol family 17 > [ 4.539749] NET: Registered protocol family 24 > [ 4.550179] hid-generic 0003:0E8F:0020.0003: input: USB HID v1.10 Keyboard [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input0 > [ 4.566423] input: GASIA PS2toUSB Adapter as /devices/platform/cp1/cp1:config-space@f4000000/f4500000.usb3/usb5/5-1/5-1.4/5-1.4.3/5-1.4.3:1.1/0003:0E8F:0020.0004/input/input3 > [....] Configuring network interfaces...Plugin rp-pppoe.so loaded. > ifup: interface eth0 already configured > ifup: interface eth2 already configured > [ ok done. 
> [....] Cleaning up temporary files...[ ok . > [ 4.636900] hid-generic 0003:0E8F:0020.0004: input: USB HID v1.10 Mouse [GASIA PS2toUSB Adapter] on usb-f4500000.usb3-1.4.3/input1 > [ 4.660387] random: alsactl: uninitialized urandom read (4 bytes read) > [....] Setting up X socket directories... /tmp/.X11-unix /tmp/.ICE-unix[ ok . > [....] Setting sensors limits...[ ok done. > [....] Setting up ALSA...[ ok done. > [....] Loading netfilter rules...run-parts: executing /usr/share/netfilter-persistent/plugins.d/15-ip4tables start > run-parts: executing /usr/share/netfilter-persistent/plugins.d/25-ip6tables start > [ 4.743426] usb 5-1.4.4: new high-speed USB device number 8 using xhci-hcd > [ ok done. > INIT: Entering runlevel: 2 > [info] Using makefile-style concurrent boot in runlevel 2. > [....] Enabling additional executable binary formats: binfmt-support[ ok . > [....] Setting up console font and keymap...[ ok done. > [ 4.878142] udlfb 5-1.4.4:1.0: vendor descriptor length: 34 data: 22 5f 01 00 20 05 00 01 03 00 04 > [ 4.887190] udlfb 5-1.4.4:1.0: DL chip limited to 2360000 pixel modes > [....] Starting enhanced syslogd: rsyslogd[ ok . > [ 5.046171] usb 5-1.4.4: Unable to get valid EDID from device/display > [ 5.055288] usb 5-1.4.4: fb1 is DisplayLink USB device (800x600, 1880K framebuffer memory) > [ 5.063655] usbcore: registered new interface driver udlfb > [....] Starting system message bus: dbus[ ok . > [....] Loading cpufreq kernel modules...[ ok done (none). > [....] Starting mouse interface server: gpm[ ok . > [....] Starting NTP server: ntpd[ ok . > [ 5.167095] urandom_read: 3 callbacks suppressed > [ 5.167098] random: automount: uninitialized urandom read (4 bytes read) > [ 5.195518] random: isc-worker0000: uninitialized urandom read (10 bytes read) > [ 5.202800] random: isc-worker0000: uninitialized urandom read (40 bytes read) > [....] Starting automount...[ ok . > [....] Starting domain name service...: bind9[ ok . > [....] 
Starting virtual private network daemon:[ ok . > [....] CPUFreq Utilities: Setting ondemand CPUFreq governor...disabled, governor not available...[ ok done. > Starting very small Busybox based DHCP server: Starting /usr/sbin/udhcpd... > udhcpd. > Starting radvd: radvd. > [....] Starting periodic command scheduler: cron[ ok . > [....] Starting OpenBSD Secure Shell server: sshd[ ok . > [....] Starting WIDE DHCPv6 client: dhcp6c[ ok . > > Debian GNU/Linux buster/sid leontynka ttyS0 > > leontynka login: [ 10.373259] random: crng init done > [ 10.376676] random: 1 urandom warning(s) missed due to ratelimiting > [ 23.931568] tun: Universal TUN/TAP device driver, 1.6 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-07 18:07 ` Ard Biesheuvel (?) @ 2018-08-07 18:17 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-07 18:17 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Marcin Wojtas, Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, Robin Murphy, linux-arm-kernel

On Tue, 7 Aug 2018, Ard Biesheuvel wrote:

> On 7 August 2018 at 19:39, Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> > On Tue, 7 Aug 2018, Marcin Wojtas wrote:
> >
> >> Ard, Mikulas,
> >>
> >> After some self-caused setup issues I was able to run the test on my
> >> MacchiatoBin with the kernel v4.18-rc8. It's been running for 1h+ now,
> >> loading the CPU to 100% and no single error event...
> >>
> >> I built the binary file with:
> >> gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -O2
> >>
> >> Maybe it's the older firmware issue?
> >
> > I have downloaded and built the firmware recently (it has timestamp Jul 30
> > 2018).
> >
> > Do you still have your firmware file "flash-image.bin" that you used, so
> > that I could try it?
> >
> >> Please send the full bootlog with
> >> the very first line after reset. My board rev is v1.3 and I use
> >> mainline UEFI (newest edk2 + edk2-platforms) + newest publicly
> >> available ARM-TF and earliest firmware for this board.
> >>
> >> Best regards,
> >> Marcin
> >
>
> Mikulas,
>
> Is the issue reproducible with an nvidia card + nouveau driver as well?
>
> Given the screen corruption I see with radeon even on other arm
> systems, I'd like to ensure that this is a platform bug not a driver
> bug.

I see the same memcpy-to-framebuffer corruption on Radeon HD 6450 and
nVidia Quadro NVS 285.

3D acceleration on nVidia is slow, but it doesn't have visible glitches.

Mikulas

^ permalink raw reply [flat|nested] 238+ messages in thread
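Editorial illustration, not part of the original message: the thread attributes the memcpy-to-framebuffer corruption to copies that store the first and the last 16 bytes of a region even when the two blocks overlap. The C sketch below imitates that pattern in portable code and counts read-back mismatches. The names `copy_16_to_32` and `count_mismatches` are hypothetical, this is not the test program used in the thread, and the compiler is free to emit instructions other than `stp` for the 16-byte block copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes (16 <= n <= 32) as two 16-byte blocks: one at the start
 * and one ending at the last byte. For n < 32 the two destination
 * blocks overlap, so some bytes are written twice.
 */
void copy_16_to_32(unsigned char *dst, const unsigned char *src, size_t n)
{
    unsigned char head[16], tail[16];

    assert(n >= 16 && n <= 32);
    memcpy(head, src, 16);           /* load the first 16 bytes */
    memcpy(tail, src + n - 16, 16);  /* load the last 16 bytes */
    memcpy(dst, head, 16);           /* store block 1 */
    memcpy(dst + n - 16, tail, 16);  /* store block 2, may overlap block 1 */
}

/*
 * Run the overlapping copy for every length 16..32 and every start
 * offset 0..15 inside dst, read the bytes back, and return the number
 * of bytes that differ from the source pattern. dst must provide at
 * least 48 bytes.
 */
size_t count_mismatches(unsigned char *dst, size_t dst_size)
{
    unsigned char src[64];
    size_t bad = 0;

    assert(dst_size >= 48);
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (size_t off = 0; off < 16; off++) {
        for (size_t n = 16; n <= 32; n++) {
            memset(dst, 0, dst_size);
            copy_16_to_32(dst + off, src + off, n);
            for (size_t i = 0; i < n; i++)
                if (dst[off + i] != src[off + i])
                    bad++;
        }
    }
    return bad;
}
```

On ordinary RAM `count_mismatches` should return 0. To probe the behaviour discussed here, one would pass a pointer into a mapped framebuffer instead of a RAM buffer (an assumption about the setup; the mapping code is omitted) and treat a non-zero result as corruption.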
* Re: framebuffer corruption due to overlapping stp instructions on arm64 [not found] ` <CAPv3WKcKoEe=Qysp6Oac2C=G9bUhUQf1twSRCY+_qJ6XEC-iag@mail.gmail.com> 2018-08-08 14:10 ` Mikulas Patocka @ 2018-08-08 14:10 ` Mikulas Patocka 0 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-08 14:10 UTC (permalink / raw) To: Marcin Wojtas Cc: Ard Biesheuvel, Thomas Petazzoni, Joao Pinto, Catalin Marinas, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, Robin Murphy, linux-arm-kernel On Wed, 8 Aug 2018, Marcin Wojtas wrote: > Hi Mikulas, > > On Tue, 7 Aug 2018 at 19:39, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > On Tue, 7 Aug 2018, Marcin Wojtas wrote: > > > > > Ard, Mikulas, > > > > > > After some self-caused setup issues I was able to run the test on my > > > MacchiatoBin with the kernel v4.18-rc8. It's been running for 1h+ now, > > > loading the CPU to 100% and not a single error event... > > > > > > I built the binary file with: > > > gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc -O2 > > > > > > Maybe it's an older firmware issue? > > > > I have downloaded and built the firmware recently (it has timestamp Jul 30 > > 2018). > > > > Do you still have your firmware file "flash-image.bin" that you used, so > > that I could try it? > > Attached. Please let me know if you see any difference. I booted this image, but the same corruption happens. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 15:47 ` Ard Biesheuvel (?) @ 2018-08-06 17:13 ` Catalin Marinas -1 siblings, 0 replies; 238+ messages in thread From: Catalin Marinas @ 2018-08-06 17:13 UTC (permalink / raw) To: Ard Biesheuvel Cc: Robin Murphy, Thomas Petazzoni, Joao Pinto, linux-pci, Will Deacon, Russell King, Linux Kernel Mailing List, Mikulas Patocka, Matt Sealey, Jingoo Han, linux-arm-kernel On Mon, Aug 06, 2018 at 05:47:36PM +0200, Ard Biesheuvel wrote: > On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote: > > On 06/08/18 11:25, Mikulas Patocka wrote: > > [...] > >>> > >>> None of this explains why some transactions fail to make it across > >>> entirely. The overlapping writes in question write the same data to > >>> the memory locations that are covered by both, and so the ordering in > >>> which the transactions are received should not affect the outcome. > >> > >> You're right that the corruption couldn't be explained just by reordering > >> writes. My hypothesis is that the PCIe controller tries to disambiguate > >> the overlapping writes, but the disambiguation logic was not tested and it > >> is buggy. If there's a barrier between the overlapping writes, the PCIe > >> controller won't see any overlapping writes, so it won't trigger the > >> faulty disambiguation logic and it works. > >> > >> Could the ARM engineers look if there's some chicken bit in Cortex-A72 > >> that could insert barriers between non-cached writes automatically? > > > > I don't think there is, and even if there was I imagine it would have a > > pretty hideous effect on non-coherent DMA buffers and the various other > > places in which we have Normal-NC mappings of actual system RAM. > > Looking at the A72 manual, there is one chicken bit that looks like it > may be related: > > CPUACTLR_EL1 bit #50: > > 0 Enables store streaming on NC/GRE memory type. This is the reset value. 
> 1 Disables store streaming on NC/GRE memory type. > > so putting something like > > mrs x0, S3_1_C15_C2_0 > orr x0, x0, #(1 << 50) > msr S3_1_C15_C2_0, x0 > > in __cpu_setup() would be worth a try. Note that access to this register may be disabled at EL3 by firmware (ACTLR_EL3.CPUACTLR). FWIW, Mikulas' test seems to run fine on a ThunderX1 with AMD FirePro W2100 (on /dev/fb1) -- Catalin ^ permalink raw reply [flat|nested] 238+ messages in thread
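As a side note on the `#(1 << 50)` operand quoted above: the assembler evaluates it as a 64-bit immediate, but the same expression written in C with a plain `int` constant would overflow. The following is only a minimal sketch of the bit arithmetic on a copy of the register value; the helper names are hypothetical, the bit number is taken from the message above, and the actual `mrs`/`msr` access (which needs sufficient privilege and may be blocked by firmware, as noted) is deliberately not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Bit number quoted in the thread for "disable store streaming on NC/GRE". */
#define CPUACTLR_EL1_DIS_NC_GRE_STREAMING_BIT 50u

/* Hypothetical helper: set one bit in a 64-bit register image. */
static uint64_t set_bit64(uint64_t val, unsigned bit)
{
    assert(bit < 64);                    /* shifting by >= 64 is undefined */
    return val | (UINT64_C(1) << bit);   /* 64-bit constant, not plain int */
}

/* Hypothetical helper: value to write back after reading CPUACTLR_EL1. */
static uint64_t cpuactlr_disable_nc_store_streaming(uint64_t cpuactlr)
{
    return set_bit64(cpuactlr, CPUACTLR_EL1_DIS_NC_GRE_STREAMING_BIT);
}
```

The helper only ORs the bit in, so all other bits of the value read from the register are preserved and applying it twice gives the same result.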
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 17:13 ` Catalin Marinas (?) @ 2018-08-06 17:19 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-06 17:19 UTC (permalink / raw) To: Catalin Marinas Cc: Ard Biesheuvel, Robin Murphy, Thomas Petazzoni, Joao Pinto, linux-pci, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, Jingoo Han, linux-arm-kernel On Mon, 6 Aug 2018, Catalin Marinas wrote: > On Mon, Aug 06, 2018 at 05:47:36PM +0200, Ard Biesheuvel wrote: > > On 6 August 2018 at 14:42, Robin Murphy <robin.murphy@arm.com> wrote: > > > On 06/08/18 11:25, Mikulas Patocka wrote: > > > [...] > > >>> > > >>> None of this explains why some transactions fail to make it across > > >>> entirely. The overlapping writes in question write the same data to > > >>> the memory locations that are covered by both, and so the ordering in > > >>> which the transactions are received should not affect the outcome. > > >> > > >> You're right that the corruption couldn't be explained just by reordering > > >> writes. My hypothesis is that the PCIe controller tries to disambiguate > > >> the overlapping writes, but the disambiguation logic was not tested and it > > >> is buggy. If there's a barrier between the overlapping writes, the PCIe > > >> controller won't see any overlapping writes, so it won't trigger the > > >> faulty disambiguation logic and it works. > > >> > > >> Could the ARM engineers look if there's some chicken bit in Cortex-A72 > > >> that could insert barriers between non-cached writes automatically? > > > > > > I don't think there is, and even if there was I imagine it would have a > > > pretty hideous effect on non-coherent DMA buffers and the various other > > > places in which we have Normal-NC mappings of actual system RAM. 
> > > > Looking at the A72 manual, there is one chicken bit that looks like it > > may be related: > > > > CPUACTLR_EL1 bit #50: > > > > 0 Enables store streaming on NC/GRE memory type. This is the reset value. > > 1 Disables store streaming on NC/GRE memory type. > > > > so putting something like > > > > mrs x0, S3_1_C15_C2_0 > > orr x0, x0, #(1 << 50) > > msr S3_1_C15_C2_0, x0 > > > > in __cpu_setup() would be worth a try. > > Note that access to this register may be disabled at EL3 by firmware > (ACTLR_EL3.CPUACTLR). > > FWIW, Mikulas' test seems to run fine on a ThunderX1 with AMD > FirePro W2100 (on /dev/fb1) I have the EDK EFI firmware sources (and I can load the firmware from an SD card, so there's no risk of bricking the board), so I can insert the write into it, if you say where. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 12:42 ` Robin Murphy (?) @ 2018-08-08 18:31 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-08 18:31 UTC (permalink / raw) To: Robin Murphy Cc: Ard Biesheuvel, Thomas Petazzoni, Joao Pinto, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, Catalin Marinas, linux-arm-kernel On Mon, 6 Aug 2018, Robin Murphy wrote: > I would strongly suspect this issue is particular to Armada 8k, so it's > probably one for the Marvell folks to take a closer look at - I believe > some previous interconnect issues on those SoCs were actually fixable in > firmware. > > Robin. Do you have any contact for them? I suppose that corporate support would ignore just a single user. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 20:44 ` Matt Sealey (?) @ 2018-08-04 13:29 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-04 13:29 UTC (permalink / raw) To: Matt Sealey Cc: Ard Biesheuvel, Will Deacon, Jingoo Han, Joao Pinto, Thomas Petazzoni, Catalin Marinas, Russell King, Linux Kernel Mailing List, linux-arm-kernel, linux-pci On Fri, 3 Aug 2018, Matt Sealey wrote: > On 3 August 2018 at 13:25, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Fri, 3 Aug 2018, Ard Biesheuvel wrote: > > > >> Are we still talking about overlapping unaligned accesses here? Or do > >> you see other failures as well? > > > > Yes - it is caused by overlapping unaligned accesses inside memcpy. When I > > put "dmb sy" between the overlapping accesses in > > glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any memory > > corruption. > > It is a symptom of generating reorderable accesses inside memcpy. It's nothing > to do with alignment, per se (see below). A dmb sy just hides the symptoms. > > What we're talking about here - yes, Ard, within certain amounts of > reason - is that you cannot use PCI BAR memory as 'Normal' - certainly > never cacheable memory, but Normal NC isn't good either.

So, are you going to map the PCI BAR as Device-nGnRE and then emulate all the unaligned accesses in the trap handler? Or are you going to give up on supporting PCIe graphics on ARM altogether?

Video cards have had a linear framebuffer for 25 years. It was introduced as a feature that simplified graphics programming a lot - programmers can use C pointer arithmetic for drawing and they don't have to fiddle with hardware registers. If you argue that graphics programmers can't use it (after they have been using it for 25 years) - they will just ignore you and ARM.

> Links is broken.

What else should it use? Are you going to introduce new functions memcpy_to_framebuffer() and memset_framebuffer()?

> Even on Intel.

No, it's not. Intel will detect overlapping accesses. You can write this - it is legal C code:

    void g(void);

    void overlapping(unsigned char *p)
    {
            p[0] = p[1] = p[2] = p[3] = 1;
            g();
            p[3] = p[4] = p[5] = p[6] = 2;
    }

and the compiler compiles it to this:

    overlapping:
    .LFB0:
            pushl   %ebx
            subl    $8, %esp
            movl    16(%esp), %ebx
            movl    $16843009, (%ebx)
            call    g
            movl    $33686018, 3(%ebx)
            addl    $8, %esp
            popl    %ebx
            ret

Now - if the CPU is incapable of detecting the hazard between the writes to (%ebx) and 3(%ebx) and reorders these writes, it is just broken because it violates the C standard.

If you argue that ARM is incapable of detecting this hazard and reorders these two overlapping memory writes - it means that you can't use C pointers to access videoram on ARM - which means that you can't have PCIe graphics at all.

Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
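The overlapping-store pattern in the message above can be turned into a small self-check. This is only a sketch under stated assumptions: the function names are hypothetical, it runs against an ordinary buffer in normal memory (where it is expected to pass), and pointing it at a real framebuffer would additionally need a mapping obtained elsewhere (for example via mmap of the fbdev device) and volatile or otherwise non-elidable accesses, neither of which is shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two 4-byte stores that share byte 3, mirroring the example in the message. */
static void store_overlapping(unsigned char *p, size_t len)
{
    const uint32_t ones = 0x01010101u;   /* same constant as $16843009 */
    const uint32_t twos = 0x02020202u;   /* same constant as $33686018 */

    assert(len >= 7);                    /* the stores touch bytes 0..6 */
    memcpy(p, &ones, sizeof ones);       /* bytes 0..3 = 1 */
    memcpy(p + 3, &twos, sizeof twos);   /* bytes 3..6 = 2, byte 3 overwritten */
}

/* Returns 1 if the buffer holds what program order requires, else 0. */
static int overlapping_ok(const unsigned char *p)
{
    static const unsigned char expect[7] = { 1, 1, 1, 2, 2, 2, 2 };
    return memcmp(p, expect, sizeof expect) == 0;
}
```

Calling `store_overlapping()` on a zeroed 8-byte buffer and then `overlapping_ok()` should return 1 on any conforming system; byte 3 ending up as 1 would be the reordering the message argues must not happen.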
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 20:44 ` Matt Sealey (?) @ 2018-08-08 12:16 ` Catalin Marinas -1 siblings, 0 replies; 238+ messages in thread From: Catalin Marinas @ 2018-08-08 12:16 UTC (permalink / raw) To: Matt Sealey Cc: Mikulas Patocka, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, linux-arm-kernel Hi Matt, On Fri, Aug 03, 2018 at 03:44:44PM -0500, Matt Sealey wrote: > On 3 August 2018 at 13:25, Mikulas Patocka <mpatocka@redhat.com> wrote: > > On Fri, 3 Aug 2018, Ard Biesheuvel wrote: > >> Are we still talking about overlapping unaligned accesses here? Or do > >> you see other failures as well? > > > > Yes - it is caused by overlapping unaligned accesses inside memcpy. > > When I put "dmb sy" between the overlapping accesses in > > glibc/sysdeps/aarch64/memcpy.S, this program doesn't detect any > > memory corruption. > > It is a symptom of generating reorderable accesses inside memcpy. It's nothing > to do with alignment, per se (see below). A dmb sy just hides the symptoms. > > What we're talking about here - yes, Ard, within certain amounts of > reason - is that you cannot use PCI BAR memory as 'Normal' - certainly > never cacheable memory, but Normal NC isn't good either. That is that > your CPU cannot post writes or reads towards PCI memory spaces unless > it is dealing with it as Device memory or very strictly controlled use > of Normal Non-Cacheable. I disagree that it's not possible to use Normal NC on prefetchable BARs. This particular case looks more like a hardware issue to me as other platforms don't exhibit the same behaviour. Note that allowing Normal NC mapping of prefetchable BARs together with unaligned accesses is also a requirement for SBSA-compliant platforms ([1]; though I don't find the text in D.2 very clear). 
> >> > I tried to run it on system RAM mapped with the NC attribute and I didn't > >> > get any corruption - that suggests the the bug may be in the PCIE > >> > subsystem. > > Pure fluke. Do you mean you don't expect Mikulas' test to run fine on system RAM with Normal NC mapping? We would have bigger issues if this was the case. > I'll give a simple explanation. The Arm Architecture defines > single-copy and multi-copy atomic transactions. You can treat > 'single-copy' to mean that that transaction cannot be made partial, or > reordered within itself, i.e. it must modify memory (if it is a store) > in a single swift effort and any future reads from that memory must > return the FULL result of that write. > > Multi-copy means it can be resized and reordered a bit. Will Deacon is > going to crucify me for simplifying it, but.. let's proceed with a > poor example: Not sure about Will but I think you got them wrong ;). The single/multi copy atomicity is considered in respect to (multiple) observers, a.k.a. masters, and nothing to do with reordering a bit (see B2.2 in the ARMv8 ARM). > STR X0,[X1] on a 32-bit bus cannot ever be single-copy atomic, because > you cannot write 64-bits of data on a 32-bit bus in a single, > unbreakable transaction. This is because from one bus cycle to the > next, one half of the transaction will be in a different place. Your > interconnect will have latched and buffered 32-bits and the CPU is > holding the other. It depends on the implementation, interconnect, buses. Since single-copy atomicity refers to master accesses, the above transaction could be a burst of two 32-bit writes and treated atomically by the interconnect (i.e. not interruptible). > STP X0, X1, [X2] on a 64-bit bus can be single-copy atomic with > respect to the element size. 
But it is on the whole multi-copy atomic > - that is to say that it can provide a single transaction with > multiple elements which are transmitted, and those elements could be > messed with on the way down the pipe. This has nothing to do with multi-copy atomicity which actually refers to multiple observers seeing the same write. The ARM architecture is not exactly multi-copy atomic anyway (rather "other-multi-copy atomic"). Architecturally, STP is treated as two single-copy accesses (as you mentioned already). Anyway, the single/multiple copy atomicity is irrelevant for the C test from Mikulas where you have the same observer (the CPU) writing and reading the memory. I wonder whether writing a byte and reading a long back would show similar corruption. > And the granularity of the hazarding in your system, from the CPU > store buffer to the bus interface to the interconnect buffering to the > PCIe bridge to the PCIe EP is.. what? Not the same all the way down, > I'll bet you. I think hazarding is what goes wrong here, especially since with overlapping unaligned addresses. However, I disagree that it is impossible to implement this properly on a platform with PCIe so that Normal NC mappings can be used. Thanks. [1] https://developer.arm.com/docs/den0029/latest/server-base-system-architecture -- Catalin ^ permalink raw reply [flat|nested] 238+ messages in thread
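
The byte-write / long-read experiment Catalin wonders about could be sketched roughly as follows. This is only an illustration on an ordinary 64-bit word; the function name byte_write_long_read is hypothetical, and to exercise the hardware in question the pointer would have to refer to the Normal NC mapping of the PCIe BAR rather than to plain RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical check, not taken from the thread: store a single byte,
 * then load the 8 bytes that contain it and compare the result with
 * what program order requires. Returns 1 when the load sees the store.
 */
static int byte_write_long_read(volatile uint64_t *word, unsigned idx,
                                unsigned char val)
{
    volatile unsigned char *bytes = (volatile unsigned char *)word;
    unsigned char expect[8];
    uint64_t before, after;

    assert(idx < 8);

    before = *word;              /* snapshot of the old contents      */
    memcpy(expect, &before, 8);
    expect[idx] = val;           /* what the word must look like next */

    bytes[idx] = val;            /* 1-byte store                      */
    after = *word;               /* 8-byte load covering that byte    */

    return memcmp(expect, &after, 8) == 0;
}
```

On system RAM this is expected to succeed for every byte index; a failure on the framebuffer mapping would point at the same hazarding problem seen with memcpy.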
* RE: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 12:16 ` Catalin Marinas (?) @ 2018-08-08 13:02 ` David Laight -1 siblings, 0 replies; 238+ messages in thread
From: David Laight @ 2018-08-08 13:02 UTC
To: 'Catalin Marinas', Matt Sealey
Cc: Mikulas Patocka, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, linux-arm-kernel

From: Catalin Marinas
> Sent: 08 August 2018 13:17
...
> I think hazarding is what goes wrong here, especially since with
> overlapping unaligned addresses. However, I disagree that it is
> impossible to implement this properly on a platform with PCIe so that
> Normal NC mappings can be used.

I've been trying to follow this discussion...

Is the problem just that reads don't snoop/flush the write-combining buffer?

Aligned writes that end on an appropriate boundary will leave the write
combining buffer empty.
But if the buffer isn't emptied the PCIe read gets ahead of the PCIe write.

ISTR even x86 requires a fence instruction in some sequence associated
with write-combining writes.

David
* RE: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 13:02 ` David Laight (?) @ 2018-08-08 13:46 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-08 13:46 UTC
To: David Laight
Cc: 'Catalin Marinas', Matt Sealey, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, linux-arm-kernel

On Wed, 8 Aug 2018, David Laight wrote:

> From: Catalin Marinas
> > Sent: 08 August 2018 13:17
> ...
> > I think hazarding is what goes wrong here, especially since with
> > overlapping unaligned addresses. However, I disagree that it is
> > impossible to implement this properly on a platform with PCIe so that
> > Normal NC mappings can be used.
>
> I've been trying to follow this discussion...
>
> Is the problem just that reads don't snoop/flush the write-combining buffer?

No. The pixel corruption is permanently visible on the monitor (even if
there are no reads from the framebuffer at all). So it can't be explained
as mishandling of a read-after-write hazard.

> Aligned writes that end on an appropriate boundary will leave the write
> combining buffer empty.
> But if the buffer isn't emptied the PCIe read gets ahead of the PCIe write.
>
> ISTR even x86 requires a fence instruction in some sequence associated
> with write-combining writes.

Other x86 cores may observe wc writes out of order - but a single x86
core is self-consistent - i.e. if you do

        movl $0x00000000, (%ebx)
        movl $0xFFFFFFFF, 3(%ebx)

then the byte at ebx+3 will always contain 0xFF. The core can't just
corrupt data while doing reordering.

The problem on ARM is that I see data corruption when the overlapping
unaligned writes are done just by a single core.

> David

Mikulas
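
The two overlapping stores in the message above can be expressed in C as a minimal sketch. The helper name overlapping_stores is hypothetical. On ordinary memory this only shows the end state that program order requires; the compiler is free to merge the two stores, so reproducing the reported corruption would need the real framebuffer mapping and the actual store instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: perform a 4-byte store of zeros at p and a 4-byte
 * store of ones at p + 3, in that order, and return the byte at offset 3.
 * memcpy is used so that the unaligned store is well-defined C.
 */
static unsigned char overlapping_stores(unsigned char *p)
{
    const uint32_t zero = 0x00000000u;
    const uint32_t ones = 0xFFFFFFFFu;

    assert(p != NULL);

    memcpy(p, &zero, sizeof zero);      /* movl $0x00000000, (%ebx)  */
    memcpy(p + 3, &ones, sizeof ones);  /* movl $0xFFFFFFFF, 3(%ebx) */

    return p[3];                        /* the later store must win  */
}
```

Bytes 0 to 2 end up as 0x00 and bytes 3 to 6 as 0xFF; any other result for byte 3 means the later store was lost or reordered behind the earlier one.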
* RE: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 13:46 ` Mikulas Patocka (?) @ 2018-08-08 14:26 ` David Laight -1 siblings, 0 replies; 238+ messages in thread
From: David Laight @ 2018-08-08 14:26 UTC
To: 'Mikulas Patocka'
Cc: 'Catalin Marinas', Matt Sealey, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, linux-arm-kernel

From: Mikulas Patocka
> Sent: 08 August 2018 14:47
...
> The problem on ARM is that I see data corruption when the overlapping
> unaligned writes are done just by a single core.

Is this a sequence of unaligned writes (that shouldn't modify the
same physical locations) or an aligned write followed by an
unaligned one that updates part of the earlier write?
(Or the opposite order?)

It might be that the unaligned writes are bypassing the write-combining
buffer (without flushing it) - so overtake the aligned write.

Alternatively the unaligned writes go through the write-combining buffer
but the byte-enables aren't handled in the expected way.

It ought to be possible to work out which sequence is actually broken.

David
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 14:26 ` David Laight (?) @ 2018-08-08 14:50 ` Catalin Marinas -1 siblings, 0 replies; 238+ messages in thread
From: Catalin Marinas @ 2018-08-08 14:50 UTC
To: David Laight
Cc: 'Mikulas Patocka', Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, Jingoo Han, linux-arm-kernel

On Wed, Aug 08, 2018 at 02:26:11PM +0000, David Laight wrote:
> From: Mikulas Patocka
> > Sent: 08 August 2018 14:47
> ...
> > The problem on ARM is that I see data corruption when the overlapping
> > unaligned writes are done just by a single core.
>
> Is this a sequence of unaligned writes (that shouldn't modify the
> same physical locations) or an aligned write followed by an
> unaligned one that updates part of the earlier write.
> (Or the opposite order?)

In the memcpy() case, there can be a sequence of unaligned writes but
they would not modify the same byte (so no overlapping address at the
byte level).

-- 
Catalin
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 14:50 ` Catalin Marinas (?) @ 2018-08-08 16:21 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-08 16:21 UTC
To: Catalin Marinas
Cc: David Laight, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, Jingoo Han, linux-arm-kernel

On Wed, 8 Aug 2018, Catalin Marinas wrote:

> On Wed, Aug 08, 2018 at 02:26:11PM +0000, David Laight wrote:
> > From: Mikulas Patocka
> > > Sent: 08 August 2018 14:47
> > ...
> > > The problem on ARM is that I see data corruption when the overlapping
> > > unaligned writes are done just by a single core.
> >
> > Is this a sequence of unaligned writes (that shouldn't modify the
> > same physical locations) or an aligned write followed by an
> > unaligned one that updates part of the earlier write.
> > (Or the opposite order?)
>
> In the memcpy() case, there can be a sequence of unaligned writes but
> they would not modify the same byte (so no overlapping address at the
> byte level).

They do modify the same byte, but with the same value. Suppose that you
want to copy a piece of data that is between 8 and 16 bytes long. You can
do this:

        add     src_end, src, len
        add     dst_end, dst, len
        ldr     x0, [src]
        ldr     x1, [src_end - 8]
        str     x0, [dst]
        str     x1, [dst_end - 8]

The ARM64 memcpy uses this trick heavily in order to reduce branching, and
this is what makes the PCIe controller choke.

Mikulas
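
The head/tail trick described above can be written out in C as a rough sketch. The name copy_8_to_16 is hypothetical and this is not the glibc routine, only an illustration of the same idea: two 8-byte loads and two 8-byte stores whose destinations overlap by 16 - len bytes, with no branch on the exact length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy between 8 and 16 bytes using one 8-byte
 * access anchored at the start and one anchored at the end. For
 * len < 16 the two stores overlap, writing the same values twice to
 * the bytes in the middle.
 */
static void copy_8_to_16(unsigned char *dst, const unsigned char *src,
                         size_t len)
{
    uint64_t head, tail;

    assert(len >= 8 && len <= 16);

    memcpy(&head, src, 8);            /* ldr x0, [src]         */
    memcpy(&tail, src + len - 8, 8);  /* ldr x1, [src_end - 8] */
    memcpy(dst, &head, 8);            /* str x0, [dst]         */
    memcpy(dst + len - 8, &tail, 8);  /* str x1, [dst_end - 8] */
}
```

For len = 11, for example, the first store covers bytes 0 to 7 and the second covers bytes 3 to 10, so bytes 3 to 7 are written twice with identical data. Both loads complete before either store, which is what makes the double write harmless on memory that handles overlapping stores correctly.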
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 16:21 ` Mikulas Patocka (?) @ 2018-08-08 16:31 ` Arnd Bergmann -1 siblings, 0 replies; 238+ messages in thread
From: Arnd Bergmann @ 2018-08-08 16:31 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Catalin Marinas, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, David Laight, neko, Jingoo Han, Linux ARM

On Wed, Aug 8, 2018 at 6:22 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> On Wed, 8 Aug 2018, Catalin Marinas wrote:
>
> > On Wed, Aug 08, 2018 at 02:26:11PM +0000, David Laight wrote:
> > > From: Mikulas Patocka
> > > > Sent: 08 August 2018 14:47
> > > ...
> > > > The problem on ARM is that I see data corruption when the overlapping
> > > > unaligned writes are done just by a single core.
> > >
> > > Is this a sequence of unaligned writes (that shouldn't modify the
> > > same physical locations) or an aligned write followed by an
> > > unaligned one that updates part of the earlier write.
> > > (Or the opposite order?)
> >
> > In the memcpy() case, there can be a sequence of unaligned writes but
> > they would not modify the same byte (so no overlapping address at the
> > byte level).
>
> They do modify the same byte, but with the same value. Suppose that you
> want to copy a piece of data that is between 8 and 16 bytes long. You can
> do this:
>
>         add     src_end, src, len
>         add     dst_end, dst, len
>         ldr     x0, [src]
>         ldr     x1, [src_end - 8]
>         str     x0, [dst]
>         str     x1, [dst_end - 8]
>
> The ARM64 memcpy uses this trick heavily in order to reduce branching, and
> this is what makes the PCIe controller choke.

So when a single unaligned 'stp' gets translated into a PCIe TLP
with length=5 (20 bytes) and LastBE = ~1stBE, write combining the
overlapping stores gives us a TLP with a longer length (5..8 for two
stores), and byte-enable bits that are not exactly a complement.

If the explanation is just that the byte-enable settings of the merged
TLP are wrong, maybe the problem is that one of them is always the
complement of the other, which would work for power-of-two length but
not the odd length of the TLP post write-combining?

Arnd

^ permalink raw reply [flat|nested] 238+ messages in thread
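
To make the byte-enable arithmetic above concrete, here is a small sketch that computes the header fields of a memory-write TLP for a contiguous byte range. It assumes the root complex emits exactly one TLP per contiguous range, which is an assumption about this hardware and not something established in the thread; the struct and function names are hypothetical. The field rules (Length counted in 4-byte DWs, First/Last DW byte enables, Last BE of zero for a one-DW request) follow the PCIe base specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three header fields discussed above. */
struct tlp_hdr_bits {
        unsigned length_dw;  /* Length field, in 4-byte DWs      */
        unsigned first_be;   /* First DW byte enables, bit 0 = lowest byte */
        unsigned last_be;    /* Last DW byte enables             */
};

/* Header fields for one write covering bytes addr .. addr + len - 1. */
static struct tlp_hdr_bits tlp_for_write(uint64_t addr, unsigned len)
{
        struct tlp_hdr_bits t;
        uint64_t last;
        unsigned start_off, end_off;

        assert(len > 0);

        last = addr + len - 1;
        start_off = (unsigned)(addr & 3);
        end_off = (unsigned)(last & 3);

        t.length_dw = (unsigned)((last >> 2) - (addr >> 2) + 1);
        t.first_be = (0xFu << start_off) & 0xFu;
        t.last_be = 0xFu >> (3 - end_off);
        if (t.length_dw == 1) {
                /* A single-DW request carries all enables in First BE. */
                t.first_be &= t.last_be;
                t.last_be = 0;
        }
        return t;
}
```

For a 16-byte store at byte offset 1 this gives length 5, First BE 0xE and Last BE 0x1, i.e. the complement case described above. A merged 21-byte range at the same offset gives length 6, First BE 0xE and Last BE 0x3, where the two fields are no longer complements.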
* RE: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 16:31 ` Arnd Bergmann (?) @ 2018-08-08 16:43 ` David Laight -1 siblings, 0 replies; 238+ messages in thread
From: David Laight @ 2018-08-08 16:43 UTC (permalink / raw)
To: 'Arnd Bergmann', Mikulas Patocka
Cc: Catalin Marinas, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, neko, Jingoo Han, Linux ARM

From: Arnd Bergmann
> Sent: 08 August 2018 17:31
..
> > They do modify the same byte, but with the same value. Suppose that you
> > want to copy a piece of data that is between 8 and 16 bytes long. You can
> > do this:
> >
> >         add     src_end, src, len
> >         add     dst_end, dst, len
> >         ldr     x0, [src]
> >         ldr     x1, [src_end - 8]
> >         str     x0, [dst]
> >         str     x1, [dst_end - 8]

I've done that myself (on x86): copy the last 'word' first, then
everything else in increasing address order.

> > The ARM64 memcpy uses this trick heavily in order to reduce branching, and
> > this is what makes the PCIe controller choke.

More likely the write combining buffer?

> So when a single unaligned 'stp' gets translated into a PCIe TLP
> with length=5 (20 bytes) and LastBE = ~1stBE, write combining the
> overlapping stores gives us a TLP with a longer length (5..8 for two
> stores), and byte-enable bits that are not exactly a complement.

Write combining should generate a much longer TLP, depending on the
size of the write combining buffer.

But in the above case I'd have thought that the second write
would fail to 'combine' - because it isn't contiguous with the
stored data.

So something more complex will be going on.

	David

^ permalink raw reply [flat|nested] 238+ messages in thread
* RE: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 16:43 ` David Laight (?) @ 2018-08-08 18:56 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-08 18:56 UTC (permalink / raw)
To: David Laight
Cc: 'Arnd Bergmann', Catalin Marinas, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, linux-pci, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, neko, Jingoo Han, Linux ARM

On Wed, 8 Aug 2018, David Laight wrote:

> From: Arnd Bergmann
> > Sent: 08 August 2018 17:31
> ..
> > > They do modify the same byte, but with the same value. Suppose that you
> > > want to copy a piece of data that is between 8 and 16 bytes long. You can
> > > do this:
> > >
> > >         add     src_end, src, len
> > >         add     dst_end, dst, len
> > >         ldr     x0, [src]
> > >         ldr     x1, [src_end - 8]
> > >         str     x0, [dst]
> > >         str     x1, [dst_end - 8]
>
> I've done that myself (on x86): copy the last 'word' first, then
> everything else in increasing address order.
>
> > > The ARM64 memcpy uses this trick heavily in order to reduce branching, and
> > > this is what makes the PCIe controller choke.
>
> More likely the write combining buffer?

When I write to memory (using the NC mapping - that is also used in the
PCI BAR), I get no corruption. So the corruption must be in the PCIe
controller, not the core or memory subsystem.

I also tried to disable write streaming on NC mapping with a chicken bit,
but it didn't help.

> > So when a single unaligned 'stp' gets translated into a PCIe TLP
> > with length=5 (20 bytes) and LastBE = ~1stBE, write combining the
> > overlapping stores gives us a TLP with a longer length (5..8 for two
> > stores), and byte-enable bits that are not exactly a complement.
>
> Write combining should generate a much longer TLP, depending on the
> size of the write combining buffer.
>
> But in the above case I'd have thought that the second write
> would fail to 'combine' - because it isn't contiguous with the
> stored data.
>
> So something more complex will be going on.
>
> 	David

Mikulas

^ permalink raw reply [flat|nested] 238+ messages in thread
* RE: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 14:26 ` David Laight (?) @ 2018-08-08 18:37 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-08 18:37 UTC (permalink / raw)
To: David Laight
Cc: 'Catalin Marinas', Matt Sealey, Thomas Petazzoni, Joao Pinto, Ard Biesheuvel, linux-pci, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, linux-arm-kernel

On Wed, 8 Aug 2018, David Laight wrote:

> From: Mikulas Patocka
> > Sent: 08 August 2018 14:47
> ...
> > The problem on ARM is that I see data corruption when the overlapping
> > unaligned writes are done just by a single core.
>
> Is this a sequence of unaligned writes (that shouldn't modify the
> same physical locations) or an aligned write followed by an
> unaligned one that updates part of the earlier write.
> (Or the opposite order?)
>
> It might be that the unaligned writes are bypassing the write-combining
> buffer (without flushing it) - so overtake the aligned write.
>
> Alternatively the unaligned writes go through the write-combining
> buffer but the byte-enables aren't handled in the expected way.
>
> It ought to be possible to work out which sequence is actually broken.
>
> 	David

All the writes inside memcpy, aligned or unaligned, write the same value
to the overlapping bytes. So the corruption can't be explained just by
reordering the writes or by failing to detect a hazard between them.

Mikulas

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 17:09 ` Mikulas Patocka (?) @ 2018-08-08 11:39 ` Catalin Marinas -1 siblings, 0 replies; 238+ messages in thread
From: Catalin Marinas @ 2018-08-08 11:39 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Will Deacon, Jingoo Han, Joao Pinto, Thomas Petazzoni, libc-alpha, Ard Biesheuvel, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, linux-arm-kernel

On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote:
>         while (1) {
>                 start = (unsigned)random() % (LEN + 1);
>                 end = (unsigned)random() % (LEN + 1);
>                 if (start > end)
>                         continue;
>                 for (i = start; i < end; i++)
>                         data[i] = val++;
>                 memcpy(map + start, data + start, end - start);
>                 if (memcmp(map, data, LEN)) {

It may be worth trying to do a memcmp(map+start, data+start, end-start)
here to see whether the hazard logic fails when the writes are unaligned
but the reads are not.

This problem may also appear if you do byte writes and read longs
back (and I consider this a hardware problem on this specific board).

-- 
Catalin

^ permalink raw reply [flat|nested] 238+ messages in thread
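
The comparison suggested above could be added to the test loop along these lines. This is a sketch only: the helper name and the enum are hypothetical, and it is written against ordinary pointers, whereas in the real test `map` would be the mmap'ed PCI BAR.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for the two comparisons. */
enum mismatch { MATCH, MISMATCH_IN_RANGE, MISMATCH_OUTSIDE_RANGE };

/*
 * First compare only the bytes just written (same, possibly unaligned,
 * start as the memcpy), then fall back to the full-buffer comparison
 * used by the original test.
 */
static enum mismatch classify_mismatch(const unsigned char *map,
                                       const unsigned char *data,
                                       size_t len, size_t start, size_t end)
{
        if (memcmp(map + start, data + start, end - start))
                return MISMATCH_IN_RANGE;
        if (memcmp(map, data, len))
                return MISMATCH_OUTSIDE_RANGE;
        return MATCH;
}
```

A result of MISMATCH_OUTSIDE_RANGE would mean that the full comparison fails while the ranged one passes, which is the case the suggestion is trying to separate out.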
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 11:39 ` Catalin Marinas (?) @ 2018-08-08 14:12 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread
From: Mikulas Patocka @ 2018-08-08 14:12 UTC (permalink / raw)
To: Catalin Marinas
Cc: Will Deacon, Jingoo Han, Joao Pinto, Thomas Petazzoni, libc-alpha, Ard Biesheuvel, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, linux-arm-kernel

On Wed, 8 Aug 2018, Catalin Marinas wrote:

> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote:
> >         while (1) {
> >                 start = (unsigned)random() % (LEN + 1);
> >                 end = (unsigned)random() % (LEN + 1);
> >                 if (start > end)
> >                         continue;
> >                 for (i = start; i < end; i++)
> >                         data[i] = val++;
> >                 memcpy(map + start, data + start, end - start);
> >                 if (memcmp(map, data, LEN)) {
>
> It may be worth trying to do a memcmp(map+start, data+start, end-start)
> here to see whether the hazard logic fails when the writes are unaligned
> but the reads are not.
>
> This problem may also appear if you do byte writes and read longs
> back (and I consider this a hardware problem on this specific board).

I tried to insert usleep(10000) between the memcpy and memcmp, but the
same corruption occurs. So it can't be a read-after-write hazard. It is
caused by the improper handling of the hazard between the overlapping
writes inside memcpy.

Mikulas

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 14:12 ` Mikulas Patocka (?) @ 2018-08-08 14:28 ` Catalin Marinas -1 siblings, 0 replies; 238+ messages in thread From: Catalin Marinas @ 2018-08-08 14:28 UTC (permalink / raw) To: Mikulas Patocka Cc: Thomas Petazzoni, Joao Pinto, libc-alpha, Ard Biesheuvel, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, linux-arm-kernel On Wed, Aug 08, 2018 at 10:12:27AM -0400, Mikulas Patocka wrote: > On Wed, 8 Aug 2018, Catalin Marinas wrote: > > On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: > > > while (1) { > > > start = (unsigned)random() % (LEN + 1); > > > end = (unsigned)random() % (LEN + 1); > > > if (start > end) > > > continue; > > > for (i = start; i < end; i++) > > > data[i] = val++; > > > memcpy(map + start, data + start, end - start); > > > if (memcmp(map, data, LEN)) { > > > > It may be worth trying to do a memcmp(map+start, data+start, end-start) > > here to see whether the hazard logic fails when the writes are unaligned > > but the reads are not. > > > > This problem may as well appear if you do byte writes and read longs > > back (and I consider this a hardware problem on this specific board). > > I tried to insert usleep(10000) between the memcpy and memcmp, but the > same corruption occurs. So, it can't be read-after-write hazard. It is > caused by the improper handling of hazard between the overlapping writes > inside memcpy. It could get it wrong between subsequent writes to the same 64-bit range (e.g. the address & ~63 is the same but the data strobes for which bytes to write are different). If it somehow thinks that it's a write-after-write hazard even though the strobes are different, it could cancel one of the writes. It may be worth trying with a byte-only memcpy() function while keeping the default memcmp(). -- Catalin ^ permalink raw reply [flat|nested] 238+ messages in thread
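A byte-only copy of the kind suggested above might look like the following minimal sketch. The name memcpy_bytewise is hypothetical. The volatile qualifier is an assumption about how to keep the compiler from merging the byte stores or replacing the loop with a call to the library memcpy; it says nothing about what write-combining hardware downstream does with the individual byte writes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical byte-only replacement for memcpy (not from the thread).
 * Every destination byte is written exactly once, with a one-byte store,
 * so no two stores issued by this function overlap.
 */
void *memcpy_bytewise(void *dst, const void *src, size_t n)
{
    /* volatile: ask the compiler to emit one store per byte. */
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```

In the test loop, this would be called in place of memcpy() while memcmp() stays unchanged, so that only the write pattern differs between the two runs.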
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 14:28 ` Catalin Marinas (?) @ 2018-08-08 18:40 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-08 18:40 UTC (permalink / raw) To: Catalin Marinas Cc: Thomas Petazzoni, Joao Pinto, libc-alpha, Ard Biesheuvel, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, linux-arm-kernel On Wed, 8 Aug 2018, Catalin Marinas wrote: > On Wed, Aug 08, 2018 at 10:12:27AM -0400, Mikulas Patocka wrote: > > On Wed, 8 Aug 2018, Catalin Marinas wrote: > > > On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: > > > > while (1) { > > > > start = (unsigned)random() % (LEN + 1); > > > > end = (unsigned)random() % (LEN + 1); > > > > if (start > end) > > > > continue; > > > > for (i = start; i < end; i++) > > > > data[i] = val++; > > > > memcpy(map + start, data + start, end - start); > > > > if (memcmp(map, data, LEN)) { > > > > > > It may be worth trying to do a memcmp(map+start, data+start, end-start) > > > here to see whether the hazard logic fails when the writes are unaligned > > > but the reads are not. > > > > > > This problem may as well appear if you do byte writes and read longs > > > back (and I consider this a hardware problem on this specific board). > > > > I tried to insert usleep(10000) between the memcpy and memcmp, but the > > same corruption occurs. So, it can't be read-after-write hazard. It is > > caused by the improper handling of hazard between the overlapping writes > > inside memcpy. > > It could get it wrong between subsequent writes to the same 64-bit range > (e.g. the address & ~63 is the same but the data strobes for which bytes > to write are different). If it somehow thinks that it's a > write-after-write hazard even though the strobes are different, it could > cancel one of the writes. 
I believe that the SoC has logic for write-after-write detection, but the logic is broken and corrupts data. If I insert "dmb sy" between the overlapping writes, there's no corruption (the PCIe controller won't see any overlapping writes in that case). > It may be worth trying with a byte-only memcpy() function while keeping > the default memcmp(). I tried that and byte-only memcpy works without any corruption. > -- > Catalin Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
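The shape of the workaround described above (two overlapping 16-byte stores with a barrier between them) can be modelled in C as below. This is a sketch under assumptions, not the patched memcpy: the function names are hypothetical, the compiler decides which store instructions the 16-byte memcpy calls become (they are not guaranteed to be stp), and on non-arm64 builds a GCC/Clang fence builtin stands in for "dmb sy".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Full barrier: "dmb sy" on arm64, a compiler builtin fence elsewhere. */
static inline void full_barrier(void)
{
#if defined(__aarch64__)
    __asm__ volatile("dmb sy" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/*
 * Hypothetical model of a branch-free 16..32 byte copy: one 16-byte store
 * at the start of the destination and one ending at its last byte. For
 * n < 32 the two stores overlap; the barrier keeps them from being
 * outstanding at the same time.
 */
void copy_16_to_32_with_barrier(unsigned char *dst, const unsigned char *src,
                                size_t n)
{
    unsigned char head[16], tail[16];

    assert(n >= 16 && n <= 32);

    /* Load both halves first, as the assembly version does with ldp. */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    memcpy(dst, head, 16);             /* first store: bytes 0..15 */
    full_barrier();                    /* order the overlapping stores */
    memcpy(dst + n - 16, tail, 16);    /* second store: bytes n-16..n-1 */
}
```

The barrier costs throughput on every short copy, which is why it works as a diagnostic but is an unattractive general fix.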
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 14:12 ` Mikulas Patocka (?) @ 2018-08-08 15:01 ` Richard Earnshaw (lists) -1 siblings, 0 replies; 238+ messages in thread From: Richard Earnshaw (lists) @ 2018-08-08 15:01 UTC (permalink / raw) To: Mikulas Patocka, Catalin Marinas Cc: Will Deacon, Jingoo Han, Joao Pinto, Thomas Petazzoni, libc-alpha, Ard Biesheuvel, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, linux-arm-kernel On 08/08/18 15:12, Mikulas Patocka wrote: > > > On Wed, 8 Aug 2018, Catalin Marinas wrote: > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: >>> while (1) { >>> start = (unsigned)random() % (LEN + 1); >>> end = (unsigned)random() % (LEN + 1); >>> if (start > end) >>> continue; >>> for (i = start; i < end; i++) >>> data[i] = val++; >>> memcpy(map + start, data + start, end - start); >>> if (memcmp(map, data, LEN)) { >> >> It may be worth trying to do a memcmp(map+start, data+start, end-start) >> here to see whether the hazard logic fails when the writes are unaligned >> but the reads are not. >> >> This problem may as well appear if you do byte writes and read longs >> back (and I consider this a hardware problem on this specific board). > > I tried to insert usleep(10000) between the memcpy and memcmp, but the > same corruption occurs. So, it can't be read-after-write hazard. It is > caused by the improper handling of hazard between the overlapping writes > inside memcpy. > > Mikulas > I don't think you've told us what form the corruption takes. Does it lose some bytes? Modify values beyond the copy range? Write completely arbitrary values? The overlapping writes in memcpy never write different values to the same location, so I still feel this must be some sort of HW issue, not a SW one. R. 
^ permalink raw reply [flat|nested] 238+ messages in thread
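The claim that the overlapping writes never carry different values for the same location can be checked with a small model. The sketch below assumes the 16..32 byte case with one 16-byte store at the start and one ending at the last byte; the function name is hypothetical and the code is not taken from memcpy.S.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check (not from the thread): for a 16..32 byte copy done
 * as a head store covering offsets [0, 16) and a tail store covering
 * [n - 16, n), return 1 if both stores carry the same value for every
 * destination offset they share, and 0 otherwise.
 */
int overlap_values_agree(const unsigned char *src, size_t n)
{
    unsigned char head[16], tail[16];
    size_t tail_off;

    assert(n >= 16 && n <= 32);
    tail_off = n - 16;

    memcpy(head, src, 16);
    memcpy(tail, src + tail_off, 16);

    /* Offsets tail_off..15 are written by both stores. */
    for (size_t i = tail_off; i < 16; i++)
        if (head[i] != tail[i - tail_off])
            return 0;
    return 1;
}
```

Because both stores are loaded from the same source bytes, the function returns 1 for any input as long as the source does not change between the two loads, so correct hardware may apply the two stores in either order, or merge them, without changing the result.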
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 15:01 ` Richard Earnshaw (lists) (?) @ 2018-08-08 15:14 ` Catalin Marinas -1 siblings, 0 replies; 238+ messages in thread From: Catalin Marinas @ 2018-08-08 15:14 UTC (permalink / raw) To: Richard Earnshaw (lists) Cc: Mikulas Patocka, Thomas Petazzoni, Joao Pinto, libc-alpha, Ard Biesheuvel, Jingoo Han, Will Deacon, Russell King, Linux Kernel Mailing List, Matt Sealey, linux-pci, linux-arm-kernel On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote: > On 08/08/18 15:12, Mikulas Patocka wrote: > > On Wed, 8 Aug 2018, Catalin Marinas wrote: > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: > >>> while (1) { > >>> start = (unsigned)random() % (LEN + 1); > >>> end = (unsigned)random() % (LEN + 1); > >>> if (start > end) > >>> continue; > >>> for (i = start; i < end; i++) > >>> data[i] = val++; > >>> memcpy(map + start, data + start, end - start); > >>> if (memcmp(map, data, LEN)) { > >> > >> It may be worth trying to do a memcmp(map+start, data+start, end-start) > >> here to see whether the hazard logic fails when the writes are unaligned > >> but the reads are not. > >> > >> This problem may as well appear if you do byte writes and read longs > >> back (and I consider this a hardware problem on this specific board). > > > > I tried to insert usleep(10000) between the memcpy and memcmp, but the > > same corruption occurs. So, it can't be read-after-write hazard. It is > > caused by the improper handling of hazard between the overlapping writes > > inside memcpy. > > I don't think you've told us what form the corruption takes. Does it > lose some bytes? Modify values beyond the copy range? Write completely > arbitrary values? 
From this message:

https://lore.kernel.org/lkml/alpine.LRH.2.02.1808060553130.30832@file01.intranet.prod.int.rdu2.redhat.com/

- failing to write a few bytes
- writing a few bytes that were written 16 bytes before
- writing a few bytes that were written 16 bytes after

> The overlapping writes in memcpy never write different values to the
> same location, so I still feel this must be some sort of HW issue, not a
> SW one.

So do I (my interpretation is that it combines or rather skips some of
the writes to the same 16-byte address as it ignores the data strobes).

--
Catalin

^ permalink raw reply [flat|nested] 238+ messages in thread
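The three corruption forms listed above could be told apart mechanically when analysing a failed round. The following is a hypothetical diagnostic helper, not something used in the thread: it assumes the tester keeps the buffer contents from before the copy (previous) next to the expected and actual contents, and it can misclassify a byte when values happen to coincide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the failure forms reported in the thread. */
enum mismatch_kind {
    MISMATCH_NONE,            /* byte is correct */
    MISMATCH_NOT_WRITTEN,     /* byte still holds its old value */
    MISMATCH_FROM_16_BEFORE,  /* byte holds the data meant for i - 16 */
    MISMATCH_FROM_16_AFTER,   /* byte holds the data meant for i + 16 */
    MISMATCH_OTHER            /* none of the above */
};

/* Classify the byte at offset i of a buffer of `len` bytes. */
enum mismatch_kind classify_byte(const unsigned char *expected,
                                 const unsigned char *actual,
                                 const unsigned char *previous,
                                 size_t len, size_t i)
{
    assert(i < len);

    if (actual[i] == expected[i])
        return MISMATCH_NONE;
    if (actual[i] == previous[i])
        return MISMATCH_NOT_WRITTEN;
    if (i >= 16 && actual[i] == expected[i - 16])
        return MISMATCH_FROM_16_BEFORE;
    if (i + 16 < len && actual[i] == expected[i + 16])
        return MISMATCH_FROM_16_AFTER;
    return MISMATCH_OTHER;
}
```

A histogram of these labels over many failing rounds would show whether the 16-byte displacement is systematic or incidental.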
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-08 15:14 ` Catalin Marinas (?) @ 2018-08-08 16:01 ` Arnd Bergmann -1 siblings, 0 replies; 238+ messages in thread From: Arnd Bergmann @ 2018-08-08 16:01 UTC (permalink / raw) To: Catalin Marinas Cc: Richard.Earnshaw, Mikulas Patocka, Thomas Petazzoni, Joao Pinto, GNU C Library, Ard Biesheuvel, Jingoo Han, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, neko, linux-pci, Linux ARM On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote: > > On 08/08/18 15:12, Mikulas Patocka wrote: > > > On Wed, 8 Aug 2018, Catalin Marinas wrote: > > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: > - failing to write a few bytes > - writing a few bytes that were written 16 bytes before > - writing a few bytes that were written 16 bytes after > > > The overlapping writes in memcpy never write different values to the > > same location, so I still feel this must be some sort of HW issue, not a > > SW one. > > So do I (my interpretation is that it combines or rather skips some of > the writes to the same 16-byte address as it ignores the data strobes). Maybe it just always writes to the wrong location, 16 bytes apart for one of the stp instructions. Since we are usually dealing with a pair of overlapping 'stp', both unaligned, that could explain both the missing bytes (we write data to the wrong place, but overwrite it with the correct data right away) and the extra copy (we write it to the wrong place, but then write the correct data to the correct place as well). This sounds a bit like what the original ARM CPUs did on unaligned memory access, where a single aligned 4-byte location was accessed, but the bytes swapped around. 
There may be a few more things worth trying out or analysing from the
recorded past failures to understand more about how it goes wrong:

- For which data lengths does it fail? Having two overlapping
  unaligned stp is something that only happens for 16..96 byte
  memcpy.
- What if we use a pair of str instructions instead of an stp in
  a modified memcpy? Does it still write to the wrong place
  16 bytes away, just 8 bytes away, or correctly?
- Does it change in any way if we do the overlapping writes
  in the reverse order? E.g. for the 16..64 byte case:

diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 7e1163e6a0..09d0160bdf 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -102,11 +102,11 @@ ENTRY (MEMCPY)
        tbz     tmp1, 5, 1f
        ldp     B_l, B_h, [src, 16]
        ldp     C_l, C_h, [srcend, -32]
-       stp     B_l, B_h, [dstin, 16]
        stp     C_l, C_h, [dstend, -32]
+       stp     B_l, B_h, [dstin, 16]
 1:
-       stp     A_l, A_h, [dstin]
        stp     D_l, D_h, [dstend, -16]
+       stp     A_l, A_h, [dstin]
        ret
 
        .p2align 4

       Arnd

^ permalink raw reply related [flat|nested] 238+ messages in thread
* framebuffer corruption due to overlapping stp instructions on arm64 @ 2018-08-08 16:01 ` Arnd Bergmann 0 siblings, 0 replies; 238+ messages in thread From: Arnd Bergmann @ 2018-08-08 16:01 UTC (permalink / raw) To: linux-arm-kernel On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote: > > On 08/08/18 15:12, Mikulas Patocka wrote: > > > On Wed, 8 Aug 2018, Catalin Marinas wrote: > > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote: > - failing to write a few bytes > - writing a few bytes that were written 16 bytes before > - writing a few bytes that were written 16 bytes after > > > The overlapping writes in memcpy never write different values to the > > same location, so I still feel this must be some sort of HW issue, not a > > SW one. > > So do I (my interpretation is that it combines or rather skips some of > the writes to the same 16-byte address as it ignores the data strobes). Maybe it just always writes to the wrong location, 16 bytes apart for one of the stp instructions. Since we are usually dealing with a pair of overlapping 'stp', both unaligned, that could explain both the missing bytes (we write data to the wrong place, but overwrite it with the correct data right away) and the extra copy (we write it to the wrong place, but then write the correct data to the correct place as well). This sounds a bit like what the original ARM CPUs did on unaligned memory access, where a single aligned 4-byte location was accessed, but the bytes swapped around. There may be a few more things worth trying out or analysing from the recorded past failures to understand more about how it goes wrong: - For which data lengths does it fail? Having two overlapping unaligned stp is something that only happens for 16..96 byte memcpy. - What if we use a pair of str instructions instead of an stp in a modified memcpy? 
Does it now write to still write to the wrong place 16 bytes away, just 8 bytes away, or correctly? - Does it change in any way if we do the overlapping writes in the reverse order? E.g. for the 16..64 byte case: diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S index 7e1163e6a0..09d0160bdf 100644 --- a/sysdeps/aarch64/memcpy.S +++ b/sysdeps/aarch64/memcpy.S @@ -102,11 +102,11 @@ ENTRY (MEMCPY) tbz tmp1, 5, 1f ldp B_l, B_h, [src, 16] ldp C_l, C_h, [srcend, -32] - stp B_l, B_h, [dstin, 16] stp C_l, C_h, [dstend, -32] + stp B_l, B_h, [dstin, 16] 1: - stp A_l, A_h, [dstin] stp D_l, D_h, [dstend, -16] + stp A_l, A_h, [dstin] ret .p2align 4 Arnd ^ permalink raw reply related [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64
@ 2018-08-08 16:01 ` Arnd Bergmann
From: Arnd Bergmann @ 2018-08-08 16:01 UTC
To: Catalin Marinas
Cc: Thomas Petazzoni, Richard.Earnshaw, Joao Pinto, GNU C Library,
    Ard Biesheuvel, Jingoo Han, Will Deacon, Russell King - ARM Linux,
    Linux Kernel Mailing List, Mikulas Patocka, neko, linux-pci, Linux ARM

On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote:
> > On 08/08/18 15:12, Mikulas Patocka wrote:
> > > On Wed, 8 Aug 2018, Catalin Marinas wrote:
> > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote:

> - failing to write a few bytes
> - writing a few bytes that were written 16 bytes before
> - writing a few bytes that were written 16 bytes after
>
> > The overlapping writes in memcpy never write different values to the
> > same location, so I still feel this must be some sort of HW issue, not a
> > SW one.
>
> So do I (my interpretation is that it combines or rather skips some of
> the writes to the same 16-byte address as it ignores the data strobes).

Maybe it just always writes to the wrong location, 16 bytes apart for one of
the stp instructions. Since we are usually dealing with a pair of overlapping
'stp', both unaligned, that could explain both the missing bytes (we write
data to the wrong place, but overwrite it with the correct data right away)
and the extra copy (we write it to the wrong place, but then write the correct
data to the correct place as well).

This sounds a bit like what the original ARM CPUs did on unaligned
memory access, where a single aligned 4-byte location was accessed,
but the bytes swapped around.

There may be a few more things worth trying out or analysing from
the recorded past failures to understand more about how it goes
wrong:

- For which data lengths does it fail? Having two overlapping
  unaligned stp is something that only happens for 16..96 byte
  memcpy.

- What if we use a pair of str instructions instead of an stp in
  a modified memcpy? Does it still write to the wrong place 16 bytes
  away, just 8 bytes away, or correctly?

- Does it change in any way if we do the overlapping writes
  in the reverse order? E.g. for the 16..64 byte case:

diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
index 7e1163e6a0..09d0160bdf 100644
--- a/sysdeps/aarch64/memcpy.S
+++ b/sysdeps/aarch64/memcpy.S
@@ -102,11 +102,11 @@ ENTRY (MEMCPY)
        tbz     tmp1, 5, 1f
        ldp     B_l, B_h, [src, 16]
        ldp     C_l, C_h, [srcend, -32]
-       stp     B_l, B_h, [dstin, 16]
        stp     C_l, C_h, [dstend, -32]
+       stp     B_l, B_h, [dstin, 16]
 1:
-       stp     A_l, A_h, [dstin]
        stp     D_l, D_h, [dstend, -16]
+       stp     A_l, A_h, [dstin]
        ret

        .p2align 4

      Arnd
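
To make the overlap in the diff above easier to see, here is a small sketch (not part of glibc) that lists the byte ranges touched by the four 16-byte stp stores. It assumes that the B and C stores are only issued for copies of 33..64 bytes, which is how the "tbz tmp1, 5" test reads if tmp1 holds count - 1; that detail is not shown in the hunk and is an assumption here. The function names are made up for illustration.

```python
def medium_copy_stores(dstin, count, reordered=False):
    """Return [(name, first_byte, last_byte)] in program order for the
    four 16-byte stp stores of the (assumed) 33..64 byte path."""
    if not 33 <= count <= 64:
        raise ValueError("this sketch only models copies of 33..64 bytes")
    dstend = dstin + count
    starts = {
        "A": dstin,          # stp A_l, A_h, [dstin]
        "B": dstin + 16,     # stp B_l, B_h, [dstin, 16]
        "C": dstend - 32,    # stp C_l, C_h, [dstend, -32]
        "D": dstend - 16,    # stp D_l, D_h, [dstend, -16]
    }
    # Original order is B, C, A, D; the proposed patch makes it C, B, D, A.
    order = "CBDA" if reordered else "BCAD"
    return [(name, starts[name], starts[name] + 15) for name in order]


def overlapping_pairs(stores):
    """Return the pairs of stores whose byte ranges intersect."""
    pairs = []
    for i, (n1, lo1, hi1) in enumerate(stores):
        for n2, lo2, hi2 in stores[i + 1:]:
            if lo1 <= hi2 and lo2 <= hi1:
                pairs.append((n1, n2))
    return pairs


if __name__ == "__main__":
    stores = medium_copy_stores(0x0D, 0x3F)
    for name, lo, hi in stores:
        print(f"{name}: {lo:#04x}..{hi:#04x}")
    print("overlaps:", overlapping_pairs(stores))
```

For a 0x3f-byte copy to offset 0x0d only B and C share a byte (0x2c); shorter copies make C reach back into B and eventually into A.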
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-08 16:01 ` Arnd Bergmann
@ 2018-08-08 18:25 ` Mikulas Patocka
From: Mikulas Patocka @ 2018-08-08 18:25 UTC
To: Arnd Bergmann
Cc: Catalin Marinas, Richard.Earnshaw, Thomas Petazzoni, Joao Pinto,
    GNU C Library, Ard Biesheuvel, Jingoo Han, Will Deacon,
    Russell King - ARM Linux, Linux Kernel Mailing List, neko, linux-pci,
    Linux ARM

On Wed, 8 Aug 2018, Arnd Bergmann wrote:

> On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> >
> > On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote:
> > > On 08/08/18 15:12, Mikulas Patocka wrote:
> > > > On Wed, 8 Aug 2018, Catalin Marinas wrote:
> > > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote:
>
> > - failing to write a few bytes
> > - writing a few bytes that were written 16 bytes before
> > - writing a few bytes that were written 16 bytes after
> >
> > > The overlapping writes in memcpy never write different values to the
> > > same location, so I still feel this must be some sort of HW issue, not a
> > > SW one.
> >
> > So do I (my interpretation is that it combines or rather skips some of
> > the writes to the same 16-byte address as it ignores the data strobes).
>
> Maybe it just always writes to the wrong location, 16 bytes apart for one of
> the stp instructions. Since we are usually dealing with a pair of overlapping
> 'stp', both unaligned, that could explain both the missing bytes (we write
> data to the wrong place, but overwrite it with the correct data right away)
> and the extra copy (we write it to the wrong place, but then write the correct
> data to the correct place as well).
>
> This sounds a bit like what the original ARM CPUs did on unaligned
> memory access, where a single aligned 4-byte location was accessed,
> but the bytes swapped around.
>
> There may be a few more things worth trying out or analysing from
> the recorded past failures to understand more about how it goes
> wrong:
>
> - For which data lengths does it fail? Having two overlapping
>   unaligned stp is something that only happens for 16..96 byte
>   memcpy.

If you want to research the corruptions in detail, I uploaded a file
containing 7k corruptions here:
http://people.redhat.com/~mpatocka/testcases/arm-pcie-corruption/

> - What if we use a pair of str instructions instead of an stp in
>   a modified memcpy? Does it still write to the wrong place 16 bytes
>   away, just 8 bytes away, or correctly?

I replaced all stp instructions with str and it had no effect on the
corruptions. Either a few bytes are omitted, or a value that belongs 16
bytes before or after is written.

> - Does it change in any way if we do the overlapping writes
>   in the reverse order? E.g. for the 16..64 byte case:
>
> diff --git a/sysdeps/aarch64/memcpy.S b/sysdeps/aarch64/memcpy.S
> index 7e1163e6a0..09d0160bdf 100644
> --- a/sysdeps/aarch64/memcpy.S
> +++ b/sysdeps/aarch64/memcpy.S
> @@ -102,11 +102,11 @@ ENTRY (MEMCPY)
>         tbz     tmp1, 5, 1f
>         ldp     B_l, B_h, [src, 16]
>         ldp     C_l, C_h, [srcend, -32]
> -       stp     B_l, B_h, [dstin, 16]
>         stp     C_l, C_h, [dstend, -32]
> +       stp     B_l, B_h, [dstin, 16]
>  1:
> -       stp     A_l, A_h, [dstin]
>         stp     D_l, D_h, [dstend, -16]
> +       stp     A_l, A_h, [dstin]
>         ret
>
>         .p2align 4
>
>       Arnd

After reordering them, I observe only omitted writes; there are no longer
any misdirected writes:
http://people.redhat.com/~mpatocka/testcases/arm-pcie-corruption/reorder-test/

Mikulas
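
The two failure modes described in this message (omitted bytes, and values that belong 16 bytes before or after) can be told apart mechanically. The following is only a sketch of such a classifier: it assumes that the previous framebuffer contents, the data that memcpy was supposed to write, and the data read back are all available as byte strings of equal length. The function name and the result labels are hypothetical and are not taken from the uploaded test program.

```python
def classify_corruption(old, expected, actual, shift=16):
    """Map each corrupted byte index to a guess about what went wrong.

    old      -- contents of the destination before the copy
    expected -- what the copy should have produced
    actual   -- what was read back afterwards
    """
    result = {}
    for i, (want, got) in enumerate(zip(expected, actual)):
        if got == want:
            continue
        if got == old[i]:
            kind = "omitted"        # the old contents survived
        elif i >= shift and got == expected[i - shift]:
            kind = "from_before"    # value that belongs 16 bytes earlier
        elif i + shift < len(expected) and got == expected[i + shift]:
            kind = "from_after"     # value that belongs 16 bytes later
        else:
            kind = "other"
        result[i] = kind
    return result


if __name__ == "__main__":
    old = bytes(64)                      # all zero before the copy
    expected = bytes(range(1, 65))       # distinct non-zero test pattern
    actual = bytearray(expected)
    actual[29:32] = old[29:32]           # three bytes never written
    actual[34] = expected[18]            # one byte taken from 16 bytes before
    print(classify_corruption(old, expected, actual))
```

The classification is ambiguous whenever the old contents happen to equal the shifted data, so a test pattern with distinct values per byte is the safer choice.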
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-08 18:25 ` Mikulas Patocka
@ 2018-08-08 21:51 ` Arnd Bergmann
From: Arnd Bergmann @ 2018-08-08 21:51 UTC
To: Mikulas Patocka
Cc: Catalin Marinas, Richard.Earnshaw, Thomas Petazzoni, Joao Pinto,
    GNU C Library, Ard Biesheuvel, Jingoo Han, Will Deacon,
    Russell King - ARM Linux, Linux Kernel Mailing List, neko, linux-pci,
    Linux ARM

On Wed, Aug 8, 2018 at 8:25 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> On Wed, 8 Aug 2018, Arnd Bergmann wrote:
>
> > On Wed, Aug 8, 2018 at 5:15 PM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > >
> > > On Wed, Aug 08, 2018 at 04:01:12PM +0100, Richard Earnshaw wrote:
> > > > On 08/08/18 15:12, Mikulas Patocka wrote:
> > > > > On Wed, 8 Aug 2018, Catalin Marinas wrote:
> > > > >> On Fri, Aug 03, 2018 at 01:09:02PM -0400, Mikulas Patocka wrote:
> >
> > > - failing to write a few bytes
> > > - writing a few bytes that were written 16 bytes before
> > > - writing a few bytes that were written 16 bytes after
> > >
> > > > The overlapping writes in memcpy never write different values to the
> > > > same location, so I still feel this must be some sort of HW issue, not a
> > > > SW one.
> > >
> > > So do I (my interpretation is that it combines or rather skips some of
> > > the writes to the same 16-byte address as it ignores the data strobes).
> >
> > Maybe it just always writes to the wrong location, 16 bytes apart for one of
> > the stp instructions. Since we are usually dealing with a pair of overlapping
> > 'stp', both unaligned, that could explain both the missing bytes (we write
> > data to the wrong place, but overwrite it with the correct data right away)
> > and the extra copy (we write it to the wrong place, but then write the correct
> > data to the correct place as well).
> >
> > This sounds a bit like what the original ARM CPUs did on unaligned
> > memory access, where a single aligned 4-byte location was accessed,
> > but the bytes swapped around.
> >
> > There may be a few more things worth trying out or analysing from
> > the recorded past failures to understand more about how it goes
> > wrong:
> >
> > - For which data lengths does it fail? Having two overlapping
> >   unaligned stp is something that only happens for 16..96 byte
> >   memcpy.
>
> If you want to research the corruptions in detail, I uploaded a file
> containing 7k corruptions here:
> http://people.redhat.com/~mpatocka/testcases/arm-pcie-corruption/

Nice!

I already found a couple of things:

- Failure to copy always happens at the *end* of a 16-byte aligned
  range of physical addresses. It misses between 1 and 7 bytes, never 8
  or more, and it's more likely to be fewer bytes that are affected.

    count  bytes missed
      279  7
      389  6
      484  5
      683  4
      741  3
      836  2
      946  1

- The first byte that fails to get copied is always 16 bytes after the
  memcpy target. Since we only observe it at the end of the 16 byte
  range, it means this happens specifically for addresses ending in
  0x9 (7 bytes missed) to 0xf (1 byte missed).

- Out of 7445 corruptions, 4358 were of the kind that misses a copy at the
  end of a 16-byte area. They were for copies between 41 and 64 bytes,
  more to the larger end of the scale (note that with your test program,
  smaller memcpys happen more frequently than larger ones).

    count  memcpy length
       47  0x29
       36  0x2a
       47  0x2b
       23  0x2c
       29  0x2d
       31  0x2e
       36  0x2f
       46  0x30
       45  0x31
       51  0x32
       62  0x33
       64  0x34
       77  0x35
       91  0x36
       90  0x37
      100  0x38
      100  0x39
      209  0x3a
      279  0x3b
      366  0x3c
      498  0x3d
      602  0x3e
      682  0x3f
      747  0x40

- All corruption with data copied to the wrong place happened for copies
  between 33 and 47 bytes, mostly to the smaller end of the scale:

    count  memcpy length
      391  0x21
      360  0x22
      319  0x23
      273  0x24
      273  0x25
      241  0x26
      224  0x27
      221  0x28
      231  0x29
      208  0x2a
      163  0x2b
       86  0x2c
       63  0x2d
       33  0x2e
        1  0x2f

- One common (but not the only, still investigating) case for data getting
  written to the wrong place is:
   * corruption starts 16 bytes after the memcpy start
   * corrupt bytes are the same as the bytes written to the start
   * start address ends in 0x1 through 0x7
   * length of corruption is at most memcpy length - 32, always
     between 1 and 7.

      Arnd
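
As a cross-check of the tallies in this message, the following sketch re-enters the three tables by hand and verifies that they are consistent with the totals quoted in the text (4358 omitted-write cases out of 7445 corruptions). The variable names are made up for illustration; the numbers are copied from the tables above and nothing is read from the uploaded log.

```python
# Omitted writes, keyed by the number of bytes that were missed.
omitted_by_missed_bytes = {7: 279, 6: 389, 5: 484, 4: 683, 3: 741, 2: 836, 1: 946}

# Omitted writes, keyed by memcpy length (0x29..0x40).
omitted_by_length = dict(zip(range(0x29, 0x41), [
    47, 36, 47, 23, 29, 31, 36, 46, 45, 51, 62, 64,
    77, 91, 90, 100, 100, 209, 279, 366, 498, 602, 682, 747,
]))

# Misdirected writes, keyed by memcpy length (0x21..0x2f).
misdirected_by_length = dict(zip(range(0x21, 0x30), [
    391, 360, 319, 273, 273, 241, 224, 221, 231, 208, 163, 86, 63, 33, 1,
]))

omitted_total = sum(omitted_by_missed_bytes.values())
misdirected_total = sum(misdirected_by_length.values())

# Both views of the omitted-write cases must describe the same events.
assert omitted_total == sum(omitted_by_length.values()) == 4358
# Together with the misdirected writes they account for every corruption.
assert omitted_total + misdirected_total == 7445

if __name__ == "__main__":
    print("omitted:", omitted_total, "misdirected:", misdirected_total)
    print("lengths seen in both classes:",
          [hex(n) for n in sorted(set(omitted_by_length) & set(misdirected_by_length))])
```

The two classes together cover all 7445 recorded corruptions, and the only memcpy lengths that show up in both classes are 0x29..0x2f.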
* Re: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-08 21:51 ` Arnd Bergmann
@ 2018-08-09 15:29 ` Arnd Bergmann
From: Arnd Bergmann @ 2018-08-09 15:29 UTC
To: Mikulas Patocka
Cc: Catalin Marinas, Richard.Earnshaw, Thomas Petazzoni, Joao Pinto,
    GNU C Library, Ard Biesheuvel, Jingoo Han, Will Deacon,
    Russell King - ARM Linux, Linux Kernel Mailing List, neko, linux-pci,
    Linux ARM

On Wed, Aug 8, 2018 at 11:51 PM Arnd Bergmann <arnd@arndb.de> wrote:

> I already found a couple of things:
>
> - Failure to copy always happens at the *end* of a 16-byte aligned
>   range of physical addresses. It misses between 1 and 7 bytes, never 8
>   or more, and it's more likely to be fewer bytes that are affected.
>
> - The first byte that fails to get copied is always 16 bytes after the
>   memcpy target. Since we only observe it at the end of the 16 byte
>   range, it means this happens specifically for addresses ending in
>   0x9 (7 bytes missed) to 0xf (1 byte missed).
>
> - Out of 7445 corruptions, 4358 were of the kind that misses a copy at the
>   end of a 16-byte area. They were for copies between 41 and 64 bytes,
>   more to the larger end of the scale (note that with your test program,
>   smaller memcpys happen more frequently than larger ones).

Thinking about it some more, this scenario can be explained by
read-modify-write logic gone wrong somewhere in the hardware,
leading to the original bytes being written back after we write the
correct data.

The code path we hit most commonly in glibc is like this one:

    // offset = 0xd, could be 0x9..0xf + n*0x10
    // length = 0x3f, could be 0x29..0x3f
    memcpy(map + 0xd, data + 0xd, 0x3f);

        stp     B_l, B_h, [dstin, 16]      # offset 0x1d
        stp     C_l, C_h, [dstend, -32]    # offset 0x2c
        stp     A_l, A_h, [dstin]          # offset 0x0d
        stp     D_l, D_h, [dstend, -16]    # offset 0x3c

The corruption here always appears in bytes 0x1d..0x1f. A theory
that matches this corruption is that the stores for B, C and D
get combined into a write transaction of length 0x2f, spanning
bytes 0x1d..0x4b in the map. This may prefetch either 8 bytes
at 0x18 or 16 bytes at 0x10 into a temporary HW buffer, which
gets modified with the correct data for 0x1d..0x1f before writing
back that prefetched data.

The key here is the write of A to offset 0x0d..0x1c. This also
prefetches the data at 0x18..0x1f, and modifies the bytes ..1c
in it. When this is prefetched before the first write, but written
back after it, offsets 0x1d..0x1f have the original data again!

Variations that trigger the same thing include the modified
sequence:

        stp     C_l, C_h, [dstend, -32]    # offset 0x2c
        stp     B_l, B_h, [dstin, 16]      # offset 0x1d
        stp     D_l, D_h, [dstend, -16]    # offset 0x3c
        stp     A_l, A_h, [dstin]          # offset 0x0d

and the special case for 64 byte memcpy that uses a completely
different sequence, either (original, corruption is common for 64 byte)

        stp     A_l, A_h, [dstin]          # offset 0x0d
        stp     B_l, B_h, [dstin, 16]      # offset 0x1d
        stp     C_l, C_h, [dstin, 32]      # offset 0x2d
        stp     D_l, D_h, [dstin, 48]      # offset 0x3d
        stp     E_l, E_h, [dstend, -32]    # offset 0x2d again
        stp     F_l, F_h, [dstend, -16]    # offset 0x3d again

or (patched libc, corruption happens very rarely for 64 byte
compared to other sizes)

        stp     E_l, E_h, [dstend, -32]    # offset 0x2d
        stp     F_l, F_h, [dstend, -16]    # offset 0x3d
        stp     A_l, A_h, [dstin]          # offset 0x0d
        stp     B_l, B_h, [dstin, 16]      # offset 0x1d
        stp     C_l, C_h, [dstin, 32]      # offset 0x2d again
        stp     D_l, D_h, [dstin, 48]      # offset 0x3d again

The corruption for both also happens at 0x1d..0x1f, which unfortunately
is not easily explained by the theory above, but maybe my glibc sources
are slightly different from the ones that were used on the system.

> - All corruption with data copied to the wrong place happened for copies
>   between 33 and 47 bytes, mostly to the smaller end of the scale:
>     391 0x21
>     360 0x22
...
>      33 0x2e
>       1 0x2f
>
> - One common (but not the only, still investigating) case for data getting
>   written to the wrong place is:
>    * corruption starts 16 bytes after the memcpy start
>    * corrupt bytes are the same as the bytes written to the start
>    * start address ends in 0x1 through 0x7
>    * length of corruption is at most memcpy length - 32, always
>      between 1 and 7.

This is only observed with the original sequence (B, C, A, D) in
glibc, and only when C overlaps with both A and B. A typical
example would be

    // offset = 0x02, can be [0x01..0x07,0x09..0x0f] + n*0x10
    // length = 0x23, could be 0x21..0x2f
    memcpy(map + 0x2, data + 0x2, 0x23);

        stp     B_l, B_h, [dstin, 16]      # offset 0x22
        stp     C_l, C_h, [dstend, -32]    # offset 0x15
        stp     A_l, A_h, [dstin]          # offset 0x12
        stp     D_l, D_h, [dstend, -16]    # offset 0x25

In this example, bytes 0x22..0x24 incorrectly contain the data that
was written to bytes 0x12..0x14. I would guess that only the stores
to C and D get combined here, so we actually have three separate
store transactions rather than the two in the first example. Each of
the three stores touches data in the 0x20..0x2f range, and these are
the transactions that might happen on them, assuming there is
read-modify-write logic somewhere:

    B1:  prefetch 0x20..0x21, modify 0x22..0x2f, store 0x20
    CD1: modify 0x20..0x24
    A1:  prefetch 0x10..0x11, modify 0x12..0x1f, store 0x10
    A2:  prefetch 0x22..0x2f, modify 0x20..0x21, store 0x20
    CD2: modify 0x25..0x2f, store

The observation is that data from the A1 stage at offset 0x12..0x14
ends up in the CD buffer, which I can't yet explain simply by
doing steps in the wrong order; it still requires something to
also confuse two buffers.

I've also shown that the length of the corruption strictly depends
on the start and end pointer values, and put it up in a spreadsheet
at [1]. The case of writing the data to the wrong place happens
exactly when A and C are within the same 16-byte aligned range, while
writing back the old data (or not writing at all) happens exactly when
C writes to the same 16-byte range as the end of B, but doesn't overlap
with A. Also, if either pointer is 8-byte aligned, everything is fine.

      Arnd

[1] https://docs.google.com/spreadsheets/d/1zlDMNAgF--5n0zQmfV3JBzkhdSrUNtwSXIHZH-fqRio/edit#gid=0
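
The read-modify-write theory for the first example (stale data reappearing at 0x1d..0x1f) can be replayed in a few lines. This is a toy model only, not a description of the real interconnect: it assumes an 8-byte granule for the buffer that handles the tail of store A, assumes the B, C and D stores land as one combined write, and hard-codes the bad ordering (A prefetches before the combined write, A writes back after it). The function name and parameters are hypothetical.

```python
def stale_bytes_after_race(dst=0x0D, count=0x3F, granule=8, size=0x80):
    """Replay the hypothesised ordering and return the addresses inside
    the copy that still hold the old data afterwards."""
    OLD, NEW = 0x00, 0xFF
    mem = bytearray([OLD] * size)

    a_lo, a_hi = dst, dst + 15                  # store A: stp [dstin]
    bcd_lo, bcd_hi = dst + 16, dst + count - 1  # B, C, D combined

    # Step 1: the buffer for the tail of A prefetches its aligned granule
    # while that granule still contains only old data.
    base = (a_hi // granule) * granule
    buf = bytearray(mem[base:base + granule])

    # Step 2: the combined B/C/D write reaches memory with correct data.
    for addr in range(bcd_lo, bcd_hi + 1):
        mem[addr] = NEW

    # Step 3: A merges its own bytes into the stale buffer and writes the
    # whole granule back, undoing part of step 2.
    for addr in range(base, base + granule):
        if a_lo <= addr <= a_hi:
            buf[addr - base] = NEW
    mem[base:base + granule] = buf

    # The head of A, below the prefetched granule, is written normally.
    for addr in range(a_lo, base):
        mem[addr] = NEW

    return [addr for addr in range(dst, dst + count) if mem[addr] != NEW]


if __name__ == "__main__":
    print([hex(a) for a in stale_bytes_after_race(0x0D, 0x3F)])
```

With an 8-byte granule the model also leaves an 8-byte aligned destination untouched, in line with the observation that aligned pointers are fine. It does not reproduce the misdirected-write case, and it would also predict stale bytes for start offsets 0x1..0x7, which the recorded data does not show for this class, so it should be read as an illustration of the ordering argument and nothing more.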
* Re: framebuffer corruption due to overlapping stp instructions on arm64 @ 2018-08-09 15:29 ` Arnd Bergmann 0 siblings, 0 replies; 238+ messages in thread
From: Arnd Bergmann @ 2018-08-09 15:29 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Thomas Petazzoni, Richard.Earnshaw, Joao Pinto, GNU C Library, Ard Biesheuvel, linux-pci, Catalin Marinas, Will Deacon, Russell King - ARM Linux, Linux Kernel Mailing List, neko, Jingoo Han, Linux ARM

On Wed, Aug 8, 2018 at 11:51 PM Arnd Bergmann <arnd@arndb.de> wrote:

> I already found a couple of things:
>
> - Failure to copy always happens at the *end* of a 16 byte aligned
>   physical address, it misses between 1 and 6 bytes, never 7 or more,
>   and it's more likely to be fewer bytes that are affected.
>
> - The first byte that fails to get copied is always 16 bytes after the
>   memcpy target. Since we only observe it at the end of the 16 byte
>   range, it means this happens specifically for addresses ending in
>   0x9 (7 bytes missed) to 0xf (1 byte missed).
>
> - Out of 7445 corruptions, 4358 were of the kind that misses a copy at the
>   end of a 16-byte area, they were for copies between 41 and 64 bytes,
>   more to the larger end of the scale (note that with your test program,
>   smaller memcpys happen more frequently than larger ones).

Thinking about it some more, this scenario can be explained by a
read-modify-write logic gone wrong somewhere in the hardware,
leading to the original bytes being written back after we write the
correct data.

The code path we hit most commonly in glibc is like this one:

    // offset = 0xd, could be 0x9..0xf + n*0x10
    // length = 0x3f, could be 0x29..0x3f
    memcpy(map + 0xd, data + 0xd, 0x3f);

    stp B_l, B_h, [dstin, 16]     # offset 0x1d
    stp C_l, C_h, [dstend, -32]   # offset 0x2c
    stp A_l, A_h, [dstin]         # offset 0x0d
    stp D_l, D_h, [dstend, -16]   # offset 0x3c

The corruption here always appears in bytes 0x1d..0x1f. A theory
that matches this corruption is that the stores for B, C and D get
combined into a write transaction of length 0x2f, spanning bytes
0x1d..0x4b in the map. This may prefetch either 8 bytes at 0x18 or
16 bytes at 0x10 into a temporary HW buffer, which gets modified
with the correct data for 0x1d..0x1f before writing back that
prefetched data.

The key here is the write of A to offset 0x0d..0x1c. This also
prefetches the data at 0x18..0x1f, and modifies the bytes ..1c in it.
When this is prefetched before the first write, but written back
after it, offsets 0x1d..0x1f have the original data again!

Variations that trigger the same thing include the modified
sequence:

    stp C_l, C_h, [dstend, -32]   # offset 0x2c
    stp B_l, B_h, [dstin, 16]     # offset 0x1d
    stp D_l, D_h, [dstend, -16]   # offset 0x3c
    stp A_l, A_h, [dstin]         # offset 0x0d

and the special case for 64 byte memcpy that uses a completely
different sequence, either (original, corruption is common for 64 byte)

    stp A_l, A_h, [dstin]         # offset 0x0d
    stp B_l, B_h, [dstin, 16]     # offset 0x1d
    stp C_l, C_h, [dstin, 32]     # offset 0x2d
    stp D_l, D_h, [dstin, 48]     # offset 0x3d
    stp E_l, E_h, [dstend, -32]   # offset 0x2d again
    stp F_l, F_h, [dstend, -16]   # offset 0x3d again

or (patched libc, corruption happens very rarely for 64 byte
compared to other sizes)

    stp E_l, E_h, [dstend, -32]   # offset 0x2d
    stp F_l, F_h, [dstend, -16]   # offset 0x3d
    stp A_l, A_h, [dstin]         # offset 0x0d
    stp B_l, B_h, [dstin, 16]     # offset 0x1d
    stp C_l, C_h, [dstin, 32]     # offset 0x2d again
    stp D_l, D_h, [dstin, 48]     # offset 0x3d again

The corruption for both also happens at 0x1d..0x1f, which unfortunately
is not easily explained by the theory above, but maybe my glibc sources
are slightly different from the ones that were used on the system.

> - All corruption with data copied to the wrong place happened for copies
>   between 33 and 47 bytes, mostly to the smaller end of the scale:
>     391 0x21
>     360 0x22
>     ...
>      33 0x2e
>       1 0x2f
>
> - One common (but not the only, still investigating) case for data getting
>   written to the wrong place is:
>    * corruption starts 16 bytes after the memcpy start
>    * corrupt bytes are the same as the bytes written to the start
>    * start address ends in 0x1 through 0x7
>    * length of corruption is at most memcpy length - 32, always
>      between 1 and 7.

This is only observed with the original sequence (B, C, A, D) in glibc,
and only when C overlaps with both A and B. A typical example would be

    // offset = 0x02, can be [0x01..0x07,0x09..0x0f] + n*0x10
    // length = 0x23, could be 0x21..0x2f
    memcpy(map + 0x2, data + 0x2, 0x23);

    stp B_l, B_h, [dstin, 16]     # offset 0x22
    stp C_l, C_h, [dstend, -32]   # offset 0x15
    stp A_l, A_h, [dstin]         # offset 0x12
    stp D_l, D_h, [dstend, -16]   # offset 0x25

In this example, bytes 0x22..0x24 incorrectly contain the data that
was written to bytes 0x12..0x14. I would guess that only the stores
to C and D get combined here, so we actually have three separate
store transactions rather than the two in the first example.
Each of the three stores touches data in the 0x20..0x2f range,
and these are the transactions that might happen on them,
assuming there is a read-modify-write logic somewhere:

    B1:  prefetch 0x20..0x21, modify 0x22..0x2f, store 0x20
    CD1: modify 0x20..0x24
    A1:  prefetch 0x10..0x11, modify 0x12..0x1f, store 0x10
    A2:  prefetch 0x22..0x2f, modify 0x20..0x21, store 0x20
    CD2: modify 0x25..0x2f, store

The observation is that data from the A1 stage at offset 0x12..0x14
ends up in the CD buffer, which I can't yet explain simply by doing
steps in the wrong order; it still requires something to also confuse
two buffers.

I've also shown that the length of the corruption strictly depends on
the start and end pointer values, and put it up in a spreadsheet at [1].

The case of writing the data to the wrong place happens exactly when
A and C are within the same 16-byte aligned range, while writing
back the old data (or not writing at all) happens exactly when C writes
to the same 16-byte range as the end of B, but doesn't overlap with A.
Also, if either pointer is 8-byte aligned, everything is fine.

      Arnd

[1] https://docs.google.com/spreadsheets/d/1zlDMNAgF--5n0zQmfV3JBzkhdSrUNtwSXIHZH-fqRio/edit#gid=0

^ permalink raw reply [flat|nested] 238+ messages in thread
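The store offsets quoted in the two examples above can be recomputed mechanically. The following is only a sketch of that arithmetic, not glibc code: `struct stp_plan`, `plan_stores` and `overlap16` are hypothetical names, and it assumes the four-store B, C, A, D path is the one taken for lengths 0x21..0x3f, as the examples suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Start offsets of the four 16-byte stp stores; each covers [x, x + 16). */
struct stp_plan {
    uint64_t a, b, c, d;
};

/*
 * Hypothetical helper: start offsets for the B, C, A, D sequence.
 * Assumption: this path is used for lengths 0x21..0x3f only; the
 * 64-byte case uses a different sequence and is not modelled here.
 */
static struct stp_plan plan_stores(uint64_t dstin, uint64_t len)
{
    assert(len >= 0x21 && len <= 0x3f);

    uint64_t dstend = dstin + len;
    struct stp_plan p = {
        .a = dstin,        /* stp A, [dstin]       */
        .b = dstin + 16,   /* stp B, [dstin, 16]   */
        .c = dstend - 32,  /* stp C, [dstend, -32] */
        .d = dstend - 16,  /* stp D, [dstend, -16] */
    };
    return p;
}

/* Number of bytes that two 16-byte stores starting at x and y share. */
static uint64_t overlap16(uint64_t x, uint64_t y)
{
    uint64_t lo = x > y ? x : y;          /* later start */
    uint64_t hi = (x < y ? x : y) + 16;   /* earlier end */
    return hi > lo ? hi - lo : 0;
}
```

With dstin = 0x12 and length 0x23 this gives A = 0x12, B = 0x22, C = 0x15, D = 0x25, and B and C share the three bytes 0x22..0x24, which are the bytes reported as corrupted in that example.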
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-02 19:31 ` Mikulas Patocka @ 2018-08-03 7:11 ` Andrew Pinski -1 siblings, 0 replies; 238+ messages in thread
From: Andrew Pinski @ 2018-08-03 7:11 UTC (permalink / raw)
To: mpatocka
Cc: Catalin Marinas, Will Deacon, linux, thomas.petazzoni, linux-arm-kernel, LKML, GNU C Library

On Thu, Aug 2, 2018 at 12:31 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Hi
>
> I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a
> strange problem.
>
> When I use the links browser in graphics mode on the framebuffer, I get
> occasional pixel corruption. Links does memcpy, memset and 4-byte writes
> on the framebuffer - nothing else.
>
> I found out that the pixel corruption is caused by overlapping unaligned
> stp instructions inside memcpy. In order to avoid branching, the arm64
> memcpy implementation may write the same destination twice with different
> alignment. If I put "dmb sy" between the overlapping stp instructions, the
> pixel corruption goes away.
>
> This seems like a hardware bug. Is it a known errata? Do you have any
> workarounds for it?

Yes fix Links not to use memcpy on the framebuffer.
It is undefined behavior to use device memory with memcpy.

Thanks,
Andrew Pinski

>
> I tried AMD card (HD 6350) and NVidia (NVS 285) and both exhibit the same
> corruption. OpenGL doesn't work (it results in artifacts on the AMD card
> and lock-up on the NVidia card), but it's quite expected if even simple
> writing to the framebuffer doesn't work.
>
> Mikulas

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 7:11 ` Andrew Pinski @ 2018-08-03 7:53 ` Florian Weimer -1 siblings, 0 replies; 238+ messages in thread
From: Florian Weimer @ 2018-08-03 7:53 UTC (permalink / raw)
To: Andrew Pinski, mpatocka
Cc: Catalin Marinas, Will Deacon, linux, thomas.petazzoni, linux-arm-kernel, LKML, GNU C Library

On 08/03/2018 09:11 AM, Andrew Pinski wrote:
> Yes fix Links not to use memcpy on the framebuffer.
> It is undefined behavior to use device memory with memcpy.

Some (de facto) ABIs require that it is supported, though. For example,
the POWER string functions avoid unaligned loads and stores for this
reason because the platform has the same issue with device memory. And
yes, GCC will expand memcpy on POWER to something that is incompatible
with device memory. 8-(

If we don't want people to use memcpy, we probably need to provide a
credible alternative.

Thanks,
Florian

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 7:53 ` Florian Weimer @ 2018-08-03 9:12 ` Szabolcs Nagy -1 siblings, 0 replies; 238+ messages in thread
From: Szabolcs Nagy @ 2018-08-03 9:12 UTC (permalink / raw)
To: Florian Weimer, Andrew Pinski, mpatocka
Cc: nd, Catalin Marinas, Will Deacon, linux, thomas.petazzoni, linux-arm-kernel, LKML, GNU C Library

On 03/08/18 08:53, Florian Weimer wrote:
> On 08/03/2018 09:11 AM, Andrew Pinski wrote:
>> Yes fix Links not to use memcpy on the framebuffer.
>> It is undefined behavior to use device memory with memcpy.
>
> Some (de facto) ABIs require that it is supported, though. For example,
> the POWER string functions avoid unaligned loads and stores for this
> reason because the platform has the same issue with device memory. And
> yes, GCC will expand memcpy on POWER to something that is incompatible
> with device memory. 8-(
>

i think it's not reasonable to require libc memcpy to work on device memory.

i think if device memory is exposed to regular userspace applications that should be fixed.

> If we don't want people to use memcpy, we probably need to provide a credible alternative.
>
> Thanks,
> Florian

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 7:53 ` Florian Weimer @ 2018-08-03 9:15 ` Ramana Radhakrishnan -1 siblings, 0 replies; 238+ messages in thread
From: Ramana Radhakrishnan @ 2018-08-03 9:15 UTC (permalink / raw)
To: Florian Weimer
Cc: Andrew Pinski, mpatocka, Catalin Marinas, Will Deacon, linux, thomas.petazzoni, linux-arm-kernel, LKML, GNU C Library

On Fri, Aug 3, 2018 at 8:53 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 08/03/2018 09:11 AM, Andrew Pinski wrote:
>>
>> Yes fix Links not to use memcpy on the framebuffer.
>> It is undefined behavior to use device memory with memcpy.
>
>
> Some (de facto) ABIs require that it is supported, though. For example, the
> POWER string functions avoid unaligned loads and stores for this reason
> because the platform has the same issue with device memory. And yes, GCC
> will expand memcpy on POWER to something that is incompatible with device
> memory. 8-(

GCC for AArch64 - use -mstrict-align
GCC for AArch32 - use -mno-unaligned-access.

If you see unaligned accesses coming out of the compiler for well
defined programs then that's a bug. Frequently we see undefined
programs that get the compiler to produce traps - at least one or 2
bugs a year in GCC.

>
> If we don't want people to use memcpy, we probably need to provide a
> credible alternative.

I believe a number of packages have rolled their own to take these
constraints into account for AArch32, perhaps it needs to be expanded
for AArch64 as well.

regards
Ramana

>
> Thanks,
> Florian

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 9:15 ` Ramana Radhakrishnan @ 2018-08-03 9:29 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread
From: Ard Biesheuvel @ 2018-08-03 9:29 UTC (permalink / raw)
To: Ramana Radhakrishnan
Cc: Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, Mikulas Patocka, linux-arm-kernel

On 3 August 2018 at 11:15, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Fri, Aug 3, 2018 at 8:53 AM, Florian Weimer <fweimer@redhat.com> wrote:
>> On 08/03/2018 09:11 AM, Andrew Pinski wrote:
>>>
>>> Yes fix Links not to use memcpy on the framebuffer.
>>> It is undefined behavior to use device memory with memcpy.
>>
>>
>> Some (de facto) ABIs require that it is supported, though. For example, the
>> POWER string functions avoid unaligned loads and stores for this reason
>> because the platform has the same issue with device memory. And yes, GCC
>> will expand memcpy on POWER to something that is incompatible with device
>> memory. 8-(
>
> GCC for AArch64 - use -mstrict-align
> GCC for AArch32 - use -mno-unaligned-access.
>
> If you see unaligned accesses coming out of the compiler for well
> defined programs then that's a bug. Frequently we see undefined
> programs that get the compiler to produce traps - at least one or 2
> bugs a year in GCC.
>
>
>>
>> If we don't want people to use memcpy, we probably need to provide a
>> credible alternative.
>
> I believe a number of packages have rolled their own to take these
> constraints into account
> for AArch32, perhaps it needs to be expanded for AArch64 as well.
>

I guess the semantics of a framebuffer are not strictly defined, but
the current reality is that it is expected to have memory semantics
(by Linux/glibc).

Matt is saying fundamental properties of the underlying interconnects
(AMBA) make that impossible on ARM, but I'd like to understand better
if that is universally the case, and whether such a system is still
PCIe compliant.

The discussion about whether memcpy() should rely on unaligned
accesses, and whether you should use it on device memory is orthogonal
to that, and not the heart of the matter IMO

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 9:29 ` Ard Biesheuvel @ 2018-08-03 9:37 ` Ramana Radhakrishnan -1 siblings, 0 replies; 238+ messages in thread
From: Ramana Radhakrishnan @ 2018-08-03 9:37 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, Mikulas Patocka, linux-arm-kernel

<snip>

> I guess the semantics of a framebuffer are not strictly defined, but
> the current reality is that it is expected to have memory semantics
> (by Linux/glibc)
>
> Matt is saying fundamental properties of the underlying interconnects
> (AMBA) make that impossible on ARM, but I'd like to understand better
> if that is universally the case, and whether such a system is still
> PCIe compliant.

I don't know that side of the architecture enough to make any
definitive statements.

>
> The discussion about whether memcpy() should rely on unaligned
> accesses, and whether you should use it on device memory is orthogonal
> to that, and not the heart of the matter IMO

Then maybe take libc-alpha off if it isn't relevant.

regards
Ramana

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 9:29 ` Ard Biesheuvel @ 2018-08-03 9:42 ` Richard Earnshaw (lists) -1 siblings, 0 replies; 238+ messages in thread
From: Richard Earnshaw (lists) @ 2018-08-03 9:42 UTC (permalink / raw)
To: Ard Biesheuvel, Ramana Radhakrishnan
Cc: Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, Mikulas Patocka, linux-arm-kernel

On 03/08/18 10:29, Ard Biesheuvel wrote:
> On 3 August 2018 at 11:15, Ramana Radhakrishnan
> <ramana.gcc@googlemail.com> wrote:
>> On Fri, Aug 3, 2018 at 8:53 AM, Florian Weimer <fweimer@redhat.com> wrote:
>>> On 08/03/2018 09:11 AM, Andrew Pinski wrote:
>>>>
>>>> Yes fix Links not to use memcpy on the framebuffer.
>>>> It is undefined behavior to use device memory with memcpy.
>>>
>>>
>>> Some (de facto) ABIs require that it is supported, though. For example, the
>>> POWER string functions avoid unaligned loads and stores for this reason
>>> because the platform has the same issue with device memory. And yes, GCC
>>> will expand memcpy on POWER to something that is incompatible with device
>>> memory. 8-(
>>
>> GCC for AArch64 - use -mstrict-align
>> GCC for AArch32 - use -mno-unaligned-access.
>>
>> If you see unaligned accesses coming out of the compiler for well
>> defined programs then that's a bug. Frequently we see undefined
>> programs that get the compiler to produce traps - at least one or 2
>> bugs a year in GCC.
>>
>>
>>>
>>> If we don't want people to use memcpy, we probably need to provide a
>>> credible alternative.
>>
>> I believe a number of packages have rolled their own to take these
>> constraints into account
>> for AArch32, perhaps it needs to be expanded for AArch64 as well.
>>
>
> I guess the semantics of a framebuffer are not strictly defined, but
> the current reality is that it is expected to have memory semantics
> (by Linux/glibc)
>
> Matt is saying fundamental properties of the underlying interconnects
> (AMBA) make that impossible on ARM, but I'd like to understand better
> if that is universally the case, and whether such a system is still
> PCIe compliant.
>
> The discussion about whether memcpy() should rely on unaligned
> accesses, and whether you should use it on device memory is orthogonal
> to that, and not the heart of the matter IMO
>

Whoa, hold on.

Memcpy should never be used on device memory. Period. Memcpy doesn't
know anything about what size of access is needed for accessing a device.

But why is the buffer in device memory rather than some other form of
uncached memory?

If you change memcpy to deal with an aspect of the system hardware,
you'll end up hosing performance EVERYWHERE. DON'T DO IT!

If you must, create a new API with tighter semantics, but don't change
memcpy to accommodate this.

Anyway, back to the original report. What memory mapping is being used?
In detail?

R.

^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 9:42 ` Richard Earnshaw (lists) @ 2018-08-04 0:58 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-04 0:58 UTC (permalink / raw) To: Richard Earnshaw (lists) Cc: Ard Biesheuvel, Ramana Radhakrishnan, Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On Fri, 3 Aug 2018, Richard Earnshaw (lists) wrote: > Whoa, hold on. > > Memcpy should never be used on device memory. Period. Memcpy doesn't > know anything about what size of access is needed for accessing a device. > > But why is the buffer in device memory rather than some other form of > uncached memory? > > If you change memcpy to deal with an aspect of the system hardware, > you'll end up hosing performance EVERYWHERE. DON'T DO IT! memcpy in glibc uses ifunc selection and it already has optimized variants for Falkor and Thunder-X. You can add just another variant for Armada-8040 that works around this bug and you won't be harming anyone but users of Armada-8040. Furthermore, you can detect in the kernel that the PCI bus has some device with prefetchable BAR and activate the workaround only if there is videocard plugged in the PCIe slot. > If you must, create a new API with tighter semantics, but don't change > memcpy to accommodate this. > > Anyway, back to the original report. What memory mapping is being used? > In detail? It is PCI prefetchable BAR. It is mapped using pgprot_writecombine, which results in MT_NORMAL_NC page attributes. (the MT_DEVICE_nGnRE can't be used because it results in crashes due to unaligned accesses to videoram). > R. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
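The per-CPU variant idea above can be sketched in plain C. This is only an illustration under stated assumptions: a function pointer chosen once stands in for glibc's ifunc mechanism, the names copy_default, copy_sequential and select_copy are hypothetical, and the cpu_needs_workaround flag is a placeholder for whatever detection the kernel or libc would really perform. It is not the glibc implementation.

```c
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Default variant: whatever the C library provides. */
static void *copy_default(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Hypothetical workaround variant: strictly sequential byte stores, so no
 * destination byte is ever written twice. Slow, but it shows the idea. */
static void *copy_sequential(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Pick a variant once, roughly the way an ifunc resolver would.
 * cpu_needs_workaround is an assumed input, not a real detection API. */
static copy_fn select_copy(int cpu_needs_workaround)
{
    return cpu_needs_workaround ? copy_sequential : copy_default;
}
```

A caller would do something like `copy_fn f = select_copy(flag); f(dst, src, n);`. The point of the design is that only the affected machines pay for the slower variant, which is the argument made in the message above.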
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-04 0:58 ` Mikulas Patocka @ 2018-08-04 1:13 ` Andrew Pinski -1 siblings, 0 replies; 238+ messages in thread From: Andrew Pinski @ 2018-08-04 1:13 UTC (permalink / raw) To: mpatocka Cc: Richard Earnshaw, ard.biesheuvel, Ramana Radhakrishnan, Florian Weimer, thomas.petazzoni, GNU C Library, Catalin Marinas, Will Deacon, linux, LKML, linux-arm-kernel On Fri, Aug 3, 2018 at 5:58 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > On Fri, 3 Aug 2018, Richard Earnshaw (lists) wrote: > > > Whoa, hold on. > > > > Memcpy should never be used on device memory. Period. Memcpy doesn't > > know anything about what size of access is needed for accessing a device. > > > > But why is the buffer in device memory rather than some other form of > > uncached memory? > > > > If you change memcpy to deal with an aspect of the system hardware, > > you'll end up hosing performance EVERYWHERE. DON'T DO IT! > > memcpy in glibc uses ifunc selection and it already has optimized variants > for Falkor and Thunder-X. You can add just another variant for Armada-8040 > that works around this bug and you won't be harming anyone but users of > Armada-8040. Except it is not a bug in the ARMADA at all. It is a bug in thinking memcpy will work on non-DRAM memory. Can you run the test program on x86 using the similar framebuffer setup? Does doing two writes (one aligned and one unaligned but overlapping with previous one) cause the same issue? I suspect it does, then using memcpy for frame buffers is wrong. Thanks, Andrew > > Furthermore, you can detect in the kernel that the PCI bus has some device > with prefetchable BAR and activate the workaround only if there is > videocard plugged in the PCIe slot. > > > If you must, create a new API with tighter semantics, but don't change > > memcpy to accommodate this. > > > > Anyway, back to the original report. What memory mapping is being used? > > In detail? 
> > It is PCI prefetchable BAR. It is mapped using pgprot_writecombine, which > results in MT_NORMAL_NC page attributes. (the MT_DEVICE_nGnRE can't be > used because it results in crashes due to unaligned accesses to videoram). > > > R. > > Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
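A rough user-space sketch of the kind of test asked for here might look as follows. It assumes buf points at a region of at least 64 bytes (in a real test, the mapped framebuffer; here any memory will do). The helper names are hypothetical, and the compiler decides which store instructions are actually emitted, so this does not guarantee that stp is used.

```c
#include <stddef.h>
#include <string.h>

/* Copy n bytes (16 <= n <= 32) as two 16-byte chunks that may overlap,
 * mimicking the branch-free tail handling described in the original report. */
static void copy_16_to_32_overlapping(unsigned char *dst,
                                      const unsigned char *src, size_t n)
{
    unsigned char head[16], tail[16];

    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);
    memcpy(dst, head, 16);           /* first store: bytes 0..15       */
    memcpy(dst + n - 16, tail, 16);  /* second store: bytes n-16..n-1  */
}

/* Try every length 16..32 at every destination offset 0..15 inside buf
 * and return the number of bytes that read back wrong. */
static size_t count_overlap_mismatches(unsigned char *buf)
{
    unsigned char pattern[32];
    size_t bad = 0;

    for (size_t i = 0; i < sizeof(pattern); i++)
        pattern[i] = (unsigned char)(i * 7 + 1);

    for (size_t off = 0; off < 16; off++) {
        for (size_t n = 16; n <= 32; n++) {
            memset(buf, 0, 64);
            copy_16_to_32_overlapping(buf + off, pattern, n);
            for (size_t i = 0; i < n; i++)
                bad += (buf[off + i] != pattern[i]);
        }
    }
    return bad;
}
```

On ordinary DRAM the mismatch count should be zero. A non-zero count on a write-combined framebuffer mapping would point at the behaviour described in the report.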
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-04 1:13 ` Andrew Pinski @ 2018-08-04 11:04 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-04 11:04 UTC (permalink / raw) To: Andrew Pinski Cc: Richard Earnshaw, ard.biesheuvel, Ramana Radhakrishnan, Florian Weimer, thomas.petazzoni, GNU C Library, Catalin Marinas, Will Deacon, linux, LKML, linux-arm-kernel On Fri, 3 Aug 2018, Andrew Pinski wrote: > On Fri, Aug 3, 2018 at 5:58 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > > > On Fri, 3 Aug 2018, Richard Earnshaw (lists) wrote: > > > > > Whoa, hold on. > > > > > > Memcpy should never be used on device memory. Period. Memcpy doesn't > > > know anything about what size of access is needed for accessing a device. > > > > > > But why is the buffer in device memory rather than some other form of > > > uncached memory? > > > > > > If you change memcpy to deal with an aspect of the system hardware, > > > you'll end up hosing performance EVERYWHERE. DON'T DO IT! > > > > memcpy in glibc uses ifunc selection and it already has optimized variants > > for Falkor and Thunder-X. You can add just another variant for Armada-8040 > > that works around this bug and you won't be harming anyone but users of > > Armada-8040. > > Except it is not a bug in the ARMADA at all. It is a bug in thinking > memcpy will work on non-DRAM memory. There's plenty of memcpy's in the graphics stack. No one will be rewriting all the graphics drivers because of tiny market share that ARM has in desktop computers. So if you refuse to fix things and blame everyone else, you can as well announce that you don't want to have PCIe graphics on ARM at all. > Can you run the test program on x86 using the similar framebuffer > setup? Does doing two writes (one aligned and one unaligned but > overlapping with previous one) cause the same issue? I suspect it > does, then using memcpy for frame buffers is wrong. 
> > Thanks, > Andrew Overlapping unaligned writes work on x86 - they have to, because of backward compatibility. 8086, 80286 and 80386 didn't have any cache at all. 80486 and Pentium had cache, but when the CPU was reading some data from memory, the motherboard could disable cacheability for this data by a special pin. Software didn't have to do any explicit cache management - programs for 80386 that expected that there's no cache worked flawlessly on 80486 and Pentium. Pentium Pro had memory type range registers that determine cacheability of various memory regions (so that it could allocate a cache line on write without having to query the motherboard if the particular region of memory is cacheable) - but the MTRRs were set by BIOS and the software didn't have to care about them at all - an 80386 operating system that had no idea of cacheability would still work on Pentium Pro. MTRRs could also set a write-combining mode on a region of memory - but again, this is completely transparent to the software (the write combining buffers are flushed when accessing an I/O port or uncacheable memory) - so that an accelerated graphics driver written for Pentium that had no idea of write-combining would still work on Pentium Pro with write combining enabled. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-04 11:04 ` Mikulas Patocka @ 2018-08-05 18:33 ` Florian Weimer -1 siblings, 0 replies; 238+ messages in thread From: Florian Weimer @ 2018-08-05 18:33 UTC (permalink / raw) To: Mikulas Patocka, Andrew Pinski Cc: Richard Earnshaw, ard.biesheuvel, Ramana Radhakrishnan, thomas.petazzoni, GNU C Library, Catalin Marinas, Will Deacon, linux, LKML, linux-arm-kernel On 08/04/2018 01:04 PM, Mikulas Patocka wrote: > There's plenty of memcpy's in the graphics stack. No one will be rewriting > all the graphics drivers because of tiny market share that ARM has in > desktop computers. So if you refuse to fix things and blame everyone else, > you can as well announce that you don't want to have PCIe graphics on ARM > at all. The POWER toolchain maintainers said pretty much the same thing not too long ago. I wonder how many architectures need to fail until the graphics stack is finally fixed. Thanks, Florian ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-05 18:33 ` Florian Weimer @ 2018-08-06 8:02 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-06 8:02 UTC (permalink / raw) To: Florian Weimer Cc: Andrew Pinski, Richard Earnshaw, ard.biesheuvel, Ramana Radhakrishnan, thomas.petazzoni, GNU C Library, Catalin Marinas, Will Deacon, linux, LKML, linux-arm-kernel On Sun, 5 Aug 2018, Florian Weimer wrote: > On 08/04/2018 01:04 PM, Mikulas Patocka wrote: > > There's plenty of memcpy's in the graphics stack. No one will be rewriting > > all the graphics drivers because of tiny market share that ARM has in > > desktop computers. So if you refuse to fix things and blame everyone else, > > you can as well announce that you don't want to have PCIe graphics on ARM > > at all. > > The POWER toolchain maintainers said pretty much the same thing not too > long ago. I wonder how many architectures need to fail until the > graphics stack is finally fixed. > > Thanks, > Florian If you say that your architecture doesn't support unaligned accesses at all, there's no problem - the compiler won't generate them and the libc won't contain them. But if you say that your architecture supports unaligned accesses except for the framebuffer, then you have a problem - the compiler can't know which pointers point to the framebuffer and libc can't know either - you caused this problem by your architectural decision. You can use 'volatile' to suppress memory optimizations, but it's impossible to go through the whole Linux graphics stack and add volatile to every pointer that may point to videoram. Even if you succeeded, new videoram accesses without volatile will appear after a year of development.
See for example the macros READ_ONCE and WRITE_ONCE in Linux kernel - they should be used when there's concurrent access to the particular variable, but mainstream architectures don't require them, so many kernel developers are omitting them in their code. If you are building a supercomputer with a particular GPU, you can force the GPU vendor to provide POWER-compliant drivers. If you are building a workstation where the user can plug any GPU, forcing developers will go nowhere. You have to emulate the unaligned accesses and make sure that the next versions of your architecture support them in hardware. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 8:02 ` Mikulas Patocka @ 2018-08-06 8:10 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 8:10 UTC (permalink / raw) To: Mikulas Patocka Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On 6 August 2018 at 10:02, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Sun, 5 Aug 2018, Florian Weimer wrote: > >> On 08/04/2018 01:04 PM, Mikulas Patocka wrote: >> > There's plenty of memcpy's in the graphics stack. No one will be rewriting >> > all the graphics drivers because of tiny market share that ARM has in >> > desktop computers. So if you refuse to fix things and blame everyone else, >> > you can as well announce that you don't want to have PCIe graphics on ARM >> > at all. >> >> The POWER toolchain maintainers said pretty much the same thing not too >> long ago. I wonder how many architectures need to fail until the >> graphics stack is finally fixed. >> >> Thanks, >> Florian > > If you say that your architecture doesn't support unaligned accesses at > all, there's no problem - the compiler won't generate them and the libc > won't contain them. > > But if you say that your architecture supports unaligned accesses except > for the framebuffer, then you have a problem - the compiler can't know > which pointers point to the framebuffer and libc can't know either - you > caused this problem by your architectural decision. > > You can use 'volatile' to suppress memory optimizations, but it's > impossible to go through the whole Linux graphics stack and add volatile > to every pointer that may point to videoram. Even if you succeeded, new > videoram accesses without volatile will appear after a year of > development.
> > See for example the macros READ_ONCE and WRITE_ONCE in Linux kernel - they > should be used when there's concurrent access to the particular variable, > but mainstream architectures don't require them, so many kernel developers > are omitting them in their code. > > If you are building a supercomputer with a particular GPU, you can force > the GPU vendor to provide POWER-compliant drivers. If you are building a > workstation where the user can plug any GPU, forcing developers will go > nowhere. You have to emulate the unaligned accesses and make sure that the > next versions of your architecture support them in hardware. > I have the feeling this discussion is going off the rails again. The original report is about corruption when doing overlapping writes. Matt Sealey said you cannot have PCI outbound windows with memory semantics on ARM, and so you should be using device mappings (which do not tolerate unaligned accesses) In this context, 'device mapping' does not mean 'any non-DRAM region', but it refers to a particular type of MMU mapping attribute defined by the ARM architecture. I think we can all agree that memcpy() should be usable on any region of memory that has true memory semantics, even if it is backed by VRAM on a graphics card. The question is if PCIe can provide such regions on ARM. ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 8:10 ` Ard Biesheuvel @ 2018-08-06 10:31 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-06 10:31 UTC (permalink / raw) To: Ard Biesheuvel Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > On 6 August 2018 at 10:02, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Sun, 5 Aug 2018, Florian Weimer wrote: > > > >> On 08/04/2018 01:04 PM, Mikulas Patocka wrote: > >> > There's plenty of memcpy's in the graphics stack. No one will be rewriting > >> > all the graphics drivers because of tiny market share that ARM has in > >> > desktop computers. So if you refuse to fix things and blame everyone else, > >> > you can as well announce that you don't want to have PCIe graphics on ARM > >> > at all. > >> > >> The POWER toolchain maintainers said pretty much the same thing not too > >> long ago. I wonder how many architectures need to fail until the > >> graphics stack is finally fixed. > >> > >> Thanks, > >> Florian > > > > If you say that your architecture doesn't support unaligned accesses at > > all, there's no problem - the compiler won't generate them and the libc > > won't contain them. > > > > But if you say that your architecture supports unaligned accesses except > > for the framebuffer, then you have a problem - the compiler can't know > > which pointers point to the framebuffer and libc can't know either - you > > caused this problem by your architectural decision. > > > > You can use 'volatile' to suppress memory optimizations, but it's > > impossible to go through the whole Linux graphics stack and add volatile > > to every pointer that may point to videoram. 
Even if you succeeded, new > > videoram accesses without volatile will appear after a year of > > development. > > > > See for example the macros READ_ONCE and WRITE_ONCE in Linux kernel - they > > should be used when there's concurrent access to the particular variable, > > but mainstream architectures don't require them, so many kernel developers > > are omitting them in their code. > > > > If you are building a supercomputer with a particular GPU, you can force > > the GPU vendor to provide POWER-compliant drivers. If you are building a > > workstation where the user can plug any GPU, forcing developers will go > > nowhere. You have to emulate the unaligned accesses and make sure that the > > next versions of your architecture support them in hardware. > > > > I have the feeling this discussion is going off the rails again. > > The original report is about corruption when doing overlapping writes. > Matt Sealey said you cannot have PCI outbound windows with memory > semantics on ARM, and so you should be using device mappings (which do > not tolerate unaligned accesses) > > In this context, 'device mapping' does not mean 'any non-DRAM region', > but it refers to a particular type of MMU mapping attribute defined by > the ARM architecture. > > I think we can all agree that memcpy() should be usable on any region > of memory that has true memory semantics, even if it is backed by VRAM > on a graphics card. > > The question is if PCIe can provide such regions on ARM. I think there are three possible solutions:
1. provide an alternative memcpy implementation that doesn't do unaligned accesses and recompile the graphics software with -mstrict-align
2. map the PCI BAR as device memory and emulate the unaligned instructions
3. find some hardware workaround that could insert delays between the PCIe accesses (but the hardware engineers need to cooperate on this instead of asserting that they refuse to support it)
Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
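The first option, a copy routine that never issues unaligned or overlapping stores to the destination, could be sketched roughly like this. It is a minimal illustration, not a tuned implementation: the name copy_to_vram_aligned is hypothetical, the 8-byte store width is an arbitrary choice, and it assumes the destination mapping accepts naturally aligned 1-byte and 8-byte writes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes to dst using only naturally aligned stores, each destination
 * byte being written exactly once. The source may be unaligned. */
static void *copy_to_vram_aligned(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    /* Byte stores until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Aligned 8-byte stores; the word is assembled from the source with
     * memcpy into a local so that an unaligned source is still legal C. */
    while (n >= 8) {
        uint64_t w;

        memcpy(&w, s, sizeof(w));
        *(volatile uint64_t *)d = w;
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Remaining tail, byte by byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The volatile-qualified destination pointer keeps the compiler from merging or widening the stores. Whether this is fast enough for a framebuffer is a separate question that the sketch does not answer.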
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 10:31 ` Mikulas Patocka @ 2018-08-06 10:37 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 10:37 UTC (permalink / raw) To: Mikulas Patocka Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On 6 August 2018 at 12:31, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > >> On 6 August 2018 at 10:02, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Sun, 5 Aug 2018, Florian Weimer wrote: >> > >> >> On 08/04/2018 01:04 PM, Mikulas Patocka wrote: >> >> > There's plenty of memcpy's in the graphics stack. No one will be rewriting >> >> > all the graphics drivers because of tiny market share that ARM has in >> >> > desktop computers. So if you refuse to fix things and blame everyone else, >> >> > you can as well announce that you don't want to have PCIe graphics on ARM >> >> > at all. >> >> >> >> The POWER toolchain maintainers said pretty much the same thing not too >> >> long ago. I wonder how many architectures need to fail until the >> >> graphics stack is finally fixed. >> >> >> >> Thanks, >> >> Florian >> > >> > If you say that your architecture doesn't support unaligned accesses at >> > all, there's no problem - the compiler won't generate them and the libc >> > won't contain them. >> > >> > But if you say that your architecture supports unaligned accesses except >> > for the framebuffer, then you have a problem - the compiler can't know >> > which pointers point to the framebuffer and libc can't know either - you >> > caused this problem by your architectural decision. 
>> > >> > You can use 'volatile' to suppress memory optimizations, but it's >> > impossible to go through the whole Linux graphics stack and add volatile >> > to every pointer that may point to videoram. Even if you succeeded, new >> > videoram accesses without volatile will appear after a year of >> > development. >> > >> > See for example the macros READ_ONCE and WRITE_ONCE in Linux kernel - they >> > should be used when there's concurrent access to the particular variable, >> > but mainstream architectures don't require them, so many kernel developers >> > are omitting them in their code. >> > >> > If you are building a supercomputer with a particular GPU, you can force >> > the GPU vendor to provide POWER-compliant drivers. If you are building a >> > workstation where the user can plug any GPU, forcing developers will go >> > nowhere. You have to emulate the unaligned accesses and make sure that the >> > next versions of your architecture support them in hardware. >> > >> >> I have the feeling this discussion is going off the rails again. >> >> The original report is about corruption when doing overlapping writes. >> Matt Sealey said you cannot have PCI outbound windows with memory >> semantics on ARM, and so you should be using device mappings (which do >> not tolerate unaligned accesses) >> >> In this context, 'device mapping' does not mean 'any non-DRAM region', >> but it refers to a particular type of MMU mapping attribute defined by >> the ARM architecture. >> >> I think we can all agree that memcpy() should be usable on any region >> of memory that has true memory semantics, even if it is backed by VRAM >> on a graphics card. >> >> The question is if PCIe can provide such regions on ARM. > > I think there are three possible solutions: > > 1. provide an alternative memcpy implementation that doesn't do unaligned > accesses and recompile the graphics software with -mstrict-align > > 2. 
map the PCI BAR as device memory and emulate the unaligned instructions > > 3. find some hardware workaround that could insert delays between the PCIe > accesses (but the hardware engineers need to cooperate on this instead of > asserting that they refuse to support it) > Are we talking about a quirk for the Armada 8040 or about PCIe on ARM in general? If the latter, I still haven't seen an explanation why the particulars of AMBA justify overlapped writes being dropped at will by the interconnect. ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 10:37 ` Ard Biesheuvel @ 2018-08-06 10:42 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-06 10:42 UTC (permalink / raw) To: Ard Biesheuvel Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > On 6 August 2018 at 12:31, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > > > On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > > > >> On 6 August 2018 at 10:02, Mikulas Patocka <mpatocka@redhat.com> wrote: > >> > > >> > > >> > On Sun, 5 Aug 2018, Florian Weimer wrote: > >> > > >> >> On 08/04/2018 01:04 PM, Mikulas Patocka wrote: > >> >> > There's plenty of memcpy's in the graphics stack. No one will be rewriting > >> >> > all the graphics drivers because of tiny market share that ARM has in > >> >> > desktop computers. So if you refuse to fix things and blame everyone else, > >> >> > you can as well announce that you don't want to have PCIe graphics on ARM > >> >> > at all. > >> >> > >> >> The POWER toolchain maintainers said pretty much the same thing not too > >> >> long ago. I wonder how many architectures need to fail until the > >> >> graphics stack is finally fixed. > >> >> > >> >> Thanks, > >> >> Florian > >> > > >> > If you say that your architecture doesn't support unaligned accesses at > >> > all, there's no problem - the compiler won't generate them and the libc > >> > won't contain them. > >> > > >> > But if you say that your architecture supports unaligned accesses except > >> > for the framebuffer, then you have a problem - the compiler can't know > >> > which pointers point to the framebuffer and libc can't know either - you > >> > caused this problem by your architectural decision. 
> >> > > >> > You can use 'volatile' to suppress memory optimizations, but it's > >> > impossible to go through the whole Linux graphics stack and add volatile > >> > to every pointer that may point to videoram. Even if you succeeded, new > >> > videoram accesses without volatile will appear after a year of > >> > development. > >> > > >> > See for example the macros READ_ONCE and WRITE_ONCE in Linux kernel - they > >> > should be used when there's concurrent access to the particular variable, > >> > but mainstream architectures don't require them, so many kernel developers > >> > are omitting them in their code. > >> > > >> > If you are building a supercomputer with a particular GPU, you can force > >> > the GPU vendor to provide POWER-compliant drivers. If you are building a > >> > workstation where the user can plug any GPU, forcing developers will go > >> > nowhere. You have to emulate the unaligned accesses and make sure that the > >> > next versions of your architecture support them in hardware. > >> > > >> > >> I have the feeling this discussion is going off the rails again. > >> > >> The original report is about corruption when doing overlapping writes. > >> Matt Sealey said you cannot have PCI outbound windows with memory > >> semantics on ARM, and so you should be using device mappings (which do > >> not tolerate unaligned accesses) > >> > >> In this context, 'device mapping' does not mean 'any non-DRAM region', > >> but it refers to a particular type of MMU mapping attribute defined by > >> the ARM architecture. > >> > >> I think we can all agree that memcpy() should be usable on any region > >> of memory that has true memory semantics, even if it is backed by VRAM > >> on a graphics card. > >> > >> The question is if PCIe can provide such regions on ARM. > > > > I think there are three possible solutions: > > > > 1. 
provide an alternative memcpy implementation that doesn't do unaligned > > accesses and recompile the graphics software with -mstrict-align > > > > 2. map the PCI BAR as device memory and emulate the unaligned instructions > > > > 3. find some hardware workaround that could insert delays between the PCIe > > accesses (but the hardware engineers need to cooperate on this instead of > > asserting that they refuse to support it) > > > > Are we talking about a quirk for the Armada 8040 or about PCIe on ARM > in general? I don't know - there are not any other easily available PCIe ARM boards except for Armada 8040. > If the latter, I still haven't seen an explanation why the particulars > of AMBA justify overlapped writes being dropped at will by the > interconnect. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 10:42 ` Mikulas Patocka @ 2018-08-06 10:48 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 10:48 UTC (permalink / raw) To: Mikulas Patocka Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On 6 August 2018 at 12:42, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > >> On 6 August 2018 at 12:31, Mikulas Patocka <mpatocka@redhat.com> wrote: >> > >> > >> > On Mon, 6 Aug 2018, Ard Biesheuvel wrote: >> > >> >> On 6 August 2018 at 10:02, Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> > >> >> > >> >> > On Sun, 5 Aug 2018, Florian Weimer wrote: >> >> > >> >> >> On 08/04/2018 01:04 PM, Mikulas Patocka wrote: >> >> >> > There's plenty of memcpy's in the graphics stack. No one will be rewriting >> >> >> > all the graphics drivers because of tiny market share that ARM has in >> >> >> > desktop computers. So if you refuse to fix things and blame everyone else, >> >> >> > you can as well announce that you don't want to have PCIe graphics on ARM >> >> >> > at all. >> >> >> >> >> >> The POWER toolchain maintainers said pretty much the same thing not too >> >> >> long ago. I wonder how many architectures need to fail until the >> >> >> graphics stack is finally fixed. >> >> >> >> >> >> Thanks, >> >> >> Florian >> >> > >> >> > If you say that your architecture doesn't support unaligned accesses at >> >> > all, there's no problem - the compiler won't generate them and the libc >> >> > won't contain them. 
>> >> > >> >> > But if you say that your architecture supports unaligned accesses except >> >> > for the framebuffer, then you have a problem - the compiler can't know >> >> > which pointers point to the framebuffer and libc can't know either - you >> >> > caused this problem by your architectural decision. >> >> > >> >> > You can use 'volatile' to suppress memory optimizations, but it's >> >> > impossible to go through the whole Linux graphics stack and add volatile >> >> > to every pointer that may point to videoram. Even if you succeeded, new >> >> > videoram accesses without volatile will appear after a year of >> >> > development. >> >> > >> >> > See for example the macros READ_ONCE and WRITE_ONCE in Linux kernel - they >> >> > should be used when there's concurrent access to the particular variable, >> >> > but mainstream architectures don't require them, so many kernel developers >> >> > are omitting them in their code. >> >> > >> >> > If you are building a supercomputer with a particular GPU, you can force >> >> > the GPU vendor to provide POWER-compliant drivers. If you are building a >> >> > workstation where the user can plug any GPU, forcing developers will go >> >> > nowhere. You have to emulate the unaligned accesses and make sure that the >> >> > next versions of your architecture support them in hardware. >> >> > >> >> >> >> I have the feeling this discussion is going off the rails again. >> >> >> >> The original report is about corruption when doing overlapping writes. >> >> Matt Sealey said you cannot have PCI outbound windows with memory >> >> semantics on ARM, and so you should be using device mappings (which do >> >> not tolerate unaligned accesses) >> >> >> >> In this context, 'device mapping' does not mean 'any non-DRAM region', >> >> but it refers to a particular type of MMU mapping attribute defined by >> >> the ARM architecture. 
>> >> >> >> I think we can all agree that memcpy() should be usable on any region >> >> of memory that has true memory semantics, even if it is backed by VRAM >> >> on a graphics card. >> >> >> >> The question is if PCIe can provide such regions on ARM. >> > >> > I think there are three possible solutions: >> > >> > 1. provide an alternative memcpy implementation that doesn't do unaligned >> > accesses and recompile the graphics software with -mstrict-align >> > >> > 2. map the PCI BAR as device memory and emulate the unaligned instructions >> > >> > 3. find some hardware workaround that could insert delays between the PCIe >> > accesses (but the hardware engineers need to cooperate on this instead of >> > asserting that they refuse to support it) >> > >> >> Are we talking about a quirk for the Armada 8040 or about PCIe on ARM >> in general? > > I don't know - there are not any other easily available PCIe ARM boards > except for Armada 8040. > ... indeed, and sadly, the ones that are available all have this horrible Synopsys DesignWare PCIe IP that does not implement a true root complex at all, but is simply repurposed endpoint IP with some tweaks so it vaguely resembles a root complex. But this is exactly why I am asking: I use an AMD Seattle Overdrive as my main Linux development system, and it runs the gnome-shell stack flawlessly (using the nouveau driver), as well as a UEFI framebuffer using efifb. So my suspicion is that this is either a Synopsys IP issue or an interconnect issue, and has nothing to do with the impedance mismatch between AMBA and PCIe. ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 10:48 ` Ard Biesheuvel @ 2018-08-06 12:09 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-06 12:09 UTC (permalink / raw) To: Ard Biesheuvel Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > >> Are we talking about a quirk for the Armada 8040 or about PCIe on ARM > >> in general? > > > > I don't know - there are not any other easily available PCIe ARM boards > > except for Armada 8040. > > ... indeed, and sadly, the ones that are available all have this > horrible Synopsys DesignWare PCIe IP that does not implement a true > root complex at all, but is simply repurposed endpoint IP with some > tweaks so it vaguely resembles a root complex. > > But this is exactly why I am asking: I use an AMD Seattle Overdrive as > my main Linux development system, and it runs the gnome-shell stack > flawlessly (using the nouveau driver), as well as a UEFI framebuffer > using efifb. So my suspicion is that this is either a Synopsys IP > issue or an interconnect issue, and has nothing to do with the > impedance mismatch between AMBA and PCIe. If you run the program for testing memcpy on framebuffer that I posted in this thread - does it detect some corruption for you? BTW, does the Radeon GPU driver work for you? My observation is that OpenGL with Nouveau works, but it's slow and the whole system locks up when playing video in chromium. Radeon HD 6350 (pre-GCN) doesn't lock up, but OpenGL (and Glamour) has many artifacts and corrupted textures. When I switch it to EXA acceleration and don't use OpenGL, it works. The artifacts are not fixed by preloading a glibc with fixed memcpy, so there's presumably some other bug somewhere. Unfortunately, there's no low-power GCN card. 
Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
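A corruption check of the kind asked about in this message could be structured as below. This is not the test program posted earlier in the thread, only a sketch under assumptions: `count_copy_mismatches` and `copy_fn` are hypothetical names, and `region` would in practice point at a mapped framebuffer, while an ordinary RAM buffer serves as a stand-in here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature for the copy routine under test. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Copy a known pattern into `region` at many offsets and lengths, read it
 * back, and count the (offset, length) pairs for which the data or the
 * guard bytes around it differ from what was written. */
size_t count_copy_mismatches(unsigned char *region, size_t region_len,
                             copy_fn copy)
{
    unsigned char src[256];
    size_t bad = 0;

    assert(region != NULL);
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (size_t off = 0; off < 16; off++) {
        for (size_t len = 1; len <= 128; len++) {
            int ok;

            if (off + len + 1 > region_len)
                continue;

            /* Known background; on real hardware this fill may itself
             * need a safe routine. */
            memset(region, 0xAA, off + len + 1);
            copy(region + off, src, len);

            ok = memcmp(region + off, src, len) == 0;
            if (off > 0 && region[off - 1] != 0xAA)
                ok = 0;                /* byte before the copy was clobbered */
            if (region[off + len] != 0xAA)
                ok = 0;                /* byte after the copy was clobbered */
            if (!ok)
                bad++;
        }
    }
    return bad;
}
```

Sweeping both the offset and the length matters because the overlapping stp pairs are only used for some sizes and alignments, so a single fixed-size copy could miss the problem. On ordinary RAM, `count_copy_mismatches(buf, sizeof buf, memcpy)` is expected to return 0.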
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 12:09 ` Mikulas Patocka @ 2018-08-06 12:19 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 12:19 UTC (permalink / raw) To: Mikulas Patocka Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On 6 August 2018 at 14:09, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > >> >> Are we talking about a quirk for the Armada 8040 or about PCIe on ARM >> >> in general? >> > >> > I don't know - there are not any other easily available PCIe ARM boards >> > except for Armada 8040. >> >> ... indeed, and sadly, the ones that are available all have this >> horrible Synopsys DesignWare PCIe IP that does not implement a true >> root complex at all, but is simply repurposed endpoint IP with some >> tweaks so it vaguely resembles a root complex. >> >> But this is exactly why I am asking: I use a AMD Seattle Overdrive as >> my main Linux development system, and it runs the gnome-shell stack >> flawlessly (using the nouveau driver), as well as a UEFI framebuffer >> using efifb. So my suspicion is that this is either a Synopsys IP >> issue or an interconnect issue, and has nothing to do with the >> impedance mismatch between AMBA and PCIe. > > If you run the program for testing memcpy on framebuffer that I posted in > this thread - does it detect some corruption for you? > I won't be able to check that for a while - I'm currently travelling. > > BTW. does the Radeon GPU driver work for you? > > My observation is that OpenGL with Nouveau works, but it's slow and the > whole system locks up when playing video in chromium. > No that works fine for me. VDPAU acceleration works as well, but it depends on your chromium build whether it can actually use it, I think? 
In any case, mplayer can use vdpau to play 1080p h264 without breaking a sweat on this system. Note that the VDPAU driver also relies on memory semantics, i.e., it may use DC ZVA (zero cacheline) instructions which are not permitted on device mappings. This is probably just glibc's memset() being invoked, but I remember hitting this on another PCIe-impaired arm64 system with Synopsys PCIe IP > Radeon HD 6350 (pre-GCN), doesn't lock up, but OpenGL (and Glamour) has > many artifacts and corrupted textures. When I switch it to EXA > acceleration and don't use OpenGL, it works. > > The artifacts are not fixed by preloading a glibc with fixed memcpy, so > there's supposedly some other bug somewhere. > Yes, I have the same experience, and I have been meaning to report it to the maintainers/developers. Good to have another data point. ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 12:19 ` Ard Biesheuvel @ 2018-08-06 12:22 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 12:22 UTC (permalink / raw) To: Mikulas Patocka Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On 6 August 2018 at 14:19, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > On 6 August 2018 at 14:09, Mikulas Patocka <mpatocka@redhat.com> wrote: >> >> >> On Mon, 6 Aug 2018, Ard Biesheuvel wrote: >> >>> >> Are we talking about a quirk for the Armada 8040 or about PCIe on ARM >>> >> in general? >>> > >>> > I don't know - there are not any other easily available PCIe ARM boards >>> > except for Armada 8040. >>> >>> ... indeed, and sadly, the ones that are available all have this >>> horrible Synopsys DesignWare PCIe IP that does not implement a true >>> root complex at all, but is simply repurposed endpoint IP with some >>> tweaks so it vaguely resembles a root complex. >>> >>> But this is exactly why I am asking: I use a AMD Seattle Overdrive as >>> my main Linux development system, and it runs the gnome-shell stack >>> flawlessly (using the nouveau driver), as well as a UEFI framebuffer >>> using efifb. So my suspicion is that this is either a Synopsys IP >>> issue or an interconnect issue, and has nothing to do with the >>> impedance mismatch between AMBA and PCIe. >> >> If you run the program for testing memcpy on framebuffer that I posted in >> this thread - does it detect some corruption for you? >> > > I won't be able to check that for a while - I'm currently travelling. > >> >> BTW. does the Radeon GPU driver work for you? >> >> My observation is that OpenGL with Nouveau works, but it's slow and the >> whole system locks up when playing video in chromium. >> Are you setting the pstate to auto? 
That helps a lot in my experience. I.e., echo auto > /sys/kernel/debug/dri/0/pstate ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 12:19 ` Ard Biesheuvel @ 2018-08-07 14:14 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-07 14:14 UTC (permalink / raw) To: Ard Biesheuvel Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > No that works fine for me. VDPAU acceleration works as well, but it > depends on your chromium build whether it can actually use it, I > think? In any case, mplayer can use vdpau to play 1080p h264 without > breaking a sweat on this system. > > Note that the VDPAU driver also relies on memory semantics, i.e., it > may use DC ZVA (zero cacheline) instructions which are not permitted > on device mappings. This is probably just glibc's memset() being > invoked, but I remember hitting this on another PCIe-impaired arm64 > system with Synopsys PCIe IP DC ZVA can be disabled with the SCTLR_EL1.DZE bit, so that neither kernel nor userspace will use it. If the mapping didn't support unaligned writes, it would be worse. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
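For readers following the DC ZVA point: userspace can find out whether DC ZVA is usable by reading DCZID_EL0, and as far as I know this is how a memset() implementation would decide whether to take the zeroing path. Below is a minimal sketch, assuming the architectural field layout (bit 4 = DZP, bits 3:0 = BS). The names dczid_info, decode_dczid and read_dczid are made up for illustration and are not taken from glibc or the kernel.

```c
#include <stdint.h>

/* Decoded view of a DCZID_EL0 value (illustrative type, not a real API). */
struct dczid_info {
    int prohibited;        /* DZP bit: 1 means DC ZVA must not be used */
    unsigned block_bytes;  /* number of bytes zeroed by one DC ZVA */
};

/* Pure decoder, so it can be exercised on any host. */
static struct dczid_info decode_dczid(uint64_t dczid)
{
    struct dczid_info info;

    info.prohibited = (int)((dczid >> 4) & 1);
    /* BS holds log2 of the block size counted in 4-byte words. */
    info.block_bytes = 4u << (dczid & 0xf);
    return info;
}

#if defined(__aarch64__)
/* DCZID_EL0 is readable from EL0, so no kernel help is needed. */
static uint64_t read_dczid(void)
{
    uint64_t v;

    __asm__ volatile("mrs %0, dczid_el0" : "=r"(v));
    return v;
}
#endif
```

On an arm64 machine, decode_dczid(read_dczid()).prohibited would be expected to read 1 at EL0 once the OS has cleared SCTLR_EL1.DZE, since EL0 use of DC ZVA is then trapped.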
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-07 14:14 ` Mikulas Patocka @ 2018-08-07 14:40 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-07 14:40 UTC (permalink / raw) To: Mikulas Patocka Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On 7 August 2018 at 16:14, Mikulas Patocka <mpatocka@redhat.com> wrote: > > > On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > >> No that works fine for me. VDPAU acceleration works as well, but it >> depends on your chromium build whether it can actually use it, I >> think? In any case, mplayer can use vdpau to play 1080p h264 without >> breaking a sweat on this system. >> >> Note that the VDPAU driver also relies on memory semantics, i.e., it >> may use DC ZVA (zero cacheline) instructions which are not permitted >> on device mappings. This is probably just glibc's memset() being >> invoked, but I remember hitting this on another PCIe-impaired arm64 >> system with Synopsys PCIe IP > > DC ZVA can be disabled with the SCTLR_EL1.DZE bit, so that neither kernel > nor userspace will use it. Of course, but only the OS can do that, and only system wide unless we're eager to create infrastructure for managing this per process. But it is also beside the point: I mentioned it to illustrate that even use cases like libvdpau that don't operate on the 'framebuffer' abstraction make assumptions about VRAM having true memory semantics. > If the mapping didn't support unaligned writes, > it would be worse. > > Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 12:19 ` Ard Biesheuvel @ 2018-08-08 19:15 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-08 19:15 UTC (permalink / raw) To: Ard Biesheuvel Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel On Mon, 6 Aug 2018, Ard Biesheuvel wrote: > No that works fine for me. VDPAU acceleration works as well, but it > depends on your chromium build whether it can actually use it, I > think? In any case, mplayer can use vdpau to play 1080p h264 without > breaking a sweat on this system. I didn't install the vdpau libraries and firmware. mplayer plays through xv and works (it can't play through vdpau). Chromium uses I-don't-know-what and locks up. > Note that the VDPAU driver also relies on memory semantics, i.e., it > may use DC ZVA (zero cacheline) instructions which are not permitted > on device mappings. This is probably just glibc's memset() being > invoked, but I remember hitting this on another PCIe-impaired arm64 > system with Synopsys PCIe IP > Are you setting the pstate to auto? That helps a lot in my experience. > > I.e., > > echo auto > /sys/kernel/debug/dri/0/pstate I tried that, but it didn't help. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 10:31 ` Mikulas Patocka @ 2018-08-06 11:19 ` Siddhesh Poyarekar -1 siblings, 0 replies; 238+ messages in thread From: Siddhesh Poyarekar @ 2018-08-06 11:19 UTC (permalink / raw) To: Mikulas Patocka, Ard Biesheuvel Cc: Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel, Tulio Magno Quites Machado Filho On 08/06/2018 04:01 PM, Mikulas Patocka wrote: > I think there are three possible solutions: > > 1. provide an alternative memcpy implementation that doesn't do unaligned > accesses and recompile the graphics software with -mstrict-align Given that there's already a tunable glibc.cpu.cached_memopt for powerpc that (as Tulio clarified elsewhere) essentially does the same thing for cache-inhibited memory, it wouldn't be too much of an overhead to put in another ifunc implementation that gets chosen only when one sets this tunable. In fact, we could reuse the C string routines for this to avoid adding yet another assembly implementation to have to support. That way we can minimally fix the issue at hand without regressing existing uses. You can then set the glibc.cpu.cached_memopt tunable in the default environment for your board[1] or for applications that need it (e.g. whenever DISPLAY is exported or something like that). The only difference from Power would be that cpu.noncached==0 for Power by default whereas for aarch64 it will be the other way around. It shouldn't be too hard to enhance the framework to set platform-specific defaults. Siddhesh [1] Or if you're feeling particularly generous, help us implement systemwide tunables since you have an actual use case for it :) ^ permalink raw reply [flat|nested] 238+ messages in thread
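To make the "alternative memcpy" idea concrete, here is a rough sketch of a copy routine that only issues naturally aligned, non-overlapping stores to the destination. It is an illustration under assumptions, not the glibc C string routine and not a proposed patch: the name copy_aligned_stores is hypothetical, and it assumes the source lives in ordinary memory while only the destination is the sensitive mapping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: every destination byte is written exactly once,
 * and every multi-byte store is 8-byte aligned. */
static void *copy_aligned_stores(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: byte stores until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one aligned 8-byte store per iteration. The source may be
     * unaligned, so it is read through memcpy() into a temporary. */
    while (n >= 8) {
        uint64_t word;

        memcpy(&word, s, sizeof word);
        *(volatile uint64_t *)d = word;
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: remaining bytes, again one store each. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

The volatile qualifier is there to discourage the compiler from merging or widening the stores. Whether it is sufficient on a given toolchain is an assumption that would need checking against the generated code.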
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 11:19 ` Siddhesh Poyarekar @ 2018-08-06 11:29 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 11:29 UTC (permalink / raw) To: Siddhesh Poyarekar Cc: Mikulas Patocka, Florian Weimer, Andrew Pinski, Richard Earnshaw, Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel, Tulio Magno Quites Machado Filho On 6 August 2018 at 13:19, Siddhesh Poyarekar <siddhesh@gotplt.org> wrote: > On 08/06/2018 04:01 PM, Mikulas Patocka wrote: >> >> I think there are three possible solutions: >> >> 1. provide an alternative memcpy implementation that doesn't do unaligned >> accesses and recompile the graphics software with -mstrict-align > > > Given that there's already a tunable glibc.cpu.cached_memopt for powerpc > that (as Tulio clarified elsewhere) essentially does the same thing for > cache-inhibited memory, it wouldn't be too much of an overhead to put in > another ifunc implementation that gets chosen only when one sets this > tunable. In fact, we could reuse the C string routines for this to avoid > adding yet another assembly implementation to have to support. That way we > can minimally fix the issue at hand without regressing existing uses. > > You can then set the glibc.cpu.cached_memopt tunable in the default > environment for your board[1] or for applications that need it (e.g. > whenever DISPLAY is exported or something like that). > > The only difference from Power would be that cpu.noncached==0 for Power by > default whereas for aarch64 it will be the other way around. It shouldn't > be too hard to enhance the framework to set platform-specific defaults. > Thanks Siddhesh, But we don't need another memcpy(). We need outbound PCIe windows that tolerate being mapped as normal non-cacheable memory. 
And if this is fundamentally impossible, can someone please try explaining it again? (apologies for being thick) ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-05 18:33 ` Florian Weimer @ 2018-08-06 14:26 ` Tulio Magno Quites Machado Filho -1 siblings, 0 replies; 238+ messages in thread From: Tulio Magno Quites Machado Filho @ 2018-08-06 14:26 UTC (permalink / raw) To: Florian Weimer, Mikulas Patocka, Andrew Pinski Cc: Richard Earnshaw, ard.biesheuvel, Ramana Radhakrishnan, thomas.petazzoni, GNU C Library, Catalin Marinas, Will Deacon, linux, LKML, linux-arm-kernel Florian Weimer <fweimer@redhat.com> writes: > On 08/04/2018 01:04 PM, Mikulas Patocka wrote: >> There's plenty of memcpy's in the graphics stack. No one will be rewriting >> all the graphics drivers because of tiny market share that ARM has in >> desktop computers. So if you refuse to fix things and blame everyone else, >> you can as well announce that you don't want to have PCIe graphics on ARM >> at all. > > The POWER toolchain maintainers said pretty much the same thing not too > long ago. I wonder how many architectures need to fail until the > graphics stack is finally fixed. Unfortunately, it is not just the graphics stack. This is being used in other userspace programs that benefit from GPUs and accelerators. But can we say they are nonportable programs? I'm not convinced yet. -- Tulio Magno ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-04 11:04 ` Mikulas Patocka @ 2018-08-05 21:51 ` Pavel Machek -1 siblings, 0 replies; 238+ messages in thread From: Pavel Machek @ 2018-08-05 21:51 UTC (permalink / raw) To: Mikulas Patocka Cc: Andrew Pinski, Richard Earnshaw, ard.biesheuvel, Ramana Radhakrishnan, Florian Weimer, thomas.petazzoni, GNU C Library, Catalin Marinas, Will Deacon, linux, LKML, linux-arm-kernel Hi! > > Can you run the test program on x86 using the similar framebuffer > > setup? Does doing two writes (one aligned and one unaligned but > > overlapping with previous one) cause the same issue? I suspect it > > does, then using memcpy for frame buffers is wrong. I'm pretty sure it will work ok on x86. > Overlapping unaligned writes work on x86 - they have to, because of > backward compatibility. It is not that easy. 8086s (and similar) did not have MTRRs and PATs either. Overlapping unaligned writes _on main memory_, _with normal MTRR settings_ certainly work ok on x86. Chances are the memory type can be configured to work in a similar way in your ARM/PCIe case? > 8086, 80286 and 80386 didn't have any cache at all. 386s had cache (but not on die). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-05 21:51 ` Pavel Machek @ 2018-08-06 14:30 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-06 14:30 UTC (permalink / raw) To: Pavel Machek Cc: Andrew Pinski, Richard Earnshaw, ard.biesheuvel, Ramana Radhakrishnan, Florian Weimer, thomas.petazzoni, GNU C Library, Catalin Marinas, Will Deacon, linux, LKML, linux-arm-kernel On Sun, 5 Aug 2018, Pavel Machek wrote: > Hi! > > > > Can you run the test program on x86 using the similar framebuffer > > > setup? Does doing two writes (one aligned and one unaligned but > > > overlapping with previous one) cause the same issue? I suspect it > > > does, then using memcpy for frame buffers is wrong. > > I'm pretty sure it will work ok on x86. > > > Overlapping unaligned writes work on x86 - they have to, because of > > backward compatibility. > > It is not that easy. 8086s (and similar) did not have MTRRs and PATs > either. Overlapping unaligned writes _on main memory_, _with normal > MTRR settings_ certainly work ok on x86. It works even with write-combining. Write-combining specifies that the writes may hit the framebuffer in unspecified order. But if the writes are overlapping, the CPU can't just reorder them and write the wrong result to the framebuffer. > Chances is memory type can be configured to work similar way on your > ARM/PCIe case? ARM has memory types GRE, nGRE, nGnRE, nGnRnE - that allow or disallow gathering, reordering, early write acknowledgement. Unfortunately, all these memory types will trigger a fault on unaligned accesses. It also has a Non-Cached memory type (some people on this thread believe that it can't be used for GPUs, some believe that it can) - this memory type supports unaligned accesses, so it is actually used for framebuffers on ARM. If we had a memory type that didn't do early write acknowledgement and supported unaligned accesses, it would solve this problem. 
Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
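
[Illustration, not part of the original message.] For readers unfamiliar
with the pattern under discussion, the following is a minimal C sketch of
a branch-free copy that stores the same destination bytes twice. It is
not the glibc arm64 memcpy; the function name copy16to32_overlapping is
hypothetical, and the fixed 16-byte chunks only stand in for a pair of
stp instructions. For any length between 16 and 32 bytes, the head and
the tail are each written as a 16-byte chunk, and the two chunks overlap
whenever the length is below 32.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes, 16 <= n <= 32, without branching on
 * the exact length. Both chunks are loaded before anything is stored, and
 * the second store overlaps the first whenever n < 32, so some destination
 * bytes are written twice with the same value.
 */
static void copy16to32_overlapping(unsigned char *dst,
                                   const unsigned char *src, size_t n)
{
    unsigned char head[16], tail[16];

    memcpy(head, src, 16);           /* first 16 bytes of the source */
    memcpy(tail, src + n - 16, 16);  /* last 16 bytes of the source  */

    memcpy(dst, head, 16);           /* store 1: start of the buffer   */
    memcpy(dst + n - 16, tail, 16);  /* store 2: may overlap store 1   */
}
```

On ordinary memory both stores write identical values to the overlapping
bytes, so the result does not depend on the order in which they land. The
thread is about what happens when the destination is a PCIe framebuffer.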
* RE: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-03 9:29 ` Ard Biesheuvel
@ 2018-08-03 11:24 ` David Laight
From: David Laight @ 2018-08-03 11:24 UTC
To: 'Ard Biesheuvel', Ramana Radhakrishnan
Cc: Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, Mikulas Patocka, linux-arm-kernel

From: Ard Biesheuvel
> Sent: 03 August 2018 10:30
...
> The discussion about whether memcpy() should rely on unaligned
> accesses, and whether you should use it on device memory is orthogonal
> to that, and not the heart of the matter IMO

Even on x86 using memcpy() on PCIe memory (maybe mmap()ed into userspace)
isn't a good idea.
In the kernel memcpy_to/fromio() ought to be a better choice but that
is just an alternate name for memcpy().

The problem on x86 is that memcpy() is likely to be implemented as
'rep movsb' on modern cpu - relying on the cpu hardware to perform
cache-line sized transfers (etc).
Unfortunately on uncached locations it has to revert to byte copies.
So PCIe transfers (especially reads) are very slow.

The transfers need to use the largest size register available.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
* RE: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-03 11:24 ` David Laight
@ 2018-08-03 12:04 ` Mikulas Patocka
From: Mikulas Patocka @ 2018-08-03 12:04 UTC
To: David Laight
Cc: 'Ard Biesheuvel', Ramana Radhakrishnan, Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel

On Fri, 3 Aug 2018, David Laight wrote:

> From: Ard Biesheuvel
> > Sent: 03 August 2018 10:30
> ...
> > The discussion about whether memcpy() should rely on unaligned
> > accesses, and whether you should use it on device memory is orthogonal
> > to that, and not the heart of the matter IMO
>
> Even on x86 using memcpy() on PCIe memory (maybe mmap()ed into userspace)
> isn't a good idea.
> In the kernel memcpy_to/fromio() ought to be a better choice but that
> is just an alternate name for memcpy().
>
> The problem on x86 is that memcpy() is likely to be implemented as
> 'rep movsb' on modern cpu - relying on the cpu hardware to perform
> cache-line sized transfers (etc).
> Unfortunately on uncached locations it has to revert to byte copies.
> So PCIe transfers (especially reads) are very slow.
>
> The transfers need to use the largest size register available.
>
> David

On x86, the framebuffer is mapped as write-combining memory type, so "rep
movsb" could merge the byte writes into larger chunks. I don't have a cpu
with the ERMS feature - could anyone test whether rep movsb works worse or
better than explicit writes to the framebuffer?

Mikulas
* RE: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-03 12:04 ` Mikulas Patocka
@ 2018-08-03 13:04 ` David Laight
From: David Laight @ 2018-08-03 13:04 UTC
To: 'Mikulas Patocka'
Cc: 'Ard Biesheuvel', Ramana Radhakrishnan, Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel

From: Mikulas Patocka
> Sent: 03 August 2018 13:05
...
> > Even on x86 using memcpy() on PCIe memory (maybe mmap()ed into userspace)
> > isn't a good idea.
> > In the kernel memcpy_to/fromio() ought to be a better choice but that
> > is just an alternate name for memcpy().
> >
> > The problem on x86 is that memcpy() is likely to be implemented as
> > 'rep movsb' on modern cpu - relying on the cpu hardware to perform
> > cache-line sized transfers (etc).
> > Unfortunately on uncached locations it has to revert to byte copies.
> > So PCIe transfers (especially reads) are very slow.
> >
> > The transfers need to use the largest size register available.
> >
> > David
>
> On x86, the framebuffer is mapped as write-combining memory type, so "rep
> movsb" could merge the byte writes to larger chunks. I don't have a cpu
> with the ERMS feature - could anyone try it if rep movsb works worse or
> better than explicit writes to the framebuffer?

I don't think 'write combining' can help reads, and memcpy_to/fromio()
are likely to be used for normal memory mapped io areas.

David
* RE: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-03 13:04 ` David Laight
@ 2018-08-05 14:36 ` Mikulas Patocka
From: Mikulas Patocka @ 2018-08-05 14:36 UTC
To: David Laight
Cc: 'Ard Biesheuvel', Ramana Radhakrishnan, Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel

On Fri, 3 Aug 2018, David Laight wrote:

> From: Mikulas Patocka
> > Sent: 03 August 2018 13:05
> ...
> > > Even on x86 using memcpy() on PCIe memory (maybe mmap()ed into userspace)
> > > isn't a good idea.
> > > In the kernel memcpy_to/fromio() ought to be a better choice but that
> > > is just an alternate name for memcpy().
> > >
> > > The problem on x86 is that memcpy() is likely to be implemented as
> > > 'rep movsb' on modern cpu - relying on the cpu hardware to perform
> > > cache-line sized transfers (etc).
> > > Unfortunately on uncached locations it has to revert to byte copies.
> > > So PCIe transfers (especially reads) are very slow.
> > >
> > > The transfers need to use the largest size register available.
> > >
> > > David
> >
> > On x86, the framebuffer is mapped as write-combining memory type, so "rep
> > movsb" could merge the byte writes to larger chunks. I don't have a cpu
> > with the ERMS feature - could anyone try it if rep movsb works worse or
> > better than explicit writes to the framebuffer?
>
> I don't think 'write combining' can help reads, and memcpy_to/fromio()

There's an instruction movntdqa (and vmovntdqa) that can actually do
prefetch on write-combining memory type. It's the only instruction that
can do it.

If this instruction is used on non-write-combining memory type, it behaves
like movdqa.

> are likely to be used for normal memory mapped io areas.
>
> David

I benchmarked it on a processor with ERMS - for writes to the framebuffer,
there's no difference between memcpy, 8-byte writes, rep stosb, rep stosq,
mmx, sse, avx - all of these methods achieve 16-17 GB/s.

For reading from the framebuffer:
 323 MB/s - memcpy (using avx2)
  91 MB/s - explicit 8-byte reads
 249 MB/s - rep movsq
 307 MB/s - rep movsb
  90 MB/s - mmx
 176 MB/s - sse
4750 MB/s - sse movntdqa
 330 MB/s - avx
5369 MB/s - avx vmovntdqa

So - it may make sense to introduce a function memcpy_from_framebuffer()
that uses movntdqa or vmovntdqa on CPUs that support it.

Mikulas
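
[Illustration, not part of the original message.] A rough C sketch of
what the proposed memcpy_from_framebuffer() might look like with the
SSE4.1 variant. Only the function name comes from the message above; the
helper copy_streaming_sse41, the alignment handling and the fallback to
plain memcpy() are assumptions made for this example, and nothing here
has been measured. _mm_stream_load_si128 is the compiler intrinsic for
movntdqa and requires a 16-byte-aligned source address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: 16-byte movntdqa loads, ordinary unaligned stores. */
__attribute__((target("sse4.1")))
static void copy_streaming_sse41(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t blocks = len / 16;

    for (size_t i = 0; i < blocks; i++) {
        /* The source address must be 16-byte aligned for movntdqa. */
        __m128i v = _mm_stream_load_si128((__m128i *)(s + 16 * i));
        _mm_storeu_si128((__m128i *)(d + 16 * i), v);
    }
    /* Copy the remaining 0..15 bytes the ordinary way. */
    memcpy(d + 16 * blocks, s + 16 * blocks, len % 16);
}
#endif

/* Name taken from the proposal above; the body is only a sketch. */
void memcpy_from_framebuffer(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Use the streaming path only for an aligned source and a capable CPU. */
    if (((uintptr_t)src & 15) == 0 && __builtin_cpu_supports("sse4.1")) {
        copy_streaming_sse41(dst, src, len);
        return;
    }
#endif
    memcpy(dst, src, len);  /* assumed fallback for every other case */
}
```

On memory that is not mapped write-combining the streaming load behaves
like an ordinary aligned load, as noted above, so the sketch can be
exercised on normal buffers; any speedup would only show on a
write-combining mapping.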
* RE: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-05 14:36 ` Mikulas Patocka
@ 2018-08-06 10:18 ` David Laight
From: David Laight @ 2018-08-06 10:18 UTC
To: 'Mikulas Patocka'
Cc: 'Ard Biesheuvel', Ramana Radhakrishnan, Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel

From: Mikulas Patocka
> Sent: 05 August 2018 15:36
> To: David Laight
...
> There's an instruction movntdqa (and vmovntdqa) that can actually do
> prefetch on write-combining memory type. It's the only instruction that
> can do it.
>
> It this instruction is used on non-write-combining memory type, it behaves
> like movdqa.
>
...
> I benchmarked it on a processor with ERMS - for writes to the framebuffer,
> there's no difference between memcpy, 8-byte writes, rep stosb, rep stosq,
> mmx, sse, avx - all this method achieve 16-17 GB/s

The combination of write-combining, posted writes and a fast PCIe slave
are probably why there is little difference.

> For reading from the framebuffer:
> 323 MB/s - memcpy (using avx2)
> 91 MB/s - explicit 8-byte reads
> 249 MB/s - rep movsq
> 307 MB/s - rep movsb

You must be getting the ERMS hardware optimised 'rep movsb'.

> 90 MB/s - mmx
> 176 MB/s - sse
> 4750 MB/s - sse movntdqa
> 330 MB/s - avx

avx512 is probably faster still.

> 5369 MB/s - avx vmovntdqa
>
> So - it may make sense to introduce a function memcpy_from_framebuffer()
> that uses movntdqa or vmovntdqa on CPUs that support it.

For kernel space it ought to be just memcpy_fromio().

Can you easily repeat the tests using a non-write-combining map of the
same PCIe slave?

I can probably run the same measurements against our rather leisurely
FPGA based PCIe slave.
IIRC PCIe reads happen every 128 clocks of the card's 62.5MHz clock,
increasing the size of the registers makes a significant difference.
I've not tried mapping write-combining and using (v)movntdqa.
I'm not sure what effect write-combining would have if the whole BAR
were mapped that way - so I'll either have to map the physical addresses
twice or add in another BAR.

David
* RE: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-06 10:18 ` David Laight
@ 2018-08-07 14:07 ` Mikulas Patocka
From: Mikulas Patocka @ 2018-08-07 14:07 UTC
To: David Laight
Cc: 'Ard Biesheuvel', Ramana Radhakrishnan, Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel

On Mon, 6 Aug 2018, David Laight wrote:

> From: Mikulas Patocka
> > Sent: 05 August 2018 15:36
> > To: David Laight
> ...
> > There's an instruction movntdqa (and vmovntdqa) that can actually do
> > prefetch on write-combining memory type. It's the only instruction that
> > can do it.
> >
> > It this instruction is used on non-write-combining memory type, it behaves
> > like movdqa.
> >
> ...
> > I benchmarked it on a processor with ERMS - for writes to the framebuffer,
> > there's no difference between memcpy, 8-byte writes, rep stosb, rep stosq,
> > mmx, sse, avx - all this method achieve 16-17 GB/s
>
> The combination of write-combining, posted writes and a fast PCIe slave
> are probably why there is little difference.
>
> > For reading from the framebuffer:
> > 323 MB/s - memcpy (using avx2)
> > 91 MB/s - explicit 8-byte reads
> > 249 MB/s - rep movsq
> > 307 MB/s - rep movsb
>
> You must be getting the ERMS hardware optimised 'rep movsb'.
>
> > 90 MB/s - mmx
> > 176 MB/s - sse
> > 4750 MB/s - sse movntdqa
> > 330 MB/s - avx
>
> avx512 is probably faster still.
>
> > 5369 MB/s - avx vmovntdqa
> >
> > So - it may make sense to introduce a function memcpy_from_framebuffer()
> > that uses movntdqa or vmovntdqa on CPUs that support it.
>
> For kernel space it ought to be just memcpy_fromio().

I meant for userspace. Unaccelerated scrolling is still painfully slow
even on modern computers because of slow framebuffer read. If glibc
provided a function memcpy_from_framebuffer() that used movntdqa and the
fbdev Xorg driver used it, it would help the users who use unaccelerated
drivers for some reason.

> Can you easily repeat the tests using a non-write-combining map of the
> same PCIe slave?

I mapped the framebuffer as uncached and these are the results:

reading from the framebuffer:
318 MB/s - memcpy
 74 MB/s - explicit 8-byte reads
 73 MB/s - rep movsq
 11 MB/s - rep movsb
 87 MB/s - mmx
173 MB/s - sse
173 MB/s - sse movntdqa
323 MB/s - avx
284 MB/s - avx vmovntdqa

zeroing the framebuffer:
 19 MB/s - memset
154 MB/s - explicit 8-byte writes
152 MB/s - rep stosq
 19 MB/s - rep stosb
152 MB/s - mmx
306 MB/s - sse
621 MB/s - avx

copying data to the framebuffer:
618 MB/s - memcpy (using avx2)
152 MB/s - explicit 8-byte writes
139 MB/s - rep movsq
 17 MB/s - rep movsb
154 MB/s - mmx
305 MB/s - sse
306 MB/s - sse movntdqa
619 MB/s - avx
619 MB/s - avx movntdqa

> I can probably run the same measurements against our rather leisurely
> FPGA based PCIe slave.
> IIRC PCIe reads happen every 128 clocks of the cards 62.5MHz clock,
> increasing the size of the registers makes a significant different.
> I've not tried mapping write-combining and using (v)movntdqa.
> I'm not sure what effect write-combining would have if the whole BAR
> were mapped that way - so I'll either have to map the physical addresses
> twice or add in another BAR.
>
> David

Mikulas
* RE: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-07 14:07 ` Mikulas Patocka
@ 2018-08-07 14:33 ` David Laight
From: David Laight @ 2018-08-07 14:33 UTC
To: 'Mikulas Patocka'
Cc: 'Ard Biesheuvel', Ramana Radhakrishnan, Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel

From: Mikulas Patocka
> Sent: 07 August 2018 15:07
...
> Unaccelerated scrolling is still painfully slow
> even on modern computers because of slow framebuffer read.

I solved that many years ago on a strongarm system by mapping
the screen memory at two separate virtual addresses.
One uncached used for writes, the second cached using the
'minicache' for reads.
(and immediately fell foul of a memcpy() function that compared
the two virtual addresses and decided to copy backwards)

I suspect some modern cpus don't like you doing that and the
graphics 'drivers' won't use different mappings.

Even in glibc you want a more general copy_to/from_io_memory()
rather than just 'copy_from_framebuffer()'.
Best to define both - even if they end up identical.
Other drivers allow PCIe space be mmap()ed into user space.

While your tests show vmovntdqa being slightly slower than an
avx read for uncached mappings it is still much better than
all the other options.

David
* RE: framebuffer corruption due to overlapping stp instructions on arm64
  2018-08-07 14:33 ` David Laight
@ 2018-08-08 14:21 ` Mikulas Patocka
From: Mikulas Patocka @ 2018-08-08 14:21 UTC
To: David Laight
Cc: 'Ard Biesheuvel', Ramana Radhakrishnan, Florian Weimer, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, linux-arm-kernel

On Tue, 7 Aug 2018, David Laight wrote:

> From: Mikulas Patocka
> > Sent: 07 August 2018 15:07
> ...
> > Unaccelerated scrolling is still painfully slow
> > even on modern computers because of slow framebuffer read.
>
> I solved that many years ago on a strongarm system by mapping
> the screen memory at two separate virtual addresses.
> One uncached used for writes, the second cached using the
> 'minicache' for reads.
> (and immediately fell foul of a memcpy() function that compared
> the two virtual addresses and decided to copy backwards)
>
> I suspect some modern cpus don't like you doing that and the
> graphics 'drivers' won't use different mappings.

Intel says that you can't mix PAT memory attributes - but the non-temporal
store instructions use write-combining semantics on memory that is
normally cacheable - and it is allowed to mix non-temporal stores with
other cacheable memory accesses - so I believe that the CPU will snoop the
cache for wc accesses and handle the conflict.

> Even in glibc you want a more general copy_to/from_io_memory()
> rather than just 'copy_from_framebuffer()'.
> Best to define both - even if they end up identical.
> Other drivers allow PCIe space be mmap()ed into user space.
>
> While your tests show vmovntdqa being slightly slower than an
> avx read for uncached mappings it is still much better than
> all the other options.

This was a measuring glitch - movntdqa is as fast as movdqa on non-cached
mappings.

Mikulas

> David
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 7:53 ` Florian Weimer @ 2018-08-03 13:20 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-03 13:20 UTC (permalink / raw) To: Florian Weimer Cc: Andrew Pinski, Catalin Marinas, Will Deacon, linux, thomas.petazzoni, linux-arm-kernel, LKML, GNU C Library On Fri, 3 Aug 2018, Florian Weimer wrote: > On 08/03/2018 09:11 AM, Andrew Pinski wrote: > > Yes fix Links not to use memcpy on the framebuffer. > > It is undefined behavior to use device memory with memcpy. > > Some (de facto) ABIs require that it is supported, though. For example, > the POWER string functions avoid unaligned loads and stores for this > reason because the platform has the same issue with device memory. And > yes, GCC will expand memcpy on POWER to something that is incompatible > with device memory. 8-( > > If we don't want people to use memcpy, we probably need to provide a > credible alternative. > > Thanks, > Florian

And what does POWER do with code like this?

    void write_merge(int *x)
    {
        x[0] = x[1] = 0;
    }

With -O2, gcc-8 translates it into:

    li 9,0
    std 9,0(3)
    blr

And that std instruction may end up being unaligned (the C ABI mandates that x is aligned to 4 bytes, not 8). If this piece of code is inside some graphics driver and writes to framebuffer memory, what do you do with it?

Mikulas
^ permalink raw reply [flat|nested] 238+ messages in thread
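[Editorial sketch, not part of the thread.] The store merging shown in the message above is an optimization on ordinary (non-volatile) objects. One way a driver might keep the two 4-byte stores separate is to perform them through a volatile-qualified pointer, since compilers such as GCC generally do not merge or widen volatile accesses. The function name `write_no_merge` is hypothetical, and this only constrains the compiler; it says nothing about how the CPU or interconnect treats the stores.

```c
/*
 * Hypothetical variant of write_merge() from the message above.
 * Each store goes through a volatile lvalue, so the compiler is
 * expected to emit two separate 4-byte stores rather than one
 * 8-byte store. This is an assumption about typical compiler
 * behaviour, not a guarantee about the hardware.
 */
void write_no_merge(int *x)
{
    volatile int *vx = x;

    vx[1] = 0;  /* same order as x[0] = x[1] = 0: x[1] is assigned first */
    vx[0] = 0;
}
```

Whether the generated code really contains two stores can be checked by inspecting the assembly output (for example with `gcc -O2 -S`).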
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 7:11 ` Andrew Pinski @ 2018-08-03 13:31 ` Mikulas Patocka -1 siblings, 0 replies; 238+ messages in thread From: Mikulas Patocka @ 2018-08-03 13:31 UTC (permalink / raw) To: Andrew Pinski Cc: Catalin Marinas, Will Deacon, linux, thomas.petazzoni, linux-arm-kernel, LKML, GNU C Library On Fri, 3 Aug 2018, Andrew Pinski wrote: > On Thu, Aug 2, 2018 at 12:31 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > Hi > > > > I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a > > strange problem. > > > > When I use the links browser in graphics mode on the framebuffer, I get > > occasional pixel corruption. Links does memcpy, memset and 4-byte writes > > on the framebuffer - nothing else. > > > > I found out that the pixel corruption is caused by overlapping unaligned > > stp instructions inside memcpy. In order to avoid branching, the arm64 > > memcpy implementation may write the same destination twice with different > > alignment. If I put "dmb sy" between the overlapping stp instructions, the > > pixel corruption goes away. > > > > This seems like a hardware bug. Is it a known errata? Do you have any > > workarounds for it? > > Yes fix Links not to use memcpy on the framebuffer. > It is undefined behavior to use device memory with memcpy. > > Thanks, > Andrew Pinski Links can be fixed easily - but there is an extreme amount of code that accesses videoram via C pointers in the Xserver and in the GPU drivers. How do you intend to fix that? What should we use instead of direct access or memcpy? Libc doesn't provide any macros or functions for framebuffer access. Using hardcoded assembler doesn't make the programs portable. Mikulas ^ permalink raw reply [flat|nested] 238+ messages in thread
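[Editorial sketch, not part of the thread.] The question above asks what a program could use instead of memcpy on a framebuffer. A minimal portable C sketch, assuming the goal is only to avoid overlapping and unaligned stores, writes every destination byte exactly once: byte stores up to a 4-byte boundary, then aligned 32-bit stores, then a byte tail. The name `fb_copy_nonoverlap` is hypothetical; it is not a libc or kernel interface, and it makes no claim about performance or about whether it avoids the hardware problem reported here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n bytes from normal memory to dst using
 * volatile stores that never overlap and that are naturally aligned.
 */
void fb_copy_nonoverlap(volatile void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = dst;
    const uint8_t *s = src;

    /* Byte stores until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 3) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Aligned 32-bit stores; the source may be unaligned, so it is
     * read with memcpy into a temporary (the source is normal memory). */
    while (n >= 4) {
        uint32_t w;

        memcpy(&w, s, sizeof(w));
        *(volatile uint32_t *)d = w;
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Remaining tail bytes, again one store per byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
}
```

The volatile qualifier is there to discourage the compiler from turning the loops back into a call to memcpy or into wider merged stores; that is an assumption about compiler behaviour rather than something the thread establishes.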
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 13:31 ` Mikulas Patocka @ 2018-08-03 14:17 ` Richard Earnshaw (lists) -1 siblings, 0 replies; 238+ messages in thread From: Richard Earnshaw (lists) @ 2018-08-03 14:17 UTC (permalink / raw) To: Mikulas Patocka, Andrew Pinski Cc: Catalin Marinas, Will Deacon, linux, thomas.petazzoni, linux-arm-kernel, LKML, GNU C Library On 03/08/18 14:31, Mikulas Patocka wrote: > > > On Fri, 3 Aug 2018, Andrew Pinski wrote: > >> On Thu, Aug 2, 2018 at 12:31 PM Mikulas Patocka <mpatocka@redhat.com> wrote: >>> >>> Hi >>> >>> I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a >>> strange problem. >>> >>> When I use the links browser in graphics mode on the framebuffer, I get >>> occasional pixel corruption. Links does memcpy, memset and 4-byte writes >>> on the framebuffer - nothing else. >>> >>> I found out that the pixel corruption is caused by overlapping unaligned >>> stp instructions inside memcpy. In order to avoid branching, the arm64 >>> memcpy implementation may write the same destination twice with different >>> alignment. If I put "dmb sy" between the overlapping stp instructions, the >>> pixel corruption goes away. >>> >>> This seems like a hardware bug. Is it a known errata? Do you have any >>> workarounds for it? >> >> Yes fix Links not to use memcpy on the framebuffer. >> It is undefined behavior to use device memory with memcpy. >> >> Thanks, >> Andrew Pinski > > Links can be fixed easily - but there is an extreme amount of code that > accesses videoram via C pointers in the Xserver and in the GPU drivers. > How do you intend to fix that? > > What should we use instead of direct access or memcpy? Libc doesn't > provide any macros or functions for framebuffer access. Using hardcoded > assembler doesn't make the programs portable. > > Mikulas > Dialing back the optimization levels when building the Xserver so the compiler plays by its rules is one thing.
Dialing back the optimizations in the C library to handle a non-conforming program is quite another. That affects every program on the system, even if it turns out to be a server with no graphics system. R. ^ permalink raw reply [flat|nested] 238+ messages in thread
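[Editorial sketch, not part of the thread.] The library optimization being debated is the one described in the original report: for a medium-sized copy, memcpy can avoid a branch on the exact length by storing the first 16 bytes and the last 16 bytes of the region, so that for lengths below 32 the middle bytes are written twice. The C sketch below only illustrates that idea for 16 <= n <= 32; the name `copy16to32` is hypothetical, and the real arm64 routine is written in assembly with ldp/stp pairs rather than like this.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration of the overlapping-store technique.
 * Precondition (assumed, not checked): 16 <= n <= 32.
 */
void copy16to32(void *dst, const void *src, size_t n)
{
    unsigned char head[16], tail[16];
    const unsigned char *s = src;
    unsigned char *d = dst;

    /* Load both 16-byte blocks before storing anything. */
    memcpy(head, s, 16);
    memcpy(tail, s + n - 16, 16);

    /* Two fixed-size stores; they overlap whenever n < 32, so bytes
     * n-16 .. 15 of the destination are written twice with the same
     * final value. */
    memcpy(d, head, 16);
    memcpy(d + n - 16, tail, 16);
}
```

On normal cacheable memory the double write is harmless because both stores carry identical data for the overlapping bytes; the thread is about what happens when the destination is a PCIe framebuffer mapping instead.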
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-03 7:11 ` Andrew Pinski @ 2018-08-05 21:36 ` Pavel Machek -1 siblings, 0 replies; 238+ messages in thread From: Pavel Machek @ 2018-08-05 21:36 UTC (permalink / raw) To: Andrew Pinski Cc: mpatocka, Catalin Marinas, Will Deacon, linux, thomas.petazzoni, linux-arm-kernel, LKML, GNU C Library Hi! > > I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a > > strange problem. > > > > When I use the links browser in graphics mode on the framebuffer, I get > > occasional pixel corruption. Links does memcpy, memset and 4-byte writes > > on the framebuffer - nothing else. > > > > I found out that the pixel corruption is caused by overlapping unaligned > > stp instructions inside memcpy. In order to avoid branching, the arm64 > > memcpy implementation may write the same destination twice with different > > alignment. If I put "dmb sy" between the overlapping stp instructions, the > > pixel corruption goes away. > > > > This seems like a hardware bug. Is it a known errata? Do you have any > > workarounds for it? > > Yes fix Links not to use memcpy on the framebuffer. > It is undefined behavior to use device memory with memcpy. No, I don't think so. Why do you think so? I'm pretty sure that gcc is allowed to do memcpy-like tricks even when memcpy is not mentioned explicitly. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-05 21:36 ` Pavel Machek @ 2018-08-06 8:04 ` Ramana Radhakrishnan -1 siblings, 0 replies; 238+ messages in thread From: Ramana Radhakrishnan @ 2018-08-06 8:04 UTC (permalink / raw) To: Pavel Machek Cc: Andrew Pinski, Mikulas Patocka, Catalin Marinas, Will Deacon, Russell King, Thomas Petazzoni, linux-arm-kernel, LKML, GNU C Library On Sun, Aug 5, 2018 at 10:36 PM, Pavel Machek <pavel@ucw.cz> wrote: > Hi! > >> > I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a >> > strange problem. >> > >> > When I use the links browser in graphics mode on the framebuffer, I get >> > occasional pixel corruption. Links does memcpy, memset and 4-byte writes >> > on the framebuffer - nothing else. >> > >> > I found out that the pixel corruption is caused by overlapping unaligned >> > stp instructions inside memcpy. In order to avoid branching, the arm64 >> > memcpy implementation may write the same destination twice with different >> > alignment. If I put "dmb sy" between the overlapping stp instructions, the >> > pixel corruption goes away. >> > >> > This seems like a hardware bug. Is it a known errata? Do you have any >> > workarounds for it? >> >> Yes fix Links not to use memcpy on the framebuffer. >> It is undefined behavior to use device memory with memcpy. > > No, I don't think so. Why do you think so? It is undefined behaviour in the architecture. Ramana ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 8:04 ` Ramana Radhakrishnan @ 2018-08-06 8:44 ` Pavel Machek -1 siblings, 0 replies; 238+ messages in thread From: Pavel Machek @ 2018-08-06 8:44 UTC (permalink / raw) To: Ramana Radhakrishnan Cc: Andrew Pinski, Mikulas Patocka, Catalin Marinas, Will Deacon, Russell King, Thomas Petazzoni, linux-arm-kernel, LKML, GNU C Library On Mon 2018-08-06 09:04:33, Ramana Radhakrishnan wrote: > On Sun, Aug 5, 2018 at 10:36 PM, Pavel Machek <pavel@ucw.cz> wrote: > > Hi! > > > >> > I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a > >> > strange problem. > >> > > >> > When I use the links browser in graphics mode on the framebuffer, I get > >> > occasional pixel corruption. Links does memcpy, memset and 4-byte writes > >> > on the framebuffer - nothing else. > >> > > >> > I found out that the pixel corruption is caused by overlapping unaligned > >> > stp instructions inside memcpy. In order to avoid branching, the arm64 > >> > memcpy implementation may write the same destination twice with different > >> > alignment. If I put "dmb sy" between the overlapping stp instructions, the > >> > pixel corruption goes away. > >> > > >> > This seems like a hardware bug. Is it a known errata? Do you have any > >> > workarounds for it? > >> > >> Yes fix Links not to use memcpy on the framebuffer. > >> It is undefined behavior to use device memory with memcpy. > > > > No, I don't think so. Why do you think so? > > It is undefined behaviour in the architecture. Why do you think so? Pointer to documentation would be helpful. Normal access is used for mmapped areas, and I don't think we want to change that.
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 238+ messages in thread
* Re: framebuffer corruption due to overlapping stp instructions on arm64 2018-08-06 8:44 ` Pavel Machek @ 2018-08-06 9:11 ` Ard Biesheuvel -1 siblings, 0 replies; 238+ messages in thread From: Ard Biesheuvel @ 2018-08-06 9:11 UTC (permalink / raw) To: Pavel Machek Cc: Ramana Radhakrishnan, Thomas Petazzoni, GNU C Library, Andrew Pinski, Catalin Marinas, Will Deacon, Russell King, LKML, Mikulas Patocka, linux-arm-kernel On 6 August 2018 at 10:44, Pavel Machek <pavel@ucw.cz> wrote: > On Mon 2018-08-06 09:04:33, Ramana Radhakrishnan wrote: >> On Sun, Aug 5, 2018 at 10:36 PM, Pavel Machek <pavel@ucw.cz> wrote: >> > Hi! >> > >> >> > I tried to use a PCIe graphics card on the MacchiatoBIN board and I hit a >> >> > strange problem. >> >> > >> >> > When I use the links browser in graphics mode on the framebuffer, I get >> >> > occasional pixel corruption. Links does memcpy, memset and 4-byte writes >> >> > on the framebuffer - nothing else. >> >> > >> >> > I found out that the pixel corruption is caused by overlapping unaligned >> >> > stp instructions inside memcpy. In order to avoid branching, the arm64 >> >> > memcpy implementation may write the same destination twice with different >> >> > alignment. If I put "dmb sy" between the overlapping stp instructions, the >> >> > pixel corruption goes away. >> >> > >> >> > This seems like a hardware bug. Is it a known errata? Do you have any >> >> > workarounds for it? >> >> >> >> Yes fix Links not to use memcpy on the framebuffer. >> >> It is undefined behavior to use device memory with memcpy. >> > >> > No, I don't think so. Why do you think so? >> >> It is undefined behaviour in the architecture. > > Why do you think so? Pointer to documentation would be helpful. > > Normal access is used for mmapped areas, and I don't think we want to > change that. 
Pavel, In this context, 'device mapping' specifically means one of the Device-{G,nG}{R,nR}{E,nE} mapping attributes as defined by the ARM ARM, where G, R and E stand for Gathering, Reordering and Early acknowledgement, respectively. There is no disagreement whether memcpy() is suitable for such regions - it is not. These mappings are intended for memory mapped device registers, not for memory. The issue under discussion here is whether PCIe can provide outbound windows with true memory semantics, which is the assumption that is present all throughout the Linux graphics stack. ^ permalink raw reply [flat|nested] 238+ messages in thread