* imx6q: PL310 caching issues?
@ 2020-11-25 21:50 Kegl Rohit
  2020-12-02  2:49 ` Fabio Estevam
  0 siblings, 1 reply; 5+ messages in thread
From: Kegl Rohit @ 2020-11-25 21:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hello!

We are running an imx6q platform with kernel version 3.10.108.

The error rate increased at the memory stress testing step.
During this step up to 10 "memtester 35M" processes are spawned and
run without core bindings on all 4 cores.
Attached are multiple memtester logs from the same device.
All kinds of different subtests of memtester fail from time to time.

Extending memtester to resolve virtual addresses to physical ones, as
described here: https://shanetully.com/2014/12/translating-virtual-addresses-to-physcial-addresses-in-user-space/
showed that the failing physical addresses jump around, so I do not
think it is a RAM problem.
Binding all processes to a single core, or running a single instance
with all available memory, also showed no problems.
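
The virtual-to-physical lookup mentioned above can be sketched roughly
as follows. This is a minimal sketch, not the code that was added to
memtester: the helper names are hypothetical, and the entry layout
(PFN in bits 0-54, "present" flag in bit 63) is taken from the Linux
pagemap documentation.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit /proc/<pid>/pagemap entry, per the Linux pagemap
 * documentation: bits 0-54 hold the page frame number (PFN),
 * bit 63 is the "page present" flag. */
#define PAGEMAP_PFN_MASK ((UINT64_C(1) << 55) - 1)
#define PAGEMAP_PRESENT  (UINT64_C(1) << 63)

/* Byte offset inside the pagemap file of the entry describing vaddr
 * (hypothetical helper name). */
static inline uint64_t pagemap_offset(uintptr_t vaddr, uint64_t page_size)
{
    assert(page_size != 0);
    return (vaddr / page_size) * sizeof(uint64_t);
}

/* Combine a pagemap entry with the in-page offset of vaddr.
 * Returns 0 on success, -1 if the page is not present in RAM. */
static inline int pagemap_entry_to_phys(uint64_t entry, uintptr_t vaddr,
                                        uint64_t page_size, uint64_t *phys)
{
    assert(page_size != 0);
    if (!(entry & PAGEMAP_PRESENT))
        return -1;
    *phys = (entry & PAGEMAP_PFN_MASK) * page_size + (vaddr % page_size);
    return 0;
}
```

A caller would read the 8-byte entry from /proc/self/pagemap at
pagemap_offset() and pass it to pagemap_entry_to_phys(). Newer kernels
may hide the PFN from unprivileged processes, so the lookup may need to
run as root there.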

Attached is also the memtester routine for the "Block Sequential" subtest.
memtester splits the given memory into two buffers and compares them
after each memory modification.
It looks like the compare step reads a stale value from one buffer.
  Block Sequential    : testing 105
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d1c.
The correct value at step 105 is 105 == 0x69 => 0x69696969
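
To make the arithmetic above explicit, here is a small sketch of how
the expected word for a given step can be computed, and how a failing
word can be mapped back to the step it belongs to. The helper names are
hypothetical; the replication of the step byte into all four bytes is
assumed from the values in the logs (105 -> 0x69696969).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the low byte of step j replicated into all four
 * bytes of a 32-bit word, matching the values seen in the logs. */
static inline uint32_t blockseq_pattern(unsigned int j)
{
    assert(j < 256);
    uint32_t b = (uint32_t)j;
    return b | (b << 8) | (b << 16) | (b << 24);
}

/* Returns the step whose pattern equals word, or -1 if word is not a
 * replicated-byte pattern at all. */
static inline int blockseq_step_of(uint32_t word)
{
    unsigned int j = word & 0xff;
    return blockseq_pattern(j) == word ? (int)j : -1;
}
```

Applied to the log line above, blockseq_step_of(0x68686868) gives 104,
the step immediately before the one being tested, which is what a stale
read of the previous pass would look like.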

As you can see in the attached failure logs, every failure consists of
a contiguous mismatch of 8 * 32 bit = 32 bytes.
Maybe it has something to do with the PL310 cache, whose cache line
is 32 bytes?
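
The "8 words = 32 bytes" observation can be checked mechanically on the
logged offsets. The following is only a sketch with a hypothetical
helper name; it verifies length and contiguity of a failure run and
says nothing about alignment, because the logged offsets are relative
to the start of the test buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_SIZE 4u /* size of one compared word on the 32-bit build */

/* Size in bytes of a run of failing offsets, or 0 if the offsets are
 * not strictly consecutive words. */
static inline uint32_t contiguous_run_bytes(const uint32_t *offsets, size_t n)
{
    assert(offsets != NULL);
    for (size_t i = 1; i < n; i++) {
        if (offsets[i] != offsets[i - 1] + WORD_SIZE)
            return 0;
    }
    return (uint32_t)(n * WORD_SIZE);
}
```

Feeding it the eight offsets 0x01045d1c through 0x01045d38 from the
first log yields 32.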
A short test with some newer 4.* kernel did not show this issue. But I
have to run it for longer.
Binding all processes to the same core also solves the issue.

The PL310 driver and the many PL310 errata workarounds have been
heavily reworked since the 3.10.108 kernel.
Has anybody seen the same issue in the past?
Can anybody remember which patch fixed it and point me in the
right direction?

memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 35MB (36700160 bytes)
got  35MB (36700160 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : testing 105
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d1c.
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d20.
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d24.
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d28.
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d2c.
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d30.
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d34.
FAILURE: 0x69696969 != 0x68686868 at offset 0x01045d38.
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok
Done.

memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 35MB (36700160 bytes)
got  35MB (36700160 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : testing 123
FAILURE: 0x7a7a7a7a != 0x7b7b7b7b at offset 0x00a60f00.
FAILURE: 0x7a7a7a7a != 0x7b7b7b7b at offset 0x00a60f04.
FAILURE: 0x7a7a7a7a != 0x7b7b7b7b at offset 0x00a60f08.
FAILURE: 0x7a7a7a7a != 0x7b7b7b7b at offset 0x00a60f0c.
FAILURE: 0x7a7a7a7a != 0x7b7b7b7b at offset 0x00a60f10.
FAILURE: 0x7a7a7a7a != 0x7b7b7b7b at offset 0x00a60f14.
FAILURE: 0x7a7a7a7a != 0x7b7b7b7b at offset 0x00a60f18.
FAILURE: 0x7a7a7a7a != 0x7b7b7b7b at offset 0x00a60f1c.
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok
Done.

memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 35MB (36700160 bytes)
got  35MB (36700160 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : ok
  Block Sequential    : testing 247
FAILURE: 0xf7f7f7f7 != 0xf6f6f6f6 at offset 0x0079bb1c.
FAILURE: 0xf7f7f7f7 != 0xf6f6f6f6 at offset 0x0079bb20.
FAILURE: 0xf7f7f7f7 != 0xf6f6f6f6 at offset 0x0079bb24.
FAILURE: 0xf7f7f7f7 != 0xf6f6f6f6 at offset 0x0079bb28.
FAILURE: 0xf7f7f7f7 != 0xf6f6f6f6 at offset 0x0079bb2c.
FAILURE: 0xf7f7f7f7 != 0xf6f6f6f6 at offset 0x0079bb30.
FAILURE: 0xf7f7f7f7 != 0xf6f6f6f6 at offset 0x0079bb34.
FAILURE: 0xf7f7f7f7 != 0xf6f6f6f6 at offset 0x0079bb38.
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok
Done.

memtester version 4.3.0 (32-bit)
Copyright (C) 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 35MB (36700160 bytes)
got  35MB (36700160 bytes), trying mlock ...locked.
Loop 1/1:
  Stuck Address       : ok
  Random Value        : ok
  Compare XOR         : ok
  Compare SUB         : ok
  Compare MUL         : ok
  Compare DIV         : ok
  Compare OR          : ok
  Compare AND         : ok
  Sequential Increment: ok
  Solid Bits          : testing  37
FAILURE: 0xffffffff != 0x00000000 at offset 0x0097c27c.
FAILURE: 0x00000000 != 0xffffffff at offset 0x0097c280.
FAILURE: 0xffffffff != 0x00000000 at offset 0x0097c284.
FAILURE: 0x00000000 != 0xffffffff at offset 0x0097c288.
FAILURE: 0xffffffff != 0x00000000 at offset 0x0097c28c.
FAILURE: 0x00000000 != 0xffffffff at offset 0x0097c290.
FAILURE: 0xffffffff != 0x00000000 at offset 0x0097c294.
FAILURE: 0x00000000 != 0xffffffff at offset 0x0097c298.
  Block Sequential    : ok
  Checkerboard        : ok
  Bit Spread          : ok
  Bit Flip            : ok
  Walking Ones        : ok
  Walking Zeroes      : ok
  8-bit Writes        : ok
  16-bit Writes       : ok
Done.


int compare_regions(ulv *bufa, ulv *bufb, size_t count) {
    int r = 0;
    size_t i;
    ulv *p1 = bufa;
    ulv *p2 = bufb;
    off_t physaddr;

    for (i = 0; i < count; i++, p1++, p2++) {
        if (*p1 != *p2) {
            if (use_phys) {
                physaddr = physaddrbase + (i * sizeof(ul));
                fprintf(stderr,
                        "FAILURE: 0x%08lx != 0x%08lx at physical address "
                        "0x%08lx.\n",
                        (ul) *p1, (ul) *p2, physaddr);
            } else {
                fprintf(stderr,
                        "FAILURE: 0x%08lx != 0x%08lx at offset 0x%08lx.\n",
                        (ul) *p1, (ul) *p2, (ul) (i * sizeof(ul)));
            }
            /* printf("Skipping to next test..."); */
            r = -1;
        }
    }
    return r;
}

int test_blockseq_comparison(ulv *bufa, ulv *bufb, size_t count) {
    ulv *p1 = bufa;
    ulv *p2 = bufb;
    unsigned int j;
    size_t i;

    printf("           ");
    fflush(stdout);
    for (j = 0; j < 256; j++) {
        printf("\b\b\b\b\b\b\b\b\b\b\b");
        p1 = (ulv *) bufa;
        p2 = (ulv *) bufb;
        printf("setting %3u", j);
        fflush(stdout);
        for (i = 0; i < count; i++) {
            *p1++ = *p2++ = (ul) UL_BYTE(j);
        }
        printf("\b\b\b\b\b\b\b\b\b\b\b");
        printf("testing %3u", j);
        fflush(stdout);
        if (compare_regions(bufa, bufb, count)) {
            return -1;
        }
    }
    printf("\b\b\b\b\b\b\b\b\b\b\b           \b\b\b\b\b\b\b\b\b\b\b");
    fflush(stdout);
    return 0;
}

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: imx6q: PL310 caching issues?
  2020-11-25 21:50 imx6q: PL310 caching issues? Kegl Rohit
@ 2020-12-02  2:49 ` Fabio Estevam
  2020-12-03  7:31   ` Kegl Rohit
  0 siblings, 1 reply; 5+ messages in thread
From: Fabio Estevam @ 2020-12-02  2:49 UTC (permalink / raw)
  To: Kegl Rohit; +Cc: moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

Hi Kegl,

On Wed, Nov 25, 2020 at 6:52 PM Kegl Rohit <keglrohit@gmail.com> wrote:
>
> Hello!
>
> We are running an imx6q platform with kernel version 3.10.108.

This is an old and unsupported kernel version.

> A short test with some newer 4.* kernel did not show this issue. But I
> have to run it for longer.

You should consider upgrading your kernel.


* Re: imx6q: PL310 caching issues?
  2020-12-02  2:49 ` Fabio Estevam
@ 2020-12-03  7:31   ` Kegl Rohit
  2020-12-03  9:52     ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 5+ messages in thread
From: Kegl Rohit @ 2020-12-03  7:31 UTC (permalink / raw)
  To: Fabio Estevam; +Cc: moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

The kernel / u-boot version doesn't matter when the configuration is
wrong.
I thought some memory management expert could point me in the right
direction after looking at this distinct error pattern.
The problem must be known; everywhere I looked there were questions
but no useful answers.

The issue is ARM/PL310 erratum 752271, "Double linefill feature can
cause data corruption" [i.MX 6Dual/6Quad only].
u-boot 2015_07 has both CONFIG_MX6Q and CONFIG_MX6QDL. Our vendor BSP
set only CONFIG_MX6QDL on an imx6q platform.
=> The erratum workaround was bypassed in u-boot, so u-boot enabled
double linefill.
=> The kernel had the workaround for the same erratum correctly
enabled, but it does not actively clear this bit (it only skips
setting it).
=> u-boot sets the bit, and the kernel with the workaround enabled
does not clear it.
Other board configs with imx6q and u-boot < v2019.04-rc1 could also be
affected, but that also depends on the kernel version. Maybe newer
kernels actively clear the bit.
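
The interaction described above can be modelled in a few lines. This is
a sketch under stated assumptions, not the actual u-boot or kernel
code: all function names are hypothetical, and the position of the
double linefill enable (bit 30 of the PL310 prefetch control register)
is an assumption that should be checked against the PL310 TRM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: double linefill enable is bit 30 of the PL310 prefetch
 * control register. Verify against the PL310 TRM before relying on it. */
#define PREFETCH_CTRL_DBL_LINEFILL (UINT32_C(1) << 30)

static inline bool double_linefill_enabled(uint32_t reg)
{
    return (reg & PREFETCH_CTRL_DBL_LINEFILL) != 0;
}

/* Hypothetical model of the boot loader: it enables double linefill
 * unless the build is marked as an affected part (CONFIG_MX6Q). */
static inline uint32_t bootloader_prefetch_ctrl(uint32_t reg, bool config_mx6q)
{
    if (!config_mx6q)
        reg |= PREFETCH_CTRL_DBL_LINEFILL;
    return reg;
}

/* Hypothetical model of the old kernel behaviour described in the mail:
 * with the workaround enabled it only skips setting the bit. */
static inline uint32_t kernel_skip_only(uint32_t reg, bool erratum_752271)
{
    if (!erratum_752271)
        reg |= PREFETCH_CTRL_DBL_LINEFILL;
    return reg;
}

/* Defensive variant: actively clear the bit when the erratum applies. */
static inline uint32_t kernel_clear_bit(uint32_t reg, bool erratum_752271)
{
    if (erratum_752271)
        reg &= ~PREFETCH_CTRL_DBL_LINEFILL;
    else
        reg |= PREFETCH_CTRL_DBL_LINEFILL;
    return reg;
}

/* Replay the scenario: boot loader built without CONFIG_MX6Q, kernel
 * built with the workaround enabled. */
static inline void replay_scenario(void)
{
    uint32_t reg = bootloader_prefetch_ctrl(0, false);
    assert(double_linefill_enabled(kernel_skip_only(reg, true)));  /* bit survives */
    assert(!double_linefill_enabled(kernel_clear_bit(reg, true))); /* bit cleared */
}
```

The point of the model is that "skip setting" and "actively clear" only
differ when something earlier in the boot chain has already set the
bit, which is exactly the misconfigured boot loader case.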

I think CONFIG_MX6QDL is pretty misleading, or the vendor did not know
about CONFIG_MX6Q, which also exists in the latest u-boot.
But the 752271 handling is completely different as of v2019.04-rc1, so
maybe handling 752271 based only on #ifdef CONFIG_MX6Q was not
enough.
Here is the switch to PL310 revision based checks:
https://github.com/u-boot/u-boot/commit/d8bbf362f3dc87326597217b8bab083516cf534f
It would be great if each erratum change got its own commit.
Commit d8bbf362f3dc87326597217b8bab083516cf534f affects errata 752271
and 765569 at once.
We also observed better memory throughput with the changes for
765569, but running without double linefill still has a pretty big
performance impact.



* Re: imx6q: PL310 caching issues?
  2020-12-03  7:31   ` Kegl Rohit
@ 2020-12-03  9:52     ` Russell King - ARM Linux admin
  2020-12-05  9:26       ` Kegl Rohit
  0 siblings, 1 reply; 5+ messages in thread
From: Russell King - ARM Linux admin @ 2020-12-03  9:52 UTC (permalink / raw)
  To: Kegl Rohit
  Cc: Fabio Estevam, moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Thu, Dec 03, 2020 at 08:31:59AM +0100, Kegl Rohit wrote:
> Kernel / Uboot version doesn't matter when configured wrong.
> I thought some memory management expert could point me to the right
> path after looking at this distinct error pattern.

Highly unlikely that anyone will remember. You are asking people to
remember stuff from some seven years ago.

Even if we _could_, you are using such an old kernel that it will be
buggy in other ways, including with security holes. It is not worth
spending the time using.

Yes, it may be the favourite kernel because that's the one Freescale
used, but that's no excuse for exposing everyone to the effects of
such a buggy and unmaintained kernel on the wider Internet.

Consider all those buggy Internet connected cameras that are used as
spam bots... or to spy on people.

I'm sorry, but it is highly unlikely anyone (apart from yourself) has
an interest in helping with 3.10 kernels.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: imx6q: PL310 caching issues?
  2020-12-03  9:52     ` Russell King - ARM Linux admin
@ 2020-12-05  9:26       ` Kegl Rohit
  0 siblings, 0 replies; 5+ messages in thread
From: Kegl Rohit @ 2020-12-05  9:26 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: Fabio Estevam, moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

Yes, I understand. But sometimes you have to do things even if you
don't want to. In certain situations/applications, newer is not always
better or more stable.
And the issue in u-boot can occur in versions up to v2019.04-rc1.

