From: <Tudor.Ambarus@microchip.com>
To: <peda@axentia.se>, <regressions@leemhuis.info>, <Nicolas.Ferre@microchip.com>, <alexandre.belloni@bootlin.com>
Cc: <du@axentia.se>, <Patrice.Vilchez@microchip.com>, <Cristian.Birsan@microchip.com>, <Ludovic.Desroches@microchip.com>, <linux-kernel@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org>, <gregkh@linuxfoundation.org>, <saravanak@google.com>
Subject: Re: Regression: memory corruption on Atmel SAMA5D31
Date: Thu, 30 Jun 2022 10:20:06 +0000
Message-ID: <ab0d6f40-bbb8-81e2-b703-d33f4057aedc@microchip.com>
In-Reply-To: <17835914-cc0d-4a8d-4795-b16ff9243b76@microchip.com>

On 6/30/22 12:23, Tudor.Ambarus@microchip.com wrote:
> On 6/30/22 08:20, Peter Rosin wrote:
>> Hi!
>
> Hi, Peter!
>>
>> 2022-06-27 at 18:53, Tudor.Ambarus@microchip.com wrote:
>>> On 6/27/22 15:26, Tudor.Ambarus@microchip.com wrote:
>>>> On 6/21/22 13:46, Peter Rosin wrote:
>>>>> 2022-06-20 at 16:22, Tudor.Ambarus@microchip.com wrote:
>>>>>>
>>>>>>>
>>>>>>> git@github.com:ambarus/linux-0day.git, branch dma-regression-hdmac-v5.18-rc7-4th-attempt
>>>>>>>
>>>>>>
>>>>>> Hi, Peter,
>>>>>>
>>>>>> I've just forced pushed on this branch, I had a typo somewhere and with that fixed I could
>>>>>> no longer reproduce the bug. Tested for ~20 minutes. Would you please test last 3 patches
>>>>>> and tell me if you can still reproduce the bug?
>>>>>
>>>>> Hi!
>>>>>
>>>>> I rebased your patches onto my current branch which is v5.18.2 plus a few unrelated
>>>>> changes (at least they are unrelated after removing the previous workaround to disable
>>>>> nand-dma entirely).
>>>>>
>>>>> The unrelated patches are two backports so that drivers recognize new compatibles [1][2],
>>>>> which should be completely harmless, plus a couple of proposed fixes that happens to fix
>>>>> eeprom issues with the at91 I2C driver from Codrin Ciubotariu [3].
>>>>>
>>>>> On that kernel, I can still reproduce. It seems a bit harder to reproduce the problem now
>>>>> though. If the system is otherwise idle, the sha256sum test did not reproduce in a run of
>>>>> 150+ attempts, but if I let the "real" application run while I do the test, I get a failure rate
>>>>> of about 10%, see below. The real application burns some CPU (but not all of it) and
>>>>> communicates with HW using I2C, native UARTs and two of the four USB-serial ports
>>>>> (FTDI, with the latency set to 1ms as mentioned earlier), so I guess there is more DMA
>>>>> pressure or something? There is a 100mbps network connection, but it was left "idle"
>>>>> during this test.
>>>>>
>>>>
>>>> Thanks, Peter.
>>>> I got back to the office, I'm rechecking what could go wrong.
>>>>
>>>
>>> Hi, Peter,
>>>
>>> Would you please help me with another round of testing? I have difficulties
>>> in reproducing the bug and maybe you can speed up the process while I copy
>>> your testing setup. I made two more patches on top of the same branch [1].
>>> My assumption is that the last problem that you saw is that a transfer
>>> could be started multiple times. I think these are the last less invasive
>>> changes that I try, I'll have to rewrite the logic anyway.
>>>
>>> Thanks!
>>>
>>> [1] To github.com:ambarus/linux-0day.git
>>> cbb2ddca4618..79c7784dbcf2 dma-regression-hdmac-v5.18-rc7-4th-attempt -> dma-regression-hdmac-v5.18-rc7-4th-attempt
>>
>> I was out of office, but I managed to get a test running over night and can
>> report that It still fails. This is a longer run of about 500 with a failure
>> rate of 5% compared to the last time when the failure rate was 10%. I tend
>
> Thanks!
>
>> to think that the observed difference in failure rate may well be statistical
>> noise, but who knows? Would it be useful with a longer run without the last
>> two patches to see if they make a difference?

I forgot to answer, sorry. No, not needed as it still fails.

>
> I pushed another patch were I added a write mem barrier to make sure everything
> is in place before starting the transfer. Could you also take the last patch
> and re-test if it's not too complicated? I still can't reproduce it on my side,
> I'm checking what else I can add to stress test the DMA.

I could reproduce the bug even with the wmb(). I'm rechecking what I missed.

Cheers,
ta