Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / Atom feed
* kexec on rk3399
@ 2019-07-22 14:31 Vicente Bergas
  2019-07-22 14:46 ` Will Deacon
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Vicente Bergas @ 2019-07-22 14:31 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Heiko Stuebner,
	linux-rockchip

Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
From 5.2 onwards, there are memory corruption issues as reported here:
http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
kexec has been identified as the principal reason for the issues.

It turns out that kexec has never worked reliably on this platform,
i was just lucky until recently.

Please, can you provide some directions on how to debug the issue?

The latest test performed is:
 1.- Boot v5.3-rc1
 2.- Kexec into v5.2.2
 3.- The kernel reports nothing (earlycon not enabled) and does not boot.
The same kernel v5.2.2 works fine when booted without kexec.

Regards,
  Vicenç.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 14:31 kexec on rk3399 Vicente Bergas
@ 2019-07-22 14:46 ` Will Deacon
  2019-07-22 16:08   ` Vicente Bergas
  2019-07-22 14:47 ` Matthias Brugger
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Will Deacon @ 2019-07-22 14:46 UTC (permalink / raw)
  To: Vicente Bergas
  Cc: Heiko Stuebner, Marc Zyngier, Catalin Marinas, Will Deacon,
	linux-rockchip, linux-arm-kernel

On Mon, Jul 22, 2019 at 04:31:27PM +0200, Vicente Bergas wrote:
> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
> From 5.2 onwards, there are memory corruption issues as reported here:
> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
> kexec has been identified as the principal reason for the issues.

I thought that you resolved this issue by upgrading u-boot. Did that not
actually work?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 14:31 kexec on rk3399 Vicente Bergas
  2019-07-22 14:46 ` Will Deacon
@ 2019-07-22 14:47 ` Matthias Brugger
  2019-07-22 16:56   ` Vicente Bergas
  2019-07-22 14:54 ` Marc Zyngier
  2019-08-14 12:53 ` Vicente Bergas
  3 siblings, 1 reply; 23+ messages in thread
From: Matthias Brugger @ 2019-07-22 14:47 UTC (permalink / raw)
  To: Vicente Bergas, linux-arm-kernel
  Cc: Marc Zyngier, Catalin Marinas, Will Deacon, Heiko Stuebner,
	linux-rockchip



On 22/07/2019 16:31, Vicente Bergas wrote:
> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
> From 5.2 onwards, there are memory corruption issues as reported here:
> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
> kexec has been identified as the principal reason for the issues.
> 
> It turns out that kexec has never worked reliably on this platform,
> i was just lucky until recently.
> 
> Please, can you provide some directions on how to debug the issue?
> 
> The latest test performed is:
> 1.- Boot v5.3-rc1
> 2.- Kexec into v5.2.2
> 3.- The kernel reports nothing (earlycon not enabled) and does not boot.
> The same kernel v5.2.2 works fine when booted without kexec.

Is there a known kernel where kexec was working (v5.2 e.g.)?
If you can find one, you can run git bisect to find the commit that breaks things.

Apart from that you should try to enable earlycon on your kernel you want to
kexec into, so at least you can see if it crashes in the new kernel. If you
still don't see anything you can try to use prugatory printing (if you use
kexec_load and not kexec_file_load). I have a work-in-progress patches that I
can share with you if needed. Just let me know.

Regards,
Matthias


> 
> Regards,
>  Vicenç.
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 14:31 kexec on rk3399 Vicente Bergas
  2019-07-22 14:46 ` Will Deacon
  2019-07-22 14:47 ` Matthias Brugger
@ 2019-07-22 14:54 ` Marc Zyngier
  2019-07-22 17:05   ` Vicente Bergas
  2019-08-14 12:53 ` Vicente Bergas
  3 siblings, 1 reply; 23+ messages in thread
From: Marc Zyngier @ 2019-07-22 14:54 UTC (permalink / raw)
  To: Vicente Bergas, linux-arm-kernel
  Cc: Catalin Marinas, Will Deacon, Heiko Stuebner, linux-rockchip

On 22/07/2019 15:31, Vicente Bergas wrote:
> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
> From 5.2 onwards, there are memory corruption issues as reported here:
> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
> kexec has been identified as the principal reason for the issues.
> 
> It turns out that kexec has never worked reliably on this platform,
> i was just lucky until recently.
> 
> Please, can you provide some directions on how to debug the issue?
> 
> The latest test performed is:
>  1.- Boot v5.3-rc1

Not the most trusted version, but hey, why not...

>  2.- Kexec into v5.2.2
>  3.- The kernel reports nothing (earlycon not enabled) and does not boot.
> The same kernel v5.2.2 works fine when booted without kexec.

Can you please enable earlycon and let us know how far it goes? Your
previous reports hinting at runtime memory corruption, but this seems to
be much more radical...

FWIW, I was able to kexec using 5.2-rcx (for values of x around 6) on a
NanoPC-T4, using a mainline u-boot and firmware.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 14:46 ` Will Deacon
@ 2019-07-22 16:08   ` Vicente Bergas
  0 siblings, 0 replies; 23+ messages in thread
From: Vicente Bergas @ 2019-07-22 16:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-rockchip, linux-arm-kernel

On Monday, July 22, 2019 4:46:36 PM CEST, Will Deacon wrote:
> On Mon, Jul 22, 2019 at 04:31:27PM +0200, Vicente Bergas wrote:
>> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
>> From 5.2 onwards, there are memory corruption issues as reported here:
>> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
>> kexec has been identified as the principal reason for the issues.
>
> I thought that you resolved this issue by upgrading u-boot. Did that not
> actually work?
>
> Will

I am now using u-boot, but that is not a solution.
My intention is to use it on the Sapphire board and on gru-kevin.
I would like the bootloader to support HDMI on the sapphire and EDP on 
kevin
for the interactive boot menu.
U-Boot on the Sapphire boots, but has no graphic display. Kevin is not 
supported at all.
I've achived some progress on the u-boot front, but not enough to have 
display.

With a kexec-based bootloader display is already working, out of the box.

From another point of view, I would like to reduce the TCB (trusted 
computing base).
Linux is huge, but is already in the TCB, so, executing it twice does not 
add to the TCB.
U-Boot, instead, is enlarging the TCB, in this case.

Regards,
  Vicenç.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 14:47 ` Matthias Brugger
@ 2019-07-22 16:56   ` Vicente Bergas
  0 siblings, 0 replies; 23+ messages in thread
From: Vicente Bergas @ 2019-07-22 16:56 UTC (permalink / raw)
  To: Matthias Brugger
  Cc: Heiko Stuebner, Marc Zyngier, Catalin Marinas, Will Deacon,
	linux-rockchip, linux-arm-kernel

On Monday, July 22, 2019 4:47:51 PM CEST, Matthias Brugger wrote:
>
> On 22/07/2019 16:31, Vicente Bergas wrote:
>> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
>> From 5.2 onwards, there are memory corruption issues as reported here:
>> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
>> kexec has been identified as the principal reason for the issues.
>> 
>> It turns out that kexec has never worked reliably on this platform, ...
>
> Is there a known kernel where kexec was working (v5.2 e.g.)?
> If you can find one, you can run git bisect to find the commit 
> that breaks things.

Yes, most kernels i have tried before 5.2 worked, more or less.
From time to time there were random freezes at boot time.
Re-booting 2 or 3 times fixed it. This was happenning about 1 out of 10 
times.
I've rarely seen kernel traces on dmesg, but these were not affecting the 
stability of the system.
I've also used 5.2 (a known affected version) during days with no issues.
So, the randomness of the issue reproducibility makes bisecting too 
dificult.

> Apart from that you should try to enable earlycon on your kernel you want to
> kexec into, so at least you can see if it crashes in the new kernel.

Added earlycon to the cmdline. Now it boots fine, although it is not
working, the d_lookup issues are still there.
Basically, it looks like anything that changes the kernel memory layout
(like a single extra alloc) affects the issue, making it completely random.

> If you
> still don't see anything you can try to use prugatory printing (if you use
> kexec_load and not kexec_file_load). I have a work-in-progress 
> patches that I
> can share with you if needed. Just let me know.

What is 'prugatory printing' ? I am using kexec_load from kexec-tools. The
option to use kexec_file_load is not used.

> Regards,
> Matthias
>
>
>> Regards,
>>  Vicenç.
>> 
>> 
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel@lists.infradead.org ...



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 14:54 ` Marc Zyngier
@ 2019-07-22 17:05   ` Vicente Bergas
  2019-07-22 17:35     ` Robin Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Vicente Bergas @ 2019-07-22 17:05 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Matthias Brugger, Heiko Stuebner, Catalin Marinas, Will Deacon,
	linux-rockchip, linux-arm-kernel

On Monday, July 22, 2019 4:54:41 PM CEST, Marc Zyngier wrote:
> On 22/07/2019 15:31, Vicente Bergas wrote:
>> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
>> From 5.2 onwards, there are memory corruption issues as reported here:
>> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
>> kexec has been identified as the principal reason for the issues.
>> 
>> It turns out that kexec has never worked reliably on this platform, ...
>
> Not the most trusted version, but hey, why not...
>
>>  2.- Kexec into v5.2.2
>>  3.- The kernel reports nothing (earlycon not enabled) and does not boot.
>> The same kernel v5.2.2 works fine when booted without kexec.
>
> Can you please enable earlycon and let us know how far it goes? Your
> previous reports hinting at runtime memory corruption, but this seems to
> be much more radical...

Details on previous email, but basically, earlycon affects reproducibility.

If it is a runtime memory corruption, what are we hunting for? an issue in
the kernel before kexec or in the kernel after kexec?

> FWIW, I was able to kexec using 5.2-rcx (for values of x around 6) on a
> NanoPC-T4, using a mainline u-boot and firmware.
>
> Thanks,
>
> 	M.

Regards,
  Vicenç.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 17:05   ` Vicente Bergas
@ 2019-07-22 17:35     ` Robin Murphy
  2019-07-22 18:53       ` Vicente Bergas
  0 siblings, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2019-07-22 17:35 UTC (permalink / raw)
  To: Vicente Bergas, Marc Zyngier
  Cc: Matthias Brugger, Heiko Stuebner, Catalin Marinas, Will Deacon,
	linux-rockchip, linux-arm-kernel

On 22/07/2019 18:05, Vicente Bergas wrote:
> On Monday, July 22, 2019 4:54:41 PM CEST, Marc Zyngier wrote:
>> On 22/07/2019 15:31, Vicente Bergas wrote:
>>> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
>>> From 5.2 onwards, there are memory corruption issues as reported here:
>>> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
>>> kexec has been identified as the principal reason for the issues.
>>>
>>> It turns out that kexec has never worked reliably on this platform, ...
>>
>> Not the most trusted version, but hey, why not...
>>
>>>  2.- Kexec into v5.2.2
>>>  3.- The kernel reports nothing (earlycon not enabled) and does not 
>>> boot.
>>> The same kernel v5.2.2 works fine when booted without kexec.
>>
>> Can you please enable earlycon and let us know how far it goes? Your
>> previous reports hinting at runtime memory corruption, but this seems to
>> be much more radical...
> 
> Details on previous email, but basically, earlycon affects reproducibility.
> 
> If it is a runtime memory corruption, what are we hunting for? an issue in
> the kernel before kexec or in the kernel after kexec?

The obvious culprit would be DMA devices left running by the first 
kernel scribbling over the second kernel's memory before it's had the 
chance to reset them. Since boot-time memory allocation patterns tend to 
be relatively repeatable for a given platform and kernel configuration, 
"random" may well look a lot less random than you might expect, and it 
wouldn't be unheard of for e.g. the second kernel to mostly allocate its 
dentry cache from the same area the first kernel was mostly putting a 
network Rx descriptor ring.

Robin.

> 
>> FWIW, I was able to kexec using 5.2-rcx (for values of x around 6) on a
>> NanoPC-T4, using a mainline u-boot and firmware.
>>
>> Thanks,
>>
>>     M.
> 
> Regards,
>   Vicenç.
> 
> 
> _______________________________________________
> Linux-rockchip mailing list
> Linux-rockchip@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-rockchip

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 17:35     ` Robin Murphy
@ 2019-07-22 18:53       ` Vicente Bergas
  2019-07-23 13:32         ` Robin Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Vicente Bergas @ 2019-07-22 18:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-rockchip, linux-arm-kernel

On Monday, July 22, 2019 7:35:01 PM CEST, Robin Murphy wrote:
> On 22/07/2019 18:05, Vicente Bergas wrote:
>> On Monday, July 22, 2019 4:54:41 PM CEST, Marc Zyngier wrote: ...
>
> The obvious culprit would be DMA devices left running by the 
> first kernel scribbling over the second kernel's memory before 
> it's had the chance to reset them. Since boot-time memory 
> allocation patterns tend to be relatively repeatable for a given 
> platform and kernel configuration, "random" may well look a lot 
> less random than you might expect, and it wouldn't be unheard of 
> for e.g. the second kernel to mostly allocate its dentry cache 
> from the same area the first kernel was mostly putting a network 
> Rx descriptor ring.
>
> Robin.

Here is another attempted test: on the first kernel disable:
# CONFIG_ZONE_DMA32 is not set
# CONFIG_DMADEVICES is not set
# CONFIG_PL330_DMA is not set
that is, all i could disable matching CONFIG_*DMA*=y, which is not much.
Still enabled are:
CONFIG_HAVE_DMA_CONTIGUOUS=y
CONFIG_DMA_SHARED_BUFFER=y
CONFIG_SCSI_DMA=y
CONFIG_IOMMU_DMA=y
CONFIG_HAS_DMA=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_DMA_ADDR_T_64BIT=y
CONFIG_DMA_DECLARE_COHERENT=y
CONFIG_ARCH_HAS_SETUP_DMA_OPS=y
CONFIG_ARCH_HAS_TEARDOWN_DMA_OPS=y
CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE=y
CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU=y
CONFIG_ARCH_HAS_DMA_PREP_COHERENT=y
CONFIG_ARCH_HAS_DMA_COHERENT_TO_PFN=y
CONFIG_ARCH_HAS_DMA_MMAP_PGPROT=y
CONFIG_DMA_REMAP=y
CONFIG_DMA_DIRECT_REMAP=y

Then the second kernel still fails with d_lookup errors.

Regards,
  Vicenç.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 18:53       ` Vicente Bergas
@ 2019-07-23 13:32         ` Robin Murphy
  0 siblings, 0 replies; 23+ messages in thread
From: Robin Murphy @ 2019-07-23 13:32 UTC (permalink / raw)
  To: Vicente Bergas
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	Will Deacon, linux-rockchip, linux-arm-kernel

On 22/07/2019 19:53, Vicente Bergas wrote:
> On Monday, July 22, 2019 7:35:01 PM CEST, Robin Murphy wrote:
>> On 22/07/2019 18:05, Vicente Bergas wrote:
>>> On Monday, July 22, 2019 4:54:41 PM CEST, Marc Zyngier wrote: ...
>>
>> The obvious culprit would be DMA devices left running by the first 
>> kernel scribbling over the second kernel's memory before it's had the 
>> chance to reset them. Since boot-time memory allocation patterns tend 
>> to be relatively repeatable for a given platform and kernel 
>> configuration, "random" may well look a lot less random than you might 
>> expect, and it wouldn't be unheard of for e.g. the second kernel to 
>> mostly allocate its dentry cache from the same area the first kernel 
>> was mostly putting a network Rx descriptor ring.
>>
>> Robin.
> 
> Here is another attempted test: on the first kernel disable:
> # CONFIG_ZONE_DMA32 is not set
> # CONFIG_DMADEVICES is not set
> # CONFIG_PL330_DMA is not set
> that is, all i could disable matching CONFIG_*DMA*=y, which is not much.
> Still enabled are:
> CONFIG_HAVE_DMA_CONTIGUOUS=y
> CONFIG_DMA_SHARED_BUFFER=y
> CONFIG_SCSI_DMA=y
> CONFIG_IOMMU_DMA=y
> CONFIG_HAS_DMA=y
> CONFIG_NEED_SG_DMA_LENGTH=y
> CONFIG_NEED_DMA_MAP_STATE=y
> CONFIG_ARCH_DMA_ADDR_T_64BIT=y
> CONFIG_DMA_DECLARE_COHERENT=y
> CONFIG_ARCH_HAS_SETUP_DMA_OPS=y
> CONFIG_ARCH_HAS_TEARDOWN_DMA_OPS=y
> CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE=y
> CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU=y
> CONFIG_ARCH_HAS_DMA_PREP_COHERENT=y
> CONFIG_ARCH_HAS_DMA_COHERENT_TO_PFN=y
> CONFIG_ARCH_HAS_DMA_MMAP_PGPROT=y
> CONFIG_DMA_REMAP=y
> CONFIG_DMA_DIRECT_REMAP=y
> 
> Then the second kernel still fails with d_lookup errors.

Yeah, none of those configs are particularly relevant - I'm thinking 
more about the drivers for ethernet, wifi, USB, PCI devices, and any 
other peripherals which may be set up to make DMA accesses based on 
external stimuli and don't get shut down cleanly by a kexec.

Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-07-22 14:31 kexec on rk3399 Vicente Bergas
                   ` (2 preceding siblings ...)
  2019-07-22 14:54 ` Marc Zyngier
@ 2019-08-14 12:53 ` Vicente Bergas
  2019-08-14 13:06   ` Felipe Balbi
  2019-08-14 13:12   ` Robin Murphy
  3 siblings, 2 replies; 23+ messages in thread
From: Vicente Bergas @ 2019-08-14 12:53 UTC (permalink / raw)
  To: Felipe Balbi, Robin Murphy
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-kernel, linux-rockchip,
	Greg Kroah-Hartman, linux-arm-kernel

On Monday, July 22, 2019 4:31:27 PM CEST, Vicente Bergas wrote:
> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
> From 5.2 onwards, there are memory corruption issues as reported here:
> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
> kexec has been identified as the principal reason for the issues.
>
> It turns out that kexec has never worked reliably on this platform,
> i was just lucky until recently.
>
> Please, can you provide some directions on how to debug the issue?

Thank you all for your suggestions on where the issue could be.

It seems that it was the USB driver.
Now using v5.2.8 booted with kexec from v5.2.8 with a workaround and
so far so good. It is being tested on the Sapphire board.

The workaround is:
--- a/drivers/usb/dwc3/dwc3-of-simple.c
+++ b/drivers/usb/dwc3/dwc3-of-simple.c
@@ -133,6 +133,13 @@
 	return 0;
 }
 
+static void dwc3_of_simple_shutdown(struct platform_device *pdev)
+{
+	struct dwc3_of_simple *simple = platform_get_drvdata(pdev);
+
+	reset_control_assert(simple->resets);
+}
+
 static int __maybe_unused dwc3_of_simple_runtime_suspend(struct device 
*dev)
 {
 	struct dwc3_of_simple	*simple = dev_get_drvdata(dev);
@@ -190,6 +197,7 @@
 static struct platform_driver dwc3_of_simple_driver = {
 	.probe		= dwc3_of_simple_probe,
 	.remove		= dwc3_of_simple_remove,
+	.shutdown	= dwc3_of_simple_shutdown,
 	.driver		= {
 		.name	= "dwc3-of-simple",
 		.of_match_table = of_dwc3_simple_match,

If this patch is OK after review i can resubmit it as a pull request.
Should a similar change be applied to drivers/usb/dwc3/core.c ?

Regards,
  Vicenç.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-08-14 12:53 ` Vicente Bergas
@ 2019-08-14 13:06   ` Felipe Balbi
  2019-08-14 13:15     ` Vicente Bergas
  2019-08-14 13:12   ` Robin Murphy
  1 sibling, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2019-08-14 13:06 UTC (permalink / raw)
  To: Vicente Bergas, Robin Murphy
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-kernel, linux-rockchip,
	Greg Kroah-Hartman, linux-arm-kernel

[-- Attachment #1.1: Type: text/plain, Size: 2085 bytes --]


Hi,

Vicente Bergas <vicencb@gmail.com> writes:
> On Monday, July 22, 2019 4:31:27 PM CEST, Vicente Bergas wrote:
>> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
>> From 5.2 onwards, there are memory corruption issues as reported here:
>> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
>> kexec has been identified as the principal reason for the issues.
>>
>> It turns out that kexec has never worked reliably on this platform,
>> i was just lucky until recently.
>>
>> Please, can you provide some directions on how to debug the issue?
>
> Thank you all for your suggestions on where the issue could be.
>
> It seems that it was the USB driver.
> Now using v5.2.8 booted with kexec from v5.2.8 with a workaround and
> so far so good. It is being tested on the Sapphire board.
>
> The workaround is:
> --- a/drivers/usb/dwc3/dwc3-of-simple.c
> +++ b/drivers/usb/dwc3/dwc3-of-simple.c
> @@ -133,6 +133,13 @@
>  	return 0;
>  }
>  
> +static void dwc3_of_simple_shutdown(struct platform_device *pdev)
> +{
> +	struct dwc3_of_simple *simple = platform_get_drvdata(pdev);
> +
> +	reset_control_assert(simple->resets);
> +}
> +
>  static int __maybe_unused dwc3_of_simple_runtime_suspend(struct device 
> *dev)
>  {
>  	struct dwc3_of_simple	*simple = dev_get_drvdata(dev);
> @@ -190,6 +197,7 @@
>  static struct platform_driver dwc3_of_simple_driver = {
>  	.probe		= dwc3_of_simple_probe,
>  	.remove		= dwc3_of_simple_remove,
> +	.shutdown	= dwc3_of_simple_shutdown,
>  	.driver		= {
>  		.name	= "dwc3-of-simple",
>  		.of_match_table = of_dwc3_simple_match,
>
> If this patch is OK after review i can resubmit it as a pull request.

not a pull request, just send a patch using git send-email

> Should a similar change be applied to drivers/usb/dwc3/core.c ?

Is it necessary? We haven't had any bug reports regarding that. Also, if
we have reset control support in the core driver, why do we need it in
of_simple? Seems like of_simple could just rely on what core does.

-- 
balbi

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-08-14 12:53 ` Vicente Bergas
  2019-08-14 13:06   ` Felipe Balbi
@ 2019-08-14 13:12   ` Robin Murphy
  2019-08-15  1:15     ` Vicente Bergas
  1 sibling, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2019-08-14 13:12 UTC (permalink / raw)
  To: Vicente Bergas, Felipe Balbi
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-kernel, linux-rockchip,
	Greg Kroah-Hartman, linux-arm-kernel

On 14/08/2019 13:53, Vicente Bergas wrote:
> On Monday, July 22, 2019 4:31:27 PM CEST, Vicente Bergas wrote:
>> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
>> From 5.2 onwards, there are memory corruption issues as reported here:
>> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
>> kexec has been identified as the principal reason for the issues.
>>
>> It turns out that kexec has never worked reliably on this platform,
>> i was just lucky until recently.
>>
>> Please, can you provide some directions on how to debug the issue?
> 
> Thank you all for your suggestions on where the issue could be.
> 
> It seems that it was the USB driver.
> Now using v5.2.8 booted with kexec from v5.2.8 with a workaround and
> so far so good. It is being tested on the Sapphire board.
> 
> The workaround is:
> --- a/drivers/usb/dwc3/dwc3-of-simple.c
> +++ b/drivers/usb/dwc3/dwc3-of-simple.c
> @@ -133,6 +133,13 @@
>      return 0;
> }
> 
> +static void dwc3_of_simple_shutdown(struct platform_device *pdev)
> +{
> +    struct dwc3_of_simple *simple = platform_get_drvdata(pdev);
> +
> +    reset_control_assert(simple->resets);
> +}
> +
> static int __maybe_unused dwc3_of_simple_runtime_suspend(struct device 
> *dev)
> {
>      struct dwc3_of_simple    *simple = dev_get_drvdata(dev);
> @@ -190,6 +197,7 @@
> static struct platform_driver dwc3_of_simple_driver = {
>      .probe        = dwc3_of_simple_probe,
>      .remove        = dwc3_of_simple_remove,
> +    .shutdown    = dwc3_of_simple_shutdown,
>      .driver        = {
>          .name    = "dwc3-of-simple",
>          .of_match_table = of_dwc3_simple_match,
> 
> If this patch is OK after review i can resubmit it as a pull request.
> Should a similar change be applied to drivers/usb/dwc3/core.c ?

This particular change looks like it's implicitly specific to RK3399, 
which wouldn't be ideal. Presumably if the core dwc3 driver implemented 
shutdown correctly (echoing parts of dwc3_remove(), I guess) then the 
glue layers shouldn't need anything special anyway.

Robin.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-08-14 13:06   ` Felipe Balbi
@ 2019-08-14 13:15     ` Vicente Bergas
  2019-08-15  6:00       ` Felipe Balbi
  0 siblings, 1 reply; 23+ messages in thread
From: Vicente Bergas @ 2019-08-14 13:15 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-kernel, linux-rockchip,
	Greg Kroah-Hartman, Robin Murphy, linux-arm-kernel

On Wednesday, August 14, 2019 3:06:04 PM CEST, Felipe Balbi wrote:
> Hi,
>
> Vicente Bergas <vicencb@gmail.com> writes:
>> On Monday, July 22, 2019 4:31:27 PM CEST, Vicente Bergas wrote:
>>> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
>>> From 5.2 onwards, there are memory corruption issues as reported here:
>>> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
>>> kexec has been identified as the principal reason for the issues.
>>> 
>>> It turns out that kexec has never worked reliably on this platform, ...
>> 
>> Thank you all for your suggestions on where the issue could be.
>> 
>> It seems that it was the USB driver.
>> Now using v5.2.8 booted with kexec from v5.2.8 with a workaround and
>> so far so good. It is being tested on the Sapphire board.
>> 
>> The workaround is:
>> --- a/drivers/usb/dwc3/dwc3-of-simple.c
>> +++ b/drivers/usb/dwc3/dwc3-of-simple.c
>> @@ -133,6 +133,13 @@
>>  	return 0;
>>  }
>>  
>> +static void dwc3_of_simple_shutdown(struct platform_device *pdev)
>> +{
>> +	struct dwc3_of_simple *simple = platform_get_drvdata(pdev);
>> +
>> +	reset_control_assert(simple->resets);
>> +}
>> +
>>  static int __maybe_unused dwc3_of_simple_runtime_suspend(struct device 
>> *dev)
>>  {
>>  	struct dwc3_of_simple	*simple = dev_get_drvdata(dev);
>> @@ -190,6 +197,7 @@
>>  static struct platform_driver dwc3_of_simple_driver = {
>>  	.probe		= dwc3_of_simple_probe,
>>  	.remove		= dwc3_of_simple_remove,
>> +	.shutdown	= dwc3_of_simple_shutdown,
>>  	.driver		= {
>>  		.name	= "dwc3-of-simple",
>>  		.of_match_table = of_dwc3_simple_match,
>> 
>> If this patch is OK after review i can resubmit it as a pull request.
>
> not a pull request, just send a patch using git send-email
>
>> Should a similar change be applied to drivers/usb/dwc3/core.c ?
>
> Is it necessary? We haven't had any bug reports regarding that. Also, if
> we have reset control support in the core driver, why do we need it in
> of_simple? Seems like of_simple could just rely on what core does.

the workaround has been tested patching only core.c with
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1561,6 +1561,13 @@
 	return 0;
 }
 
+static void dwc3_shutdown(struct platform_device *pdev)
+{
+	struct dwc3 *dwc = platform_get_drvdata(pdev);
+
+	reset_control_assert(dwc->reset);
+}
+
 #ifdef CONFIG_PM
 static int dwc3_core_init_for_resume(struct dwc3 *dwc)
 {
@@ -1866,6 +1873,7 @@
 static struct platform_driver dwc3_driver = {
 	.probe		= dwc3_probe,
 	.remove		= dwc3_remove,
+	.shutdown	= dwc3_shutdown,
 	.driver		= {
 		.name	= "dwc3",
 		.of_match_table	= of_match_ptr(of_dwc3_match),

and leaving dwc3-of-simple.c as is, the issue persisted.

Regards,
  Vicenç.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-08-14 13:12   ` Robin Murphy
@ 2019-08-15  1:15     ` Vicente Bergas
  2019-08-15  6:06       ` Felipe Balbi
  0 siblings, 1 reply; 23+ messages in thread
From: Vicente Bergas @ 2019-08-15  1:15 UTC (permalink / raw)
  To: Robin Murphy, Felipe Balbi
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-kernel, linux-rockchip,
	Greg Kroah-Hartman, linux-arm-kernel

On Wednesday, August 14, 2019 3:12:26 PM CEST, Robin Murphy wrote:
> On 14/08/2019 13:53, Vicente Bergas wrote:
>> On Monday, July 22, 2019 4:31:27 PM CEST, Vicente Bergas wrote: ...
>
> This particular change looks like it's implicitly specific to 
> RK3399, which wouldn't be ideal. Presumably if the core dwc3 
> driver implemented shutdown correctly (echoing parts of 
> dwc3_remove(), I guess) then the glue layers shouldn't need 
> anything special anyway.
>
> Robin.

I just checked simple->resets from dwc3-of-simple.c and it is an array
with multiple resets whereas dwc->reset from core.c is NULL.
So the reset seems specific to the glue layers.
Is there another way than resetting the thing that is
generic enough to go to core.c and allows kexec?

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-08-14 13:15     ` Vicente Bergas
@ 2019-08-15  6:00       ` Felipe Balbi
  0 siblings, 0 replies; 23+ messages in thread
From: Felipe Balbi @ 2019-08-15  6:00 UTC (permalink / raw)
  To: Vicente Bergas
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-kernel, linux-rockchip,
	Greg Kroah-Hartman, Robin Murphy, linux-arm-kernel


Hi,

Vicente Bergas <vicencb@gmail.com> writes:
>> Vicente Bergas <vicencb@gmail.com> writes:
>>> On Monday, July 22, 2019 4:31:27 PM CEST, Vicente Bergas wrote:
>>>> Hi, i have been running linux on rk3399 booted with kexec fine until 5.2
>>>> From 5.2 onwards, there are memory corruption issues as reported here:
>>>> http://lkml.iu.edu/hypermail/linux/kernel/1906.2/07211.html
>>>> kexec has been identified as the principal reason for the issues.
>>>> 
>>>> It turns out that kexec has never worked reliably on this platform, ...
>>> 
>>> Thank you all for your suggestions on where the issue could be.
>>> 
>>> It seems that it was the USB driver.
>>> Now using v5.2.8 booted with kexec from v5.2.8 with a workaround and
>>> so far so good. It is being tested on the Sapphire board.
>>> 
>>> The workaround is:
>>> --- a/drivers/usb/dwc3/dwc3-of-simple.c
>>> +++ b/drivers/usb/dwc3/dwc3-of-simple.c
>>> @@ -133,6 +133,13 @@
>>>  	return 0;
>>>  }
>>>  
>>> +static void dwc3_of_simple_shutdown(struct platform_device *pdev)
>>> +{
>>> +	struct dwc3_of_simple *simple = platform_get_drvdata(pdev);
>>> +
>>> +	reset_control_assert(simple->resets);
>>> +}
>>> +
>>>  static int __maybe_unused dwc3_of_simple_runtime_suspend(struct device 
>>> *dev)
>>>  {
>>>  	struct dwc3_of_simple	*simple = dev_get_drvdata(dev);
>>> @@ -190,6 +197,7 @@
>>>  static struct platform_driver dwc3_of_simple_driver = {
>>>  	.probe		= dwc3_of_simple_probe,
>>>  	.remove		= dwc3_of_simple_remove,
>>> +	.shutdown	= dwc3_of_simple_shutdown,
>>>  	.driver		= {
>>>  		.name	= "dwc3-of-simple",
>>>  		.of_match_table = of_dwc3_simple_match,
>>> 
>>> If this patch is OK after review i can resubmit it as a pull request.
>>
>> not a pull request, just send a patch using git send-email
>>
>>> Should a similar change be applied to drivers/usb/dwc3/core.c ?
>>
>> Is it necessary? We haven't had any bug reports regarding that. Also, if
>> we have reset control support in the core driver, why do we need it in
>> of_simple? Seems like of_simple could just rely on what core does.
>
> the workaround has been tested patching only core.c with
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -1561,6 +1561,13 @@
>  	return 0;
>  }
>  
> +static void dwc3_shutdown(struct platform_device *pdev)
> +{
> +	struct dwc3 *dwc = platform_get_drvdata(pdev);
> +
> +	reset_control_assert(dwc->reset);
> +}
> +
>  #ifdef CONFIG_PM
>  static int dwc3_core_init_for_resume(struct dwc3 *dwc)
>  {
> @@ -1866,6 +1873,7 @@
>  static struct platform_driver dwc3_driver = {
>  	.probe		= dwc3_probe,
>  	.remove		= dwc3_remove,
> +	.shutdown	= dwc3_shutdown,
>  	.driver		= {
>  		.name	= "dwc3",
>  		.of_match_table	= of_match_ptr(of_dwc3_match),
>
> and leaving dwc3-of-simple.c as is, the issue persisted.

That's because your reset controller is not passed to dwc3 core, only to
your glue layer.

-- 
balbi

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-08-15  1:15     ` Vicente Bergas
@ 2019-08-15  6:06       ` Felipe Balbi
  2019-08-15 11:09         ` Robin Murphy
  0 siblings, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2019-08-15  6:06 UTC (permalink / raw)
  To: Vicente Bergas, Robin Murphy
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-kernel, linux-rockchip,
	Greg Kroah-Hartman, linux-arm-kernel


Hi,

Vicente Bergas <vicencb@gmail.com> writes:

> On Wednesday, August 14, 2019 3:12:26 PM CEST, Robin Murphy wrote:
>> On 14/08/2019 13:53, Vicente Bergas wrote:
>>> On Monday, July 22, 2019 4:31:27 PM CEST, Vicente Bergas wrote: ...
>>
>> This particular change looks like it's implicitly specific to 
>> RK3399, which wouldn't be ideal. Presumably if the core dwc3 
>> driver implemented shutdown correctly (echoing parts of 
>> dwc3_remove(), I guess) then the glue layers shouldn't need 
>> anything special anyway.
>>
>> Robin.
>
> I just checked simple->resets from dwc3-of-simple.c and it is an array
> with multiple resets whereas dwc->reset from core.c is NULL.
> So the reset seems specific to the glue layers.
> Is there another way than resetting the thing that is
> generic enough to go to core.c and allows kexec?

This is a really odd 'failure'. We do full soft reset during driver
initialization on dwc3. We shouldn't need to assert reset on shutdown,
really.

I think the problem is here:

	if (simple->pulse_resets) {
		ret = reset_control_reset(simple->resets);
		if (ret)
			goto err_resetc_put;
	} else {
		ret = reset_control_deassert(simple->resets);
		if (ret)
			goto err_resetc_put;
	}

Note that if pulse_resets is set, we will run a reset. But if
pulse_resets is false and need_reset is true, we deassert the reset.

I think below patch is enough:

diff --git a/drivers/usb/dwc3/dwc3-of-simple.c b/drivers/usb/dwc3/dwc3-of-simple.c
index bdac3e7d7b18..9a2f3e09aa2e 100644
--- a/drivers/usb/dwc3/dwc3-of-simple.c
+++ b/drivers/usb/dwc3/dwc3-of-simple.c
@@ -72,7 +72,15 @@ static int dwc3_of_simple_probe(struct platform_device *pdev)
 		ret = reset_control_reset(simple->resets);
 		if (ret)
 			goto err_resetc_put;
-	} else {
+	}
+
+	if (simple->need_reset) {
+		ret = reset_control_assert(simple->resets);
+		if (ret)
+			goto err_resetc_put;
+
+		usleep_range(1000, 2000);
+
 		ret = reset_control_deassert(simple->resets);
 		if (ret)
 			goto err_resetc_put;
@@ -121,9 +129,6 @@ static int dwc3_of_simple_remove(struct platform_device *pdev)
 	clk_bulk_put_all(simple->num_clocks, simple->clks);
 	simple->num_clocks = 0;
 
-	if (!simple->pulse_resets)
-		reset_control_assert(simple->resets);
-
 	reset_control_put(simple->resets);
 
 	pm_runtime_disable(dev);

Can you test?

-- 
balbi

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: kexec on rk3399
  2019-08-15  6:06       ` Felipe Balbi
@ 2019-08-15 11:09         ` Robin Murphy
  2019-08-17 17:41           ` [PATCH] usb: dwc3: Add shutdown to platform_driver Vicente Bergas
  0 siblings, 1 reply; 23+ messages in thread
From: Robin Murphy @ 2019-08-15 11:09 UTC (permalink / raw)
  To: Felipe Balbi, Vicente Bergas
  Cc: Matthias Brugger, Heiko Stuebner, Marc Zyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-kernel, linux-rockchip,
	Greg Kroah-Hartman, linux-arm-kernel

On 15/08/2019 07:06, Felipe Balbi wrote:
> 
> Hi,
> 
> Vicente Bergas <vicencb@gmail.com> writes:
> 
>> On Wednesday, August 14, 2019 3:12:26 PM CEST, Robin Murphy wrote:
>>> On 14/08/2019 13:53, Vicente Bergas wrote:
>>>> On Monday, July 22, 2019 4:31:27 PM CEST, Vicente Bergas wrote: ...
>>>
>>> This particular change looks like it's implicitly specific to
>>> RK3399, which wouldn't be ideal. Presumably if the core dwc3
>>> driver implemented shutdown correctly (echoing parts of
>>> dwc3_remove(), I guess) then the glue layers shouldn't need
>>> anything special anyway.
>>>
>>> Robin.
>>
>> I just checked simple->resets from dwc3-of-simple.c and it is an array
>> with multiple resets whereas dwc->reset from core.c is NULL.
>> So the reset seems specific to the glue layers.
>> Is there another way than resetting the thing that is
>> generic enough to go to core.c and allows kexec?
> 
> This is a really odd 'failure'. We do full soft reset during driver
> initialization on dwc3. We shouldn't need to assert reset on shutdown,
> really.

Probing/initialisation has never been the problem. The issue for the 
kexec case is that when the first kernel shuts down, there is currently 
nothing to quiesce the controller (since only driver->shutdown gets 
called, not driver->remove), and thus (presumably) external USB activity 
causes it to keep writing back over the memory where the 
descriptors/command ring used to be while the second kernel boots. The 
second kernel will eventually probe and reset it appropriately, but by 
that time the damage is already done.

Yanking on a hardware reset line when the first kernel shuts down is 
certainly one way to stop any memory accesses if such a control is 
available, but presumably there's a general software way to gracefully 
disable the controller's DMA functions until a subsequent probe can 
fully reset it again - I think that would be the preferable solution.

Robin.

> I think the problem is here:
> 
> 	if (simple->pulse_resets) {
> 		ret = reset_control_reset(simple->resets);
> 		if (ret)
> 			goto err_resetc_put;
> 	} else {
> 		ret = reset_control_deassert(simple->resets);
> 		if (ret)
> 			goto err_resetc_put;
> 	}
> 
> Note that if pulse_resets is set, we will run a reset. But if
> pulse_resets is false and need_reset is true, we deassert the reset.
> 
> I think below patch is enough:
> 
> diff --git a/drivers/usb/dwc3/dwc3-of-simple.c b/drivers/usb/dwc3/dwc3-of-simple.c
> index bdac3e7d7b18..9a2f3e09aa2e 100644
> --- a/drivers/usb/dwc3/dwc3-of-simple.c
> +++ b/drivers/usb/dwc3/dwc3-of-simple.c
> @@ -72,7 +72,15 @@ static int dwc3_of_simple_probe(struct platform_device *pdev)
>   		ret = reset_control_reset(simple->resets);
>   		if (ret)
>   			goto err_resetc_put;
> -	} else {
> +	}
> +
> +	if (simple->need_reset) {
> +		ret = reset_control_assert(simple->resets);
> +		if (ret)
> +			goto err_resetc_put;
> +
> +		usleep_range(1000, 2000);
> +
>   		ret = reset_control_deassert(simple->resets);
>   		if (ret)
>   			goto err_resetc_put;
> @@ -121,9 +129,6 @@ static int dwc3_of_simple_remove(struct platform_device *pdev)
>   	clk_bulk_put_all(simple->num_clocks, simple->clks);
>   	simple->num_clocks = 0;
>   
> -	if (!simple->pulse_resets)
> -		reset_control_assert(simple->resets);
> -
>   	reset_control_put(simple->resets);
>   
>   	pm_runtime_disable(dev);
> 
> Can you test?
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH] usb: dwc3: Add shutdown to platform_driver
  2019-08-15 11:09         ` Robin Murphy
@ 2019-08-17 17:41           ` Vicente Bergas
  2019-08-27 11:45             ` Vicente Bergas
  0 siblings, 1 reply; 23+ messages in thread
From: Vicente Bergas @ 2019-08-17 17:41 UTC (permalink / raw)
  To: Felipe Balbi, Robin Murphy
  Cc: Matthias Brugger, Heiko Stuebner, MarcZyngier, Catalin Marinas,
	linux-usb, Will Deacon, Vicente Bergas, linux-rockchip,
	Greg Kroah-Hartman, linux-arm-kernel

Otherwise the device keeps writing to memory after kexec and disturbs
the next kernel.

Signed-off-by: Vicente Bergas <vicencb@gmail.com>
---
 drivers/usb/dwc3/dwc3-of-simple.c | 6 ++++++
 1 file changed, 6 insertions(+)

Hi Felipe, Robin,
this version calls 'remove' from 'shutdown' instead of just asserting
a reset because it looks like a cleaner way to stop the device.

Calling remove from shutdown in core.c instead of dwc3-of-simple.c does not
fix the issue either.

It has been tested on the sapphire board, a RK3399 platform.

Regards,
  Vicenç.

diff --git a/drivers/usb/dwc3/dwc3-of-simple.c b/drivers/usb/dwc3/dwc3-of-simple.c
index bdac3e7d7b18..d5fd45c64901 100644
--- a/drivers/usb/dwc3/dwc3-of-simple.c
+++ b/drivers/usb/dwc3/dwc3-of-simple.c
@@ -133,6 +133,11 @@ static int dwc3_of_simple_remove(struct platform_device *pdev)
 	return 0;
 }
 
+static void dwc3_of_simple_shutdown(struct platform_device *pdev)
+{
+	dwc3_of_simple_remove(pdev);
+}
+
 static int __maybe_unused dwc3_of_simple_runtime_suspend(struct device *dev)
 {
 	struct dwc3_of_simple	*simple = dev_get_drvdata(dev);
@@ -190,6 +195,7 @@ MODULE_DEVICE_TABLE(of, of_dwc3_simple_match);
 static struct platform_driver dwc3_of_simple_driver = {
 	.probe		= dwc3_of_simple_probe,
 	.remove		= dwc3_of_simple_remove,
+	.shutdown	= dwc3_of_simple_shutdown,
 	.driver		= {
 		.name	= "dwc3-of-simple",
 		.of_match_table = of_dwc3_simple_match,
-- 
2.22.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: Add shutdown to platform_driver
  2019-08-17 17:41           ` [PATCH] usb: dwc3: Add shutdown to platform_driver Vicente Bergas
@ 2019-08-27 11:45             ` Vicente Bergas
  2019-08-27 11:53               ` Felipe Balbi
  0 siblings, 1 reply; 23+ messages in thread
From: Vicente Bergas @ 2019-08-27 11:45 UTC (permalink / raw)
  To: Felipe Balbi, Robin Murphy
  Cc: Matthias Brugger, Heiko Stuebner, MarcZyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-rockchip, Greg Kroah-Hartman,
	linux-arm-kernel

On Saturday, August 17, 2019 7:41:40 PM CEST, Vicente Bergas wrote:
> Otherwise the device keeps writing to memory after kexec and disturbs
> the next kernel.
>
> Signed-off-by: Vicente Bergas <vicencb@gmail.com>
> ---
>  drivers/usb/dwc3/dwc3-of-simple.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> Hi Felipe, Robin,
> this version calls 'remove' from 'shutdown' instead of just asserting
> a reset because it looks like a cleaner way to stop the device.
>
> Calling remove from shutdown in core.c instead of dwc3-of-simple.c does not
> fix the issue either.
>
> It has been tested on the sapphire board, a RK3399 platform.
>
> Regards,
>   Vicenç.
>
> diff --git a/drivers/usb/dwc3/dwc3-of-simple.c 
> b/drivers/usb/dwc3/dwc3-of-simple.c
> index bdac3e7d7b18..d5fd45c64901 100644
> --- a/drivers/usb/dwc3/dwc3-of-simple.c
> +++ b/drivers/usb/dwc3/dwc3-of-simple.c
> @@ -133,6 +133,11 @@ static int dwc3_of_simple_remove(struct 
> platform_device *pdev)
>  	return 0;
>  }
>  
> +static void dwc3_of_simple_shutdown(struct platform_device *pdev)
> +{
> +	dwc3_of_simple_remove(pdev);
> +}
> +
>  static int __maybe_unused 
> dwc3_of_simple_runtime_suspend(struct device *dev)
>  {
>  	struct dwc3_of_simple	*simple = dev_get_drvdata(dev);
> @@ -190,6 +195,7 @@ MODULE_DEVICE_TABLE(of, of_dwc3_simple_match);
>  static struct platform_driver dwc3_of_simple_driver = {
>  	.probe		= dwc3_of_simple_probe,
>  	.remove		= dwc3_of_simple_remove,
> +	.shutdown	= dwc3_of_simple_shutdown,
>  	.driver		= {
>  		.name	= "dwc3-of-simple",
>  		.of_match_table = of_dwc3_simple_match,

Hi,
please, can you provide some feedback on this?

Regards,
  Vicenç.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: Add shutdown to platform_driver
  2019-08-27 11:45             ` Vicente Bergas
@ 2019-08-27 11:53               ` Felipe Balbi
  2019-08-27 12:16                 ` Vicente Bergas
  0 siblings, 1 reply; 23+ messages in thread
From: Felipe Balbi @ 2019-08-27 11:53 UTC (permalink / raw)
  To: Vicente Bergas, Robin Murphy
  Cc: Matthias Brugger, Heiko Stuebner, MarcZyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-rockchip, Greg Kroah-Hartman,
	linux-arm-kernel

[-- Attachment #1.1: Type: text/plain, Size: 1852 bytes --]


Hi,

Vicente Bergas <vicencb@gmail.com> writes:
> On Saturday, August 17, 2019 7:41:40 PM CEST, Vicente Bergas wrote:
>> Otherwise the device keeps writing to memory after kexec and disturbs
>> the next kernel.
>>
>> Signed-off-by: Vicente Bergas <vicencb@gmail.com>
>> ---
>>  drivers/usb/dwc3/dwc3-of-simple.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> Hi Felipe, Robin,
>> this version calls 'remove' from 'shutdown' instead of just asserting
>> a reset because it looks like a cleaner way to stop the device.
>>
>> Calling remove from shutdown in core.c instead of dwc3-of-simple.c does not
>> fix the issue either.
>>
>> It has been tested on the sapphire board, a RK3399 platform.
>>
>> Regards,
>>   Vicenç.
>>
>> diff --git a/drivers/usb/dwc3/dwc3-of-simple.c 
>> b/drivers/usb/dwc3/dwc3-of-simple.c
>> index bdac3e7d7b18..d5fd45c64901 100644
>> --- a/drivers/usb/dwc3/dwc3-of-simple.c
>> +++ b/drivers/usb/dwc3/dwc3-of-simple.c
>> @@ -133,6 +133,11 @@ static int dwc3_of_simple_remove(struct 
>> platform_device *pdev)
>>  	return 0;
>>  }
>>  
>> +static void dwc3_of_simple_shutdown(struct platform_device *pdev)
>> +{
>> +	dwc3_of_simple_remove(pdev);
>> +}
>> +
>>  static int __maybe_unused 
>> dwc3_of_simple_runtime_suspend(struct device *dev)
>>  {
>>  	struct dwc3_of_simple	*simple = dev_get_drvdata(dev);
>> @@ -190,6 +195,7 @@ MODULE_DEVICE_TABLE(of, of_dwc3_simple_match);
>>  static struct platform_driver dwc3_of_simple_driver = {
>>  	.probe		= dwc3_of_simple_probe,
>>  	.remove		= dwc3_of_simple_remove,
>> +	.shutdown	= dwc3_of_simple_shutdown,

why don't you just have shutdown use the same exact function as remove?
Frankly, though, I still don't fully understand what's going wrong
here. Why is the device still alive during kexec?

cheers

-- 
balbi

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: Add shutdown to platform_driver
  2019-08-27 11:53               ` Felipe Balbi
@ 2019-08-27 12:16                 ` Vicente Bergas
  2019-09-09 15:07                   ` Vicente Bergas
  0 siblings, 1 reply; 23+ messages in thread
From: Vicente Bergas @ 2019-08-27 12:16 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Matthias Brugger, Heiko Stuebner, MarcZyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-rockchip, Greg Kroah-Hartman,
	Robin Murphy, linux-arm-kernel

On Tuesday, August 27, 2019 1:53:04 PM CEST, Felipe Balbi wrote:
> Hi,
>
> Vicente Bergas <vicencb@gmail.com> writes:
>> On Saturday, August 17, 2019 7:41:40 PM CEST, Vicente Bergas wrote:
>>> Otherwise the device keeps writing to memory after kexec and disturbs
>>> the next kernel.
...
>
> why don't you just have shutdown use the same exact function as remove?
> Frankly, though, I still don't fully understand what's going wrong
> here. Why is the device still alive during kexec?
>
> cheers

Hi Felipe,
the remove and shutdown functions have different prototypes, so
shutdown is wrapping remove.
Would it be preferable to cast remove as shutdown?

The issue with kexec is that the device is being used during the livetime
of the first kernel. When the first kernel executes kexec it calls the
shutdown function of drivers (instead of remove). Because of this the dwc3
device keeps doing things like DMA.
While the second kernel is taking over, it gets its memory corrupted with
such DMA accesses from the device. When the second kernel reaches the point
of taking over the dwc3 device, re-initializes it, but it is already too
late. Still worse, if the second kernel did not have the dwc3 driver, it
would get endless memory corruptions.
All in all, devices that can do DMA need to stop doing it on shutdown.

Regards,
  Vicenç.


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] usb: dwc3: Add shutdown to platform_driver
  2019-08-27 12:16                 ` Vicente Bergas
@ 2019-09-09 15:07                   ` Vicente Bergas
  0 siblings, 0 replies; 23+ messages in thread
From: Vicente Bergas @ 2019-09-09 15:07 UTC (permalink / raw)
  To: Felipe Balbi, Robin Murphy
  Cc: Matthias Brugger, Heiko Stuebner, MarcZyngier, Catalin Marinas,
	linux-usb, Will Deacon, linux-rockchip, Greg Kroah-Hartman,
	linux-arm-kernel

On Tuesday, August 27, 2019 2:16:20 PM CEST, Vicente Bergas wrote:
> On Tuesday, August 27, 2019 1:53:04 PM CEST, Felipe Balbi wrote:
>> Hi,
>> 
>> Vicente Bergas <vicencb@gmail.com> writes:
>>> On Saturday, August 17, 2019 7:41:40 PM CEST, Vicente Bergas wrote:
>>>> Otherwise the device keeps writing to memory after kexec and disturbs
>>>> the next kernel.
> ...
>> 
>> why don't you just have shutdown use the same exact function as remove?
>> Frankly, though, I still don't fully understand what's going wrong
>> here. Why is the device still alive during kexec?
>> 
>> cheers
>
> Hi Felipe,
> the remove and shutdown functions have different prototypes, so
> shutdown is wrapping remove.
> Would it be preferable to cast remove as shutdown?
>
> The issue with kexec is that the device is being used during the livetime
> of the first kernel. When the first kernel executes kexec it calls the
> shutdown function of drivers (instead of remove). Because of this the dwc3
> device keeps doing things like DMA.
> While the second kernel is taking over, it gets its memory corrupted with
> such DMA accesses from the device. When the second kernel reaches the point
> of taking over the dwc3 device, re-initializes it, but it is already too
> late. Still worse, if the second kernel did not have the dwc3 driver, it
> would get endless memory corruptions.
> All in all, devices that can do DMA need to stop doing it on shutdown.
>
> Regards,
>  Vicenç.

Hi,
please, can you provide some feedback on this?

Regards,
 Vicenç.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, back to index

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-22 14:31 kexec on rk3399 Vicente Bergas
2019-07-22 14:46 ` Will Deacon
2019-07-22 16:08   ` Vicente Bergas
2019-07-22 14:47 ` Matthias Brugger
2019-07-22 16:56   ` Vicente Bergas
2019-07-22 14:54 ` Marc Zyngier
2019-07-22 17:05   ` Vicente Bergas
2019-07-22 17:35     ` Robin Murphy
2019-07-22 18:53       ` Vicente Bergas
2019-07-23 13:32         ` Robin Murphy
2019-08-14 12:53 ` Vicente Bergas
2019-08-14 13:06   ` Felipe Balbi
2019-08-14 13:15     ` Vicente Bergas
2019-08-15  6:00       ` Felipe Balbi
2019-08-14 13:12   ` Robin Murphy
2019-08-15  1:15     ` Vicente Bergas
2019-08-15  6:06       ` Felipe Balbi
2019-08-15 11:09         ` Robin Murphy
2019-08-17 17:41           ` [PATCH] usb: dwc3: Add shutdown to platform_driver Vicente Bergas
2019-08-27 11:45             ` Vicente Bergas
2019-08-27 11:53               ` Felipe Balbi
2019-08-27 12:16                 ` Vicente Bergas
2019-09-09 15:07                   ` Vicente Bergas

Linux-ARM-Kernel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/0 linux-arm-kernel/git/0.git
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/1 linux-arm-kernel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-kernel linux-arm-kernel/ https://lore.kernel.org/linux-arm-kernel \
		linux-arm-kernel@lists.infradead.org infradead-linux-arm-kernel@archiver.kernel.org
	public-inbox-index linux-arm-kernel


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-arm-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox