All of lore.kernel.org
 help / color / mirror / Atom feed
* How to debug u-boot data abort
@ 2022-03-23  2:28 qianfan
  2022-03-23  7:45 ` qianfan
  2022-03-23  7:51 ` Abder
  0 siblings, 2 replies; 23+ messages in thread
From: qianfan @ 2022-03-23  2:28 UTC (permalink / raw)
  To: u-boot

Hi:

I had a custom AM335X board connected my computer by usbnet. It always report 
data abort when 'dhcp':

Next it the log:

U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)

CPU  : AM335X-GP rev 2.1
Model: WISDOM AM335X CCT
DRAM:  512 MiB
NAND:  256 MiB
MMC:   OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment

Net:   Could not get PHY for ethernet@4a100000: addr 0
eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot:  0
=> setenv autoload no
=> dhcp
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.4 (757 ms)
data abort
pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
reloc pc : [<808130a2>]    lr : [<80833c3f>]
sp : 9de53410  ip : 9de53578     fp : 00000001
r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
Code: f023 0303 60ca 4403 (6091) 685a
Resetting CPU ...

resetting ...


It's there has any doc about how to debug data abort? Or is the bug is already 
fixed?

Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to debug u-boot data abort
  2022-03-23  2:28 How to debug u-boot data abort qianfan
@ 2022-03-23  7:45 ` qianfan
  2022-03-23  8:02   ` data abort when run 'dhcp' qianfan
  2022-03-23  8:27   ` How to debug u-boot data abort Heinrich Schuchardt
  2022-03-23  7:51 ` Abder
  1 sibling, 2 replies; 23+ messages in thread
From: qianfan @ 2022-03-23  7:45 UTC (permalink / raw)
  To: u-boot; +Cc: Heinrich Schuchardt


在 2022/3/23 10:28, qianfan 写道:
>
> Hi:
>
> I had a custom AM335X board connected my computer by usbnet. It always report 
> data abort when 'dhcp':
>
> Next it the log:
>
> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>
> CPU  : AM335X-GP rev 2.1
> Model: WISDOM AM335X CCT
> DRAM:  512 MiB
> NAND:  256 MiB
> MMC:   OMAP SD/MMC: 0
> Loading Environment from NAND... *** Warning - bad CRC, using default environment
>
> Net:   Could not get PHY for ethernet@4a100000: addr 0
> eth2: ethernet@4a100000, eth3: usb_ether
> Hit any key to stop autoboot:  0
> => setenv autoload no
> => dhcp
> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> MAC de:ad:be:ef:00:01
> HOST MAC de:ad:be:ef:00:00
> RNDIS ready
> musb-hdrc: peripheral reset irq lost!
> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> USB RNDIS network up!
> BOOTP broadcast 1
> BOOTP broadcast 2
> BOOTP broadcast 3
> DHCP client bound to address 192.168.200.4 (757 ms)
> data abort
> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
> reloc pc : [<808130a2>]    lr : [<80833c3f>]
> sp : 9de53410  ip : 9de53578     fp : 00000001
> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
> Code: f023 0303 60ca 4403 (6091) 685a
> Resetting CPU ...
>
> resetting ...
>
>
> It's there has any doc about how to debug data abort? Or is the bug is already 
> fixed?
>
> Thanks
>
This bug doesn't fixed on master code. I found v2021.01 is good and v2021.04-rc2 
is bad.

Also I had tested this on beaglebone black with am335x_evm_defconfig, has the 
simliar problem.

find the first bug commit via 'git bisect': it told me that commit 
e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very strange due to 
this commit doesn't touch any dhcp or network code.

➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
Date:   Wed Jan 20 22:21:53 2021 +0100

     fs: fat: consistent error handling for flush_dir()

     Provide function description for flush_dir().
     Move all error messages for flush_dir() from the callers to the function.
     Move mapping of errors to -EIO to the function.
     Always check return value of flush_dir() (Coverity CID 316362).

     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.

     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>

:040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 
77d188b1c99181fd71f2167fdeee3434a09db209 M      fs


184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before 
e97eb638de0dc8f6e989e20eaeb0342f103cb917:

* e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling 
for flush_dir()
*   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 
'u-boot-rockchip-20210121' of 
https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
|\
| * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe 
controller driver

I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.

U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)

CPU  : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM:  512 MiB
WDT:   Started with servicing (60s timeout)
NAND:  0 MiB
MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
Net:   eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot:  0
=> dhcp
ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.157 (757 ms)
Using usb_ether device
TFTP from server 192.168.200.1; our IP address is 192.168.200.157
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
#################################################################
#################################################################
          #########################
          2.5 MiB/s
done
Bytes transferred = 1123888 (112630 hex)
=>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to debug u-boot data abort
  2022-03-23  2:28 How to debug u-boot data abort qianfan
  2022-03-23  7:45 ` qianfan
@ 2022-03-23  7:51 ` Abder
  2022-03-23  7:59   ` qianfan
  1 sibling, 1 reply; 23+ messages in thread
From: Abder @ 2022-03-23  7:51 UTC (permalink / raw)
  To: qianfan; +Cc: U-Boot Mailing List

Le mer. 23 mars 2022 à 03:28, qianfan <qianfanguijin@163.com> a écrit :
>
> Hi:
>
> I had a custom AM335X board connected my computer by usbnet. It always report
> data abort when 'dhcp':
>
> Next it the log:
>
> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>
> CPU  : AM335X-GP rev 2.1
> Model: WISDOM AM335X CCT
> DRAM:  512 MiB
> NAND:  256 MiB
> MMC:   OMAP SD/MMC: 0
> Loading Environment from NAND... *** Warning - bad CRC, using default environment
>
> Net:   Could not get PHY for ethernet@4a100000: addr 0
> eth2: ethernet@4a100000, eth3: usb_ether
> Hit any key to stop autoboot:  0
> => setenv autoload no
> => dhcp
> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> MAC de:ad:be:ef:00:01
> HOST MAC de:ad:be:ef:00:00
> RNDIS ready
> musb-hdrc: peripheral reset irq lost!
> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> USB RNDIS network up!
> BOOTP broadcast 1
> BOOTP broadcast 2
> BOOTP broadcast 3
> DHCP client bound to address 192.168.200.4 (757 ms)
> data abort
> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
> reloc pc : [<808130a2>]    lr : [<80833c3f>]
> sp : 9de53410  ip : 9de53578     fp : 00000001
> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
> Code: f023 0303 60ca 4403 (6091) 685a
> Resetting CPU ...
>
Don't have any idea on what is causing the crash,  but to answer your
question about debugging data abort :
from the reg dump, you can look at the PC and LR registers to see the
function that caused the crash (in PC) and its caller (in LR) by using
the .map file (generated after compilation).
use the values of pc and lr ante relocation (the 2nd ligne in the dump
above: reloc pc ...)

Regards
--
Abder

> resetting ...
>
>
> It's there has any doc about how to debug data abort? Or is the bug is already
> fixed?
>
> Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to debug u-boot data abort
  2022-03-23  7:51 ` Abder
@ 2022-03-23  7:59   ` qianfan
  0 siblings, 0 replies; 23+ messages in thread
From: qianfan @ 2022-03-23  7:59 UTC (permalink / raw)
  To: Abder; +Cc: U-Boot Mailing List


在 2022/3/23 15:51, Abder 写道:
> Le mer. 23 mars 2022 à 03:28, qianfan <qianfanguijin@163.com> a écrit :
>> Hi:
>>
>> I had a custom AM335X board connected my computer by usbnet. It always report
>> data abort when 'dhcp':
>>
>> Next it the log:
>>
>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>>
>> CPU  : AM335X-GP rev 2.1
>> Model: WISDOM AM335X CCT
>> DRAM:  512 MiB
>> NAND:  256 MiB
>> MMC:   OMAP SD/MMC: 0
>> Loading Environment from NAND... *** Warning - bad CRC, using default environment
>>
>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>> eth2: ethernet@4a100000, eth3: usb_ether
>> Hit any key to stop autoboot:  0
>> => setenv autoload no
>> => dhcp
>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> MAC de:ad:be:ef:00:01
>> HOST MAC de:ad:be:ef:00:00
>> RNDIS ready
>> musb-hdrc: peripheral reset irq lost!
>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> USB RNDIS network up!
>> BOOTP broadcast 1
>> BOOTP broadcast 2
>> BOOTP broadcast 3
>> DHCP client bound to address 192.168.200.4 (757 ms)
>> data abort
>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>> sp : 9de53410  ip : 9de53578     fp : 00000001
>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>> Code: f023 0303 60ca 4403 (6091) 685a
>> Resetting CPU ...
>>
> Don't have any idea on what is causing the crash,  but to answer your
> question about debugging data abort :
> from the reg dump, you can look at the PC and LR registers to see the
> function that caused the crash (in PC) and its caller (in LR) by using
> the .map file (generated after compilation).
> use the values of pc and lr ante relocation (the 2nd ligne in the dump
> above: reloc pc ...)

Hi:

Thanks for your's guide. I had this data abort message:

data abort
pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
reloc pc : [<8081496c>]    lr : [<80834cd7>]

and found the pc and lc address in u-boot.map:

  .text.env_attr_walk
                 0x0000000080834c54       0xb4 env/built-in.o
                 0x0000000080834c54                env_attr_walk
  .text.malloc   0x0000000080814900      0x420 common/built-in.o
                 0x0000000080814900                malloc

Is means that data abort when 'malloc' called from env_attr_walk? It's there has 
a better way that can dump stack?

>
> Regards
> --
> Abder
>
>> resetting ...
>>
>>
>> It's there has any doc about how to debug data abort? Or is the bug is already
>> fixed?
>>
>> Thanks


^ permalink raw reply	[flat|nested] 23+ messages in thread

* data abort when run 'dhcp'
  2022-03-23  7:45 ` qianfan
@ 2022-03-23  8:02   ` qianfan
  2022-03-23  9:13     ` qianfan
  2022-03-23  8:27   ` How to debug u-boot data abort Heinrich Schuchardt
  1 sibling, 1 reply; 23+ messages in thread
From: qianfan @ 2022-03-23  8:02 UTC (permalink / raw)
  To: u-boot; +Cc: Heinrich Schuchardt


在 2022/3/23 15:45, qianfan 写道:
>
>
> 在 2022/3/23 10:28, qianfan 写道:
>>
>> Hi:
>>
>> I had a custom AM335X board connected my computer by usbnet. It always report 
>> data abort when 'dhcp':
>>
>> Next it the log:
>>
>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>>
>> CPU  : AM335X-GP rev 2.1
>> Model: WISDOM AM335X CCT
>> DRAM:  512 MiB
>> NAND:  256 MiB
>> MMC:   OMAP SD/MMC: 0
>> Loading Environment from NAND... *** Warning - bad CRC, using default environment
>>
>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>> eth2: ethernet@4a100000, eth3: usb_ether
>> Hit any key to stop autoboot:  0
>> => setenv autoload no
>> => dhcp
>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> MAC de:ad:be:ef:00:01
>> HOST MAC de:ad:be:ef:00:00
>> RNDIS ready
>> musb-hdrc: peripheral reset irq lost!
>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> USB RNDIS network up!
>> BOOTP broadcast 1
>> BOOTP broadcast 2
>> BOOTP broadcast 3
>> DHCP client bound to address 192.168.200.4 (757 ms)
>> data abort
>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>> sp : 9de53410  ip : 9de53578     fp : 00000001
>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>> Code: f023 0303 60ca 4403 (6091) 685a
>> Resetting CPU ...
>>
>> resetting ...
>>
>>
>> It's there has any doc about how to debug data abort? Or is the bug is 
>> already fixed?
>>
>> Thanks
>>
> This bug doesn't fixed on master code. I found v2021.01 is good and 
> v2021.04-rc2 is bad.
>
> Also I had tested this on beaglebone black with am335x_evm_defconfig, has the 
> simliar problem.
>
> find the first bug commit via 'git bisect': it told me that commit 
> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very strange due 
> to this commit doesn't touch any dhcp or network code.
>
> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
> Date:   Wed Jan 20 22:21:53 2021 +0100
>
>     fs: fat: consistent error handling for flush_dir()
>
>     Provide function description for flush_dir().
>     Move all error messages for flush_dir() from the callers to the function.
>     Move mapping of errors to -EIO to the function.
>     Always check return value of flush_dir() (Coverity CID 316362).
>
>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>
>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>
> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 
> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>
>
> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before 
> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>
> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling 
> for flush_dir()
> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 
> 'u-boot-rockchip-20210121' of 
> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
> |\
> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe 
> controller driver
>
> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>
> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>
> CPU  : AM335X-GP rev 2.1
> Model: TI AM335x BeagleBone Black
> DRAM:  512 MiB
> WDT:   Started with servicing (60s timeout)
> NAND:  0 MiB
> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
> Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
> Net:   eth2: ethernet@4a100000, eth3: usb_ether
> Hit any key to stop autoboot:  0
> => dhcp
> ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> MAC de:ad:be:ef:00:01
> HOST MAC de:ad:be:ef:00:00
> RNDIS ready
> musb-hdrc: peripheral reset irq lost!
> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> USB RNDIS network up!
> BOOTP broadcast 1
> BOOTP broadcast 2
> BOOTP broadcast 3
> DHCP client bound to address 192.168.200.157 (757 ms)
> Using usb_ether device
> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
> Filename 'u-boot.img'.
> Load address: 0x82000000
> Loading: #################################################################
> #################################################################
> #################################################################
>          #########################
>          2.5 MiB/s
> done
> Bytes transferred = 1123888 (112630 hex)
> =>
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to debug u-boot data abort
  2022-03-23  7:45 ` qianfan
  2022-03-23  8:02   ` data abort when run 'dhcp' qianfan
@ 2022-03-23  8:27   ` Heinrich Schuchardt
  2022-03-24  3:18     ` AKASHI Takahiro
  1 sibling, 1 reply; 23+ messages in thread
From: Heinrich Schuchardt @ 2022-03-23  8:27 UTC (permalink / raw)
  To: qianfan; +Cc: u-boot

On 3/23/22 08:45, qianfan wrote:
>
> 在 2022/3/23 10:28, qianfan 写道:
>>
>> Hi:
>>
>> I had a custom AM335X board connected my computer by usbnet. It always
>> report data abort when 'dhcp':
>>
>> Next it the log:
>>
>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>>
>> CPU  : AM335X-GP rev 2.1
>> Model: WISDOM AM335X CCT
>> DRAM:  512 MiB
>> NAND:  256 MiB
>> MMC:   OMAP SD/MMC: 0
>> Loading Environment from NAND... *** Warning - bad CRC, using default
>> environment
>>
>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>> eth2: ethernet@4a100000, eth3: usb_ether
>> Hit any key to stop autoboot:  0
>> => setenv autoload no
>> => dhcp
>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> MAC de:ad:be:ef:00:01
>> HOST MAC de:ad:be:ef:00:00
>> RNDIS ready
>> musb-hdrc: peripheral reset irq lost!
>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> USB RNDIS network up!
>> BOOTP broadcast 1
>> BOOTP broadcast 2
>> BOOTP broadcast 3
>> DHCP client bound to address 192.168.200.4 (757 ms)
>> data abort

This could be an alignment error.

>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>> reloc pc : [<808130a2>]    lr : [<80833c3f>]

You can use these addresses together with the u-boot.map file to figure
out in which function the abort occurs and from where it was called.

Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.

>> sp : 9de53410  ip : 9de53578     fp : 00000001
>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>> Code: f023 0303 60ca 4403 (6091) 685a

This is how to find the exact instruction causing the problem:

$ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
 >       ARCH=arm scripts/decodecode
Code: f023 0303 60ca 4403 (6091) 685a
All code
========
    0:   23 f0                   and    %eax,%esi
    2:   03 03                   add    (%rbx),%eax
    4:   ca 60 03                lret   $0x360
    7:*  44 91                   rex.R xchg %eax,%ecx            <--
trapping instruction
    9:   60                      (bad)
    a:   5a                      pop    %rdx
    b:   68                      .byte 0x68

Code starting with the faulting instruction
===========================================
    0:   91                      xchg   %eax,%ecx
    1:   60                      (bad)
    2:   5a                      pop    %rdx
    3:   68                      .byte 0x68

I hope this helps to figure out, where exactly the problem occurs

Best regards

Heinrich

>> Resetting CPU ...
>>
>> resetting ...
>>
>>
>> It's there has any doc about how to debug data abort? Or is the bug is
>> already fixed?
>>
>> Thanks
>>
> This bug doesn't fixed on master code. I found v2021.01 is good and
> v2021.04-rc2 is bad.
>
> Also I had tested this on beaglebone black with am335x_evm_defconfig,
> has the simliar problem.
>
> find the first bug commit via 'git bisect': it told me that commit
> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
> strange due to this commit doesn't touch any dhcp or network code.
>
> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
> Date:   Wed Jan 20 22:21:53 2021 +0100
>
>      fs: fat: consistent error handling for flush_dir()
>
>      Provide function description for flush_dir().
>      Move all error messages for flush_dir() from the callers to the
> function.
>      Move mapping of errors to -EIO to the function.
>      Always check return value of flush_dir() (Coverity CID 316362).
>
>      In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>
>      Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>
> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>
>
> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>
> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
> handling for flush_dir()
> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
> 'u-boot-rockchip-20210121' of
> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
> |\
> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based
> PCIe controller driver
>
> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>
> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>
> CPU  : AM335X-GP rev 2.1
> Model: TI AM335x BeagleBone Black
> DRAM:  512 MiB
> WDT:   Started with servicing (60s timeout)
> NAND:  0 MiB
> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
> Loading Environment from FAT... <ethaddr> not set. Validating first
> E-fuse MAC
> Net:   eth2: ethernet@4a100000, eth3: usb_ether
> Hit any key to stop autoboot:  0
> => dhcp
> ethernet@4a100000 Waiting for PHY auto negotiation to complete.........
> TIMEOUT !
> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> MAC de:ad:be:ef:00:01
> HOST MAC de:ad:be:ef:00:00
> RNDIS ready
> musb-hdrc: peripheral reset irq lost!
> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> USB RNDIS network up!
> BOOTP broadcast 1
> BOOTP broadcast 2
> BOOTP broadcast 3
> DHCP client bound to address 192.168.200.157 (757 ms)
> Using usb_ether device
> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
> Filename 'u-boot.img'.
> Load address: 0x82000000
> Loading: #################################################################
> #################################################################
> #################################################################
>           #########################
>           2.5 MiB/s
> done
> Bytes transferred = 1123888 (112630 hex)
> =>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-23  8:02   ` data abort when run 'dhcp' qianfan
@ 2022-03-23  9:13     ` qianfan
  2022-03-23  9:51       ` Heinrich Schuchardt
  0 siblings, 1 reply; 23+ messages in thread
From: qianfan @ 2022-03-23  9:13 UTC (permalink / raw)
  To: u-boot; +Cc: Heinrich Schuchardt, Heinrich Schuchardt


在 2022/3/23 16:02, qianfan 写道:
>
>
> 在 2022/3/23 15:45, qianfan 写道:
>>
>>
>> 在 2022/3/23 10:28, qianfan 写道:
>>>
>>> Hi:
>>>
>>> I had a custom AM335X board connected my computer by usbnet. It always 
>>> report data abort when 'dhcp':
>>>
>>> Next it the log:
>>>
>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>>>
>>> CPU  : AM335X-GP rev 2.1
>>> Model: WISDOM AM335X CCT
>>> DRAM:  512 MiB
>>> NAND:  256 MiB
>>> MMC:   OMAP SD/MMC: 0
>>> Loading Environment from NAND... *** Warning - bad CRC, using default 
>>> environment
>>>
>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>> eth2: ethernet@4a100000, eth3: usb_ether
>>> Hit any key to stop autoboot:  0
>>> => setenv autoload no
>>> => dhcp
>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>> MAC de:ad:be:ef:00:01
>>> HOST MAC de:ad:be:ef:00:00
>>> RNDIS ready
>>> musb-hdrc: peripheral reset irq lost!
>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>> USB RNDIS network up!
>>> BOOTP broadcast 1
>>> BOOTP broadcast 2
>>> BOOTP broadcast 3
>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>> data abort
>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>> Code: f023 0303 60ca 4403 (6091) 685a
>>> Resetting CPU ...
>>>
>>> resetting ...
>>>
>>>
>>> It's there has any doc about how to debug data abort? Or is the bug is 
>>> already fixed?
>>>
>>> Thanks
>>>
>> This bug doesn't fixed on master code. I found v2021.01 is good and 
>> v2021.04-rc2 is bad.
>>
>> Also I had tested this on beaglebone black with am335x_evm_defconfig, has the 
>> simliar problem.
>>
>> find the first bug commit via 'git bisect': it told me that commit 
>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very strange due 
>> to this commit doesn't touch any dhcp or network code.
>>
>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>
>>     fs: fat: consistent error handling for flush_dir()
>>
>>     Provide function description for flush_dir().
>>     Move all error messages for flush_dir() from the callers to the function.
>>     Move mapping of errors to -EIO to the function.
>>     Always check return value of flush_dir() (Coverity CID 316362).
>>
>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>
>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>
>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 
>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>
>>
>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before 
>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>
>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling 
>> for flush_dir()
>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 
>> 'u-boot-rockchip-20210121' of 
>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>> |\
>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe 
>> controller driver
>>
>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>
>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>
>> CPU  : AM335X-GP rev 2.1
>> Model: TI AM335x BeagleBone Black
>> DRAM:  512 MiB
>> WDT:   Started with servicing (60s timeout)
>> NAND:  0 MiB
>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>> Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>> Hit any key to stop autoboot:  0
>> => dhcp
>> ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> MAC de:ad:be:ef:00:01
>> HOST MAC de:ad:be:ef:00:00
>> RNDIS ready
>> musb-hdrc: peripheral reset irq lost!
>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> USB RNDIS network up!
>> BOOTP broadcast 1
>> BOOTP broadcast 2
>> BOOTP broadcast 3
>> DHCP client bound to address 192.168.200.157 (757 ms)
>> Using usb_ether device
>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>> Filename 'u-boot.img'.
>> Load address: 0x82000000
>> Loading: #################################################################
>> #################################################################
>> #################################################################
>>          #########################
>>          2.5 MiB/s
>> done
>> Bytes transferred = 1123888 (112630 hex)
>> =>
>>
"data abort" messages:

data abort
pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
reloc pc : [<8081496c>]    lr : [<80834cd7>]
sp : 9df38e60  ip : 9df38fc8     fp : 00000001
r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
Code: 0303 60ca 4403 6091 (685a) f042
Resetting CPU ...

objdump u-boot:pc is in malloc and lr is in env_attr_walk

       unlink(victim, bck, fwd);
80814966:    60ca          str    r2, [r1, #12]
       set_inuse_bit_at_offset(victim, victim_size);
80814968:    4403          add    r3, r0
       unlink(victim, bck, fwd);
8081496a:    6091          str    r1, [r2, #8]
     set_inuse_bit_at_offset(victim, victim_size);
8081496c:    685a          ldr    r2, [r3, #4]
8081496e:    f042 0201     orr.w    r2, r2, #1
80814972:    605a          str    r2, [r3, #4]

r3 is 3ff589e0 and it's not a valid ram address on am335x.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-23  9:13     ` qianfan
@ 2022-03-23  9:51       ` Heinrich Schuchardt
  2022-03-23 10:07         ` qianfan
  0 siblings, 1 reply; 23+ messages in thread
From: Heinrich Schuchardt @ 2022-03-23  9:51 UTC (permalink / raw)
  To: qianfan; +Cc: u-boot

On 3/23/22 10:13, qianfan wrote:
>
> 在 2022/3/23 16:02, qianfan 写道:
>>
>>
>> 在 2022/3/23 15:45, qianfan 写道:
>>>
>>>
>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>
>>>> Hi:
>>>>
>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>> always report data abort when 'dhcp':
>>>>
>>>> Next it the log:
>>>>
>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>> +0800)
>>>>
>>>> CPU  : AM335X-GP rev 2.1
>>>> Model: WISDOM AM335X CCT
>>>> DRAM:  512 MiB
>>>> NAND:  256 MiB
>>>> MMC:   OMAP SD/MMC: 0
>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>> default environment
>>>>
>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>> Hit any key to stop autoboot:  0
>>>> => setenv autoload no
>>>> => dhcp
>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>> MAC de:ad:be:ef:00:01
>>>> HOST MAC de:ad:be:ef:00:00
>>>> RNDIS ready
>>>> musb-hdrc: peripheral reset irq lost!
>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>> USB RNDIS network up!
>>>> BOOTP broadcast 1
>>>> BOOTP broadcast 2
>>>> BOOTP broadcast 3
>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>> data abort
>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>> Resetting CPU ...
>>>>
>>>> resetting ...
>>>>
>>>>
>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>> is already fixed?
>>>>
>>>> Thanks
>>>>
>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>> v2021.04-rc2 is bad.
>>>
>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>> has the simliar problem.
>>>
>>> find the first bug commit via 'git bisect': it told me that commit
>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>> strange due to this commit doesn't touch any dhcp or network code.
>>>
>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>
>>>     fs: fat: consistent error handling for flush_dir()
>>>
>>>     Provide function description for flush_dir().
>>>     Move all error messages for flush_dir() from the callers to the
>>> function.
>>>     Move mapping of errors to -EIO to the function.
>>>     Always check return value of flush_dir() (Coverity CID 316362).
>>>
>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>
>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>
>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>
>>>
>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>
>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>> handling for flush_dir()
>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>> 'u-boot-rockchip-20210121' of
>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>> |\
>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>> based PCIe controller driver
>>>
>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>
>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>
>>> CPU  : AM335X-GP rev 2.1
>>> Model: TI AM335x BeagleBone Black
>>> DRAM:  512 MiB
>>> WDT:   Started with servicing (60s timeout)
>>> NAND:  0 MiB
>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>> E-fuse MAC
>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>> Hit any key to stop autoboot:  0
>>> => dhcp
>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>> complete......... TIMEOUT !
>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>> MAC de:ad:be:ef:00:01
>>> HOST MAC de:ad:be:ef:00:00
>>> RNDIS ready
>>> musb-hdrc: peripheral reset irq lost!
>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>> USB RNDIS network up!
>>> BOOTP broadcast 1
>>> BOOTP broadcast 2
>>> BOOTP broadcast 3
>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>> Using usb_ether device
>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>> Filename 'u-boot.img'.
>>> Load address: 0x82000000
>>> Loading:
>>> #################################################################
>>> #################################################################
>>> #################################################################
>>>          #########################
>>>          2.5 MiB/s
>>> done
>>> Bytes transferred = 1123888 (112630 hex)
>>> =>
>>>
> "data abort" messages:
>
> data abort
> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
> reloc pc : [<8081496c>]    lr : [<80834cd7>]
> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
> Code: 0303 60ca 4403 6091 (685a) f042
> Resetting CPU ...
>
> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>
>        unlink(victim, bck, fwd);
> 80814966:    60ca          str    r2, [r1, #12]
>        set_inuse_bit_at_offset(victim, victim_size);
> 80814968:    4403          add    r3, r0
>        unlink(victim, bck, fwd);
> 8081496a:    6091          str    r1, [r2, #8]
>      set_inuse_bit_at_offset(victim, victim_size);
> 8081496c:    685a          ldr    r2, [r3, #4]
> 8081496e:    f042 0201     orr.w    r2, r2, #1
> 80814972:    605a          str    r2, [r3, #4]
>
> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>
>

I have seen crashes in common/dlmalloc.c before after double free() or
free() with an incorrect pointer.

The assert() statements in do_check_inuse_chunk() are meant to catch
this but assert() as defined in include/log.h does not stop the code and
even does not print without _DEBUG=1.

You should be able to get the assert output with

#include <common.h>
#define _DEBUG 1
#include <log.h>

at the top of common/dlmalloc.c.

You should get full malloc debug output with

#define DEBUG 1
#include <common.h>
#include <log.h>

Best regards

Heinrich

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-23  9:51       ` Heinrich Schuchardt
@ 2022-03-23 10:07         ` qianfan
  2022-03-23 10:12           ` Heinrich Schuchardt
  0 siblings, 1 reply; 23+ messages in thread
From: qianfan @ 2022-03-23 10:07 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: u-boot


在 2022/3/23 17:51, Heinrich Schuchardt 写道:
> On 3/23/22 10:13, qianfan wrote:
>>
>> 在 2022/3/23 16:02, qianfan 写道:
>>>
>>>
>>> 在 2022/3/23 15:45, qianfan 写道:
>>>>
>>>>
>>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>>
>>>>> Hi:
>>>>>
>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>> always report data abort when 'dhcp':
>>>>>
>>>>> Next it the log:
>>>>>
>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>> +0800)
>>>>>
>>>>> CPU  : AM335X-GP rev 2.1
>>>>> Model: WISDOM AM335X CCT
>>>>> DRAM:  512 MiB
>>>>> NAND:  256 MiB
>>>>> MMC:   OMAP SD/MMC: 0
>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>> default environment
>>>>>
>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>> Hit any key to stop autoboot:  0
>>>>> => setenv autoload no
>>>>> => dhcp
>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>> MAC de:ad:be:ef:00:01
>>>>> HOST MAC de:ad:be:ef:00:00
>>>>> RNDIS ready
>>>>> musb-hdrc: peripheral reset irq lost!
>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>> USB RNDIS network up!
>>>>> BOOTP broadcast 1
>>>>> BOOTP broadcast 2
>>>>> BOOTP broadcast 3
>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>> data abort
>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>> Resetting CPU ...
>>>>>
>>>>> resetting ...
>>>>>
>>>>>
>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>> is already fixed?
>>>>>
>>>>> Thanks
>>>>>
>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>> v2021.04-rc2 is bad.
>>>>
>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>> has the simliar problem.
>>>>
>>>> find the first bug commit via 'git bisect': it told me that commit
>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>
>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>>
>>>>     fs: fat: consistent error handling for flush_dir()
>>>>
>>>>     Provide function description for flush_dir().
>>>>     Move all error messages for flush_dir() from the callers to the
>>>> function.
>>>>     Move mapping of errors to -EIO to the function.
>>>>     Always check return value of flush_dir() (Coverity CID 316362).
>>>>
>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>
>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>
>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>>
>>>>
>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>
>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>> handling for flush_dir()
>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>> 'u-boot-rockchip-20210121' of
>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>> |\
>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>> based PCIe controller driver
>>>>
>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>
>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>
>>>> CPU  : AM335X-GP rev 2.1
>>>> Model: TI AM335x BeagleBone Black
>>>> DRAM:  512 MiB
>>>> WDT:   Started with servicing (60s timeout)
>>>> NAND:  0 MiB
>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>> E-fuse MAC
>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>>> Hit any key to stop autoboot:  0
>>>> => dhcp
>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>> complete......... TIMEOUT !
>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>> MAC de:ad:be:ef:00:01
>>>> HOST MAC de:ad:be:ef:00:00
>>>> RNDIS ready
>>>> musb-hdrc: peripheral reset irq lost!
>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>> USB RNDIS network up!
>>>> BOOTP broadcast 1
>>>> BOOTP broadcast 2
>>>> BOOTP broadcast 3
>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>> Using usb_ether device
>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>> Filename 'u-boot.img'.
>>>> Load address: 0x82000000
>>>> Loading:
>>>> #################################################################
>>>> #################################################################
>>>> #################################################################
>>>>          #########################
>>>>          2.5 MiB/s
>>>> done
>>>> Bytes transferred = 1123888 (112630 hex)
>>>> =>
>>>>
>> "data abort" messages:
>>
>> data abort
>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>> Code: 0303 60ca 4403 6091 (685a) f042
>> Resetting CPU ...
>>
>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>
>>        unlink(victim, bck, fwd);
>> 80814966:    60ca          str    r2, [r1, #12]
>>        set_inuse_bit_at_offset(victim, victim_size);
>> 80814968:    4403          add    r3, r0
>>        unlink(victim, bck, fwd);
>> 8081496a:    6091          str    r1, [r2, #8]
>>      set_inuse_bit_at_offset(victim, victim_size);
>> 8081496c:    685a          ldr    r2, [r3, #4]
>> 8081496e:    f042 0201     orr.w    r2, r2, #1
>> 80814972:    605a          str    r2, [r3, #4]
>>
>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>
>>
>
> I have seen crashes in common/dlmalloc.c before after double free() or
> free() with an incorrect pointer.
>
> The assert() statements in do_check_inuse_chunk() are meant to catch
> this but assert() as defined in include/log.h does not stop the code and
> even does not print without _DEBUG=1.
>
> You should be able to get the assert output with
>
> #include <common.h>
> #define _DEBUG 1
> #include <log.h>
>
> at the top of common/dlmalloc.c.
>
> You should get full malloc debug output with

Hi: I had try add DEBUG marco before <log.h> and no other malloc message printed.

>
> #define DEBUG 1
> #include <common.h>
> #include <log.h>
>
> Best regards
>
> Heinrich


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-23 10:07         ` qianfan
@ 2022-03-23 10:12           ` Heinrich Schuchardt
  2022-03-23 11:54             ` qianfanguijin
                               ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Heinrich Schuchardt @ 2022-03-23 10:12 UTC (permalink / raw)
  To: qianfan; +Cc: u-boot

On 3/23/22 11:07, qianfan wrote:
>
> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:
>> On 3/23/22 10:13, qianfan wrote:
>>>
>>> 在 2022/3/23 16:02, qianfan 写道:
>>>>
>>>>
>>>> 在 2022/3/23 15:45, qianfan 写道:
>>>>>
>>>>>
>>>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>>>
>>>>>> Hi:
>>>>>>
>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>>> always report data abort when 'dhcp':
>>>>>>
>>>>>> Next it the log:
>>>>>>
>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>> +0800)
>>>>>>
>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>> Model: WISDOM AM335X CCT
>>>>>> DRAM:  512 MiB
>>>>>> NAND:  256 MiB
>>>>>> MMC:   OMAP SD/MMC: 0
>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>> default environment
>>>>>>
>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>>> Hit any key to stop autoboot:  0
>>>>>> => setenv autoload no
>>>>>> => dhcp
>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>> MAC de:ad:be:ef:00:01
>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>> RNDIS ready
>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>> USB RNDIS network up!
>>>>>> BOOTP broadcast 1
>>>>>> BOOTP broadcast 2
>>>>>> BOOTP broadcast 3
>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>> data abort
>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>> Resetting CPU ...
>>>>>>
>>>>>> resetting ...
>>>>>>
>>>>>>
>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>>> is already fixed?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>>> v2021.04-rc2 is bad.
>>>>>
>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>>> has the simliar problem.
>>>>>
>>>>> find the first bug commit via 'git bisect': it told me that commit
>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>>
>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>>>
>>>>>     fs: fat: consistent error handling for flush_dir()
>>>>>
>>>>>     Provide function description for flush_dir().
>>>>>     Move all error messages for flush_dir() from the callers to the
>>>>> function.
>>>>>     Move mapping of errors to -EIO to the function.
>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
>>>>>
>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>
>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>
>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>>>
>>>>>
>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>
>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>> handling for flush_dir()
>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>> 'u-boot-rockchip-20210121' of
>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>> |\
>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>> based PCIe controller driver
>>>>>
>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>>
>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>
>>>>> CPU  : AM335X-GP rev 2.1
>>>>> Model: TI AM335x BeagleBone Black
>>>>> DRAM:  512 MiB
>>>>> WDT:   Started with servicing (60s timeout)
>>>>> NAND:  0 MiB
>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>> E-fuse MAC
>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>>>> Hit any key to stop autoboot:  0
>>>>> => dhcp
>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>>> complete......... TIMEOUT !
>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>> MAC de:ad:be:ef:00:01
>>>>> HOST MAC de:ad:be:ef:00:00
>>>>> RNDIS ready
>>>>> musb-hdrc: peripheral reset irq lost!
>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>> USB RNDIS network up!
>>>>> BOOTP broadcast 1
>>>>> BOOTP broadcast 2
>>>>> BOOTP broadcast 3
>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>>> Using usb_ether device
>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>>> Filename 'u-boot.img'.
>>>>> Load address: 0x82000000
>>>>> Loading:
>>>>> #################################################################
>>>>> #################################################################
>>>>> #################################################################
>>>>>          #########################
>>>>>          2.5 MiB/s
>>>>> done
>>>>> Bytes transferred = 1123888 (112630 hex)
>>>>> =>
>>>>>
>>> "data abort" messages:
>>>
>>> data abort
>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>> Code: 0303 60ca 4403 6091 (685a) f042
>>> Resetting CPU ...
>>>
>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>>
>>>        unlink(victim, bck, fwd);
>>> 80814966:    60ca          str    r2, [r1, #12]
>>>        set_inuse_bit_at_offset(victim, victim_size);
>>> 80814968:    4403          add    r3, r0
>>>        unlink(victim, bck, fwd);
>>> 8081496a:    6091          str    r1, [r2, #8]
>>>      set_inuse_bit_at_offset(victim, victim_size);
>>> 8081496c:    685a          ldr    r2, [r3, #4]
>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
>>> 80814972:    605a          str    r2, [r3, #4]
>>>
>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>>
>>>
>>
>> I have seen crashes in common/dlmalloc.c before after double free() or
>> free() with an incorrect pointer.
>>
>> The assert() statements in do_check_inuse_chunk() are meant to catch
>> this but assert() as defined in include/log.h does not stop the code and
>> even does not print without _DEBUG=1.
>>
>> You should be able to get the assert output with
>>
>> #include <common.h>
>> #define _DEBUG 1
>> #include <log.h>
>>
>> at the top of common/dlmalloc.c.
>>
>> You should get full malloc debug output with
>
> Hi: I had try add DEBUG marco before <log.h> and no other malloc message

assert() checks for _DEBUG. Defining DEBUG after common.h will not
define _DEBUG.

Best regards

Heinrich

> printed.
>
>>
>> #define DEBUG 1
>> #include <common.h>
>> #include <log.h>
>>
>> Best regards
>>
>> Heinrich
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-23 10:12           ` Heinrich Schuchardt
@ 2022-03-23 11:54             ` qianfanguijin
  2022-03-24  1:23             ` qianfan
  2022-03-24  9:33             ` qianfan
  2 siblings, 0 replies; 23+ messages in thread
From: qianfanguijin @ 2022-03-23 11:54 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: u-boot

no malloc messages even if i remove the _DEBUG marco check in assert. maybe it can’t detected by do_check_inuse_chunk().

> 在 2022年3月23日,18:12,Heinrich Schuchardt <xypron.glpk@gmx.de> 写道:
> On 3/23/22 11:07, qianfan wrote:
>> 
>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:
>>> On 3/23/22 10:13, qianfan wrote:
>>>> 在 2022/3/23 16:02, qianfan 写道:
>>>>> 在 2022/3/23 15:45, qianfan 写道:
>>>>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>>>> Hi:
>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>>>> always report data abort when 'dhcp':
>>>>>>> Next it the log:
>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>>> +0800)
>>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>>> Model: WISDOM AM335X CCT
>>>>>>> DRAM:  512 MiB
>>>>>>> NAND:  256 MiB
>>>>>>> MMC:   OMAP SD/MMC: 0
>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>>> default environment
>>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>> Hit any key to stop autoboot:  0
>>>>>>> => setenv autoload no
>>>>>>> => dhcp
>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>> RNDIS ready
>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>> USB RNDIS network up!
>>>>>>> BOOTP broadcast 1
>>>>>>> BOOTP broadcast 2
>>>>>>> BOOTP broadcast 3
>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>>> data abort
>>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>>> Resetting CPU ...
>>>>>>> resetting ...
>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>>>> is already fixed?
>>>>>>> Thanks
>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>>>> v2021.04-rc2 is bad.
>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>>>> has the simliar problem.
>>>>>> find the first bug commit via 'git bisect': it told me that commit
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>>>>     fs: fat: consistent error handling for flush_dir()
>>>>>>     Provide function description for flush_dir().
>>>>>>     Move all error messages for flush_dir() from the callers to the
>>>>>> function.
>>>>>>     Move mapping of errors to -EIO to the function.
>>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
>>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>>> handling for flush_dir()
>>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>>> 'u-boot-rockchip-20210121' of
>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>>> |\
>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>>> based PCIe controller driver
>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>> Model: TI AM335x BeagleBone Black
>>>>>> DRAM:  512 MiB
>>>>>> WDT:   Started with servicing (60s timeout)
>>>>>> NAND:  0 MiB
>>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>>> E-fuse MAC
>>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>>>>> Hit any key to stop autoboot:  0
>>>>>> => dhcp
>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>>>> complete......... TIMEOUT !
>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>> MAC de:ad:be:ef:00:01
>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>> RNDIS ready
>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>> USB RNDIS network up!
>>>>>> BOOTP broadcast 1
>>>>>> BOOTP broadcast 2
>>>>>> BOOTP broadcast 3
>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>>>> Using usb_ether device
>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>>>> Filename 'u-boot.img'.
>>>>>> Load address: 0x82000000
>>>>>> Loading:
>>>>>> #################################################################
>>>>>> #################################################################
>>>>>> #################################################################
>>>>>>          #########################
>>>>>>          2.5 MiB/s
>>>>>> done
>>>>>> Bytes transferred = 1123888 (112630 hex)
>>>>>> =>
>>>> "data abort" messages:
>>>> data abort
>>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
>>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
>>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
>>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
>>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
>>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>> Code: 0303 60ca 4403 6091 (685a) f042
>>>> Resetting CPU ...
>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>>>        unlink(victim, bck, fwd);
>>>> 80814966:    60ca          str    r2, [r1, #12]
>>>>        set_inuse_bit_at_offset(victim, victim_size);
>>>> 80814968:    4403          add    r3, r0
>>>>        unlink(victim, bck, fwd);
>>>> 8081496a:    6091          str    r1, [r2, #8]
>>>>      set_inuse_bit_at_offset(victim, victim_size);
>>>> 8081496c:    685a          ldr    r2, [r3, #4]
>>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
>>>> 80814972:    605a          str    r2, [r3, #4]
>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>> I have seen crashes in common/dlmalloc.c before after double free() or
>>> free() with an incorrect pointer.
>>> The assert() statements in do_check_inuse_chunk() are meant to catch
>>> this but assert() as defined in include/log.h does not stop the code and
>>> even does not print without _DEBUG=1.
>>> You should be able to get the assert output with
>>> #include <common.h>
>>> #define _DEBUG 1
>>> #include <log.h>
>>> at the top of common/dlmalloc.c.
>>> You should get full malloc debug output with
>> 
>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message
> 
> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> define _DEBUG.
> 
> Best regards
> 
> Heinrich
> 
>> printed.
>> 
>>> #define DEBUG 1
>>> #include <common.h>
>>> #include <log.h>
>>> Best regards
>>> Heinrich


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-23 10:12           ` Heinrich Schuchardt
  2022-03-23 11:54             ` qianfanguijin
@ 2022-03-24  1:23             ` qianfan
  2022-03-24  9:33             ` qianfan
  2 siblings, 0 replies; 23+ messages in thread
From: qianfan @ 2022-03-24  1:23 UTC (permalink / raw)
  To: Heinrich Schuchardt, Sean Anderson; +Cc: u-boot

Adding 'while (1) ;' before bad_mode in data_abort function and so I can gdb 
u-boot when data abort.

Yes, I can connect it via gdb, but bt can't show the full stack.

(gdb) add-symbol-file u-boot 0x9ff66000
add symbol table from file "u-boot" at
         .text_addr = 0x9ff66000
(y or n) y
Reading symbols from u-boot...done.
(gdb) bt
#0  do_data_abort (pt_regs=0x9df30eb8) at arch/arm/lib/interrupts.c:169
#1  0x9ff661c8 in data_abort () at arch/arm/lib/vectors.S:271
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Hi @Sean Anderson, I had notice that you just submit some patchs about malloc, 
could you please also check this?

在 2022/3/23 18:12, Heinrich Schuchardt 写道:
> On 3/23/22 11:07, qianfan wrote:
>>
>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:
>>> On 3/23/22 10:13, qianfan wrote:
>>>>
>>>> 在 2022/3/23 16:02, qianfan 写道:
>>>>>
>>>>>
>>>>> 在 2022/3/23 15:45, qianfan 写道:
>>>>>>
>>>>>>
>>>>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>>>>
>>>>>>> Hi:
>>>>>>>
>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>>>> always report data abort when 'dhcp':
>>>>>>>
>>>>>>> Next it the log:
>>>>>>>
>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>>> +0800)
>>>>>>>
>>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>>> Model: WISDOM AM335X CCT
>>>>>>> DRAM:  512 MiB
>>>>>>> NAND:  256 MiB
>>>>>>> MMC:   OMAP SD/MMC: 0
>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>>> default environment
>>>>>>>
>>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>> Hit any key to stop autoboot:  0
>>>>>>> => setenv autoload no
>>>>>>> => dhcp
>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>> RNDIS ready
>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>> USB RNDIS network up!
>>>>>>> BOOTP broadcast 1
>>>>>>> BOOTP broadcast 2
>>>>>>> BOOTP broadcast 3
>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>>> data abort
>>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>>> Resetting CPU ...
>>>>>>>
>>>>>>> resetting ...
>>>>>>>
>>>>>>>
>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>>>> is already fixed?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>>>> v2021.04-rc2 is bad.
>>>>>>
>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>>>> has the simliar problem.
>>>>>>
>>>>>> find the first bug commit via 'git bisect': it told me that commit
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>>>
>>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>>>>
>>>>>>     fs: fat: consistent error handling for flush_dir()
>>>>>>
>>>>>>     Provide function description for flush_dir().
>>>>>>     Move all error messages for flush_dir() from the callers to the
>>>>>> function.
>>>>>>     Move mapping of errors to -EIO to the function.
>>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
>>>>>>
>>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>>
>>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>
>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>>>>
>>>>>>
>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>>
>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>>> handling for flush_dir()
>>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>>> 'u-boot-rockchip-20210121' of
>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>>> |\
>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>>> based PCIe controller driver
>>>>>>
>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>>>
>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>>
>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>> Model: TI AM335x BeagleBone Black
>>>>>> DRAM:  512 MiB
>>>>>> WDT:   Started with servicing (60s timeout)
>>>>>> NAND:  0 MiB
>>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>>> E-fuse MAC
>>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>>>>> Hit any key to stop autoboot:  0
>>>>>> => dhcp
>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>>>> complete......... TIMEOUT !
>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>> MAC de:ad:be:ef:00:01
>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>> RNDIS ready
>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>> USB RNDIS network up!
>>>>>> BOOTP broadcast 1
>>>>>> BOOTP broadcast 2
>>>>>> BOOTP broadcast 3
>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>>>> Using usb_ether device
>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>>>> Filename 'u-boot.img'.
>>>>>> Load address: 0x82000000
>>>>>> Loading:
>>>>>> #################################################################
>>>>>> #################################################################
>>>>>> #################################################################
>>>>>>          #########################
>>>>>>          2.5 MiB/s
>>>>>> done
>>>>>> Bytes transferred = 1123888 (112630 hex)
>>>>>> =>
>>>>>>
>>>> "data abort" messages:
>>>>
>>>> data abort
>>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
>>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
>>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
>>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
>>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
>>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>> Code: 0303 60ca 4403 6091 (685a) f042
>>>> Resetting CPU ...
>>>>
>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>>>
>>>>        unlink(victim, bck, fwd);
>>>> 80814966:    60ca          str    r2, [r1, #12]
>>>>        set_inuse_bit_at_offset(victim, victim_size);
>>>> 80814968:    4403          add    r3, r0
>>>>        unlink(victim, bck, fwd);
>>>> 8081496a:    6091          str    r1, [r2, #8]
>>>>      set_inuse_bit_at_offset(victim, victim_size);
>>>> 8081496c:    685a          ldr    r2, [r3, #4]
>>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
>>>> 80814972:    605a          str    r2, [r3, #4]
>>>>
>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>>>
>>>>
>>>
>>> I have seen crashes in common/dlmalloc.c before after double free() or
>>> free() with an incorrect pointer.
>>>
>>> The assert() statements in do_check_inuse_chunk() are meant to catch
>>> this but assert() as defined in include/log.h does not stop the code and
>>> even does not print without _DEBUG=1.
>>>
>>> You should be able to get the assert output with
>>>
>>> #include <common.h>
>>> #define _DEBUG 1
>>> #include <log.h>
>>>
>>> at the top of common/dlmalloc.c.
>>>
>>> You should get full malloc debug output with
>>
>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message
>
> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> define _DEBUG.
>
> Best regards
>
> Heinrich
>
>> printed.
>>
>>>
>>> #define DEBUG 1
>>> #include <common.h>
>>> #include <log.h>
>>>
>>> Best regards
>>>
>>> Heinrich
>>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to debug u-boot data abort
  2022-03-23  8:27   ` How to debug u-boot data abort Heinrich Schuchardt
@ 2022-03-24  3:18     ` AKASHI Takahiro
  2022-03-24  7:38       ` qianfan
  0 siblings, 1 reply; 23+ messages in thread
From: AKASHI Takahiro @ 2022-03-24  3:18 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: qianfan, u-boot

On Wed, Mar 23, 2022 at 09:27:08AM +0100, Heinrich Schuchardt wrote:
> On 3/23/22 08:45, qianfan wrote:
> > 
> > 在 2022/3/23 10:28, qianfan 写道:
> > > 
> > > Hi:
> > > 
> > > I had a custom AM335X board connected my computer by usbnet. It always
> > > report data abort when 'dhcp':
> > > 
> > > Next it the log:
> > > 
> > > U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
> > > 
> > > CPU  : AM335X-GP rev 2.1
> > > Model: WISDOM AM335X CCT
> > > DRAM:  512 MiB
> > > NAND:  256 MiB
> > > MMC:   OMAP SD/MMC: 0
> > > Loading Environment from NAND... *** Warning - bad CRC, using default
> > > environment
> > > 
> > > Net:   Could not get PHY for ethernet@4a100000: addr 0
> > > eth2: ethernet@4a100000, eth3: usb_ether
> > > Hit any key to stop autoboot:  0
> > > => setenv autoload no
> > > => dhcp
> > > using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> > > MAC de:ad:be:ef:00:01
> > > HOST MAC de:ad:be:ef:00:00
> > > RNDIS ready
> > > musb-hdrc: peripheral reset irq lost!
> > > high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> > > USB RNDIS network up!
> > > BOOTP broadcast 1
> > > BOOTP broadcast 2
> > > BOOTP broadcast 3
> > > DHCP client bound to address 192.168.200.4 (757 ms)
> > > data abort
> 
> This could be an alignment error.
> 
> > > pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
> > > reloc pc : [<808130a2>]    lr : [<80833c3f>]
> 
> You can use these addresses together with the u-boot.map file to figure
> out in which function the abort occurs and from where it was called.
> 
> Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.
> 
> > > sp : 9de53410  ip : 9de53578     fp : 00000001
> > > r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
> > > r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
> > > r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
> > > Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
> > > Code: f023 0303 60ca 4403 (6091) 685a
> 
> This is how to find the exact instruction causing the problem:
> 
> $ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
> >       ARCH=arm scripts/decodecode
> Code: f023 0303 60ca 4403 (6091) 685a
> All code
> ========
>    0:   23 f0                   and    %eax,%esi
>    2:   03 03                   add    (%rbx),%eax
>    4:   ca 60 03                lret   $0x360
>    7:*  44 91                   rex.R xchg %eax,%ecx            <--
> trapping instruction
>    9:   60                      (bad)
>    a:   5a                      pop    %rdx
>    b:   68                      .byte 0x68
> 
> Code starting with the faulting instruction
> ===========================================
>    0:   91                      xchg   %eax,%ecx
>    1:   60                      (bad)
>    2:   5a                      pop    %rdx
>    3:   68                      .byte 0x68

The code looks like x86 instructions.
Please don't forget to add "CROSS_COMPILE=..." :)

Code: f023 0303 60ca 4403 (6091) 685a
All code
========
   0:	f023 0303 	bic.w	r3, r3, #3
   4:	60ca      	str	r2, [r1, #12]
   6:	4403      	add	r3, r0
   8:*	6091      	str	r1, [r2, #8]		<-- trapping instruction
   a:	685a      	ldr	r2, [r3, #4]

Code starting with the faulting instruction
===========================================
   0:	6091      	str	r1, [r2, #8]
   2:	685a      	ldr	r2, [r3, #4]

Then,
${CROSS_COMPILE}objdump --disassemble=malloc -lS ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${PATTERN}
# Here, PATTERN may be the instruction ("6091") or the location ("8081496c" in your case?)

or similarly

${CROSS_COMPILE}gdb --batch -ex "disas/m ${LOC}" ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${LOC}
# Here, LOC is your "reloc pc" (0x80817586)

gives you some hint about the exact location.

-Takahiro Akashi


> I hope this helps to figure out, where exactly the problem occurs
> 
> Best regards
> 
> Heinrich
> 
> > > Resetting CPU ...
> > > 
> > > resetting ...
> > > 
> > > 
> > > It's there has any doc about how to debug data abort? Or is the bug is
> > > already fixed?
> > > 
> > > Thanks
> > > 
> > This bug doesn't fixed on master code. I found v2021.01 is good and
> > v2021.04-rc2 is bad.
> > 
> > Also I had tested this on beaglebone black with am335x_evm_defconfig,
> > has the simliar problem.
> > 
> > find the first bug commit via 'git bisect': it told me that commit
> > e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
> > strange due to this commit doesn't touch any dhcp or network code.
> > 
> > ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
> > e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
> > commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
> > Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
> > Date:   Wed Jan 20 22:21:53 2021 +0100
> > 
> >      fs: fat: consistent error handling for flush_dir()
> > 
> >      Provide function description for flush_dir().
> >      Move all error messages for flush_dir() from the callers to the
> > function.
> >      Move mapping of errors to -EIO to the function.
> >      Always check return value of flush_dir() (Coverity CID 316362).
> > 
> >      In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
> > 
> >      Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> > 
> > :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
> > 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
> > 
> > 
> > 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
> > e97eb638de0dc8f6e989e20eaeb0342f103cb917:
> > 
> > * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
> > handling for flush_dir()
> > *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
> > 'u-boot-rockchip-20210121' of
> > https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
> > |\
> > | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based
> > PCIe controller driver
> > 
> > I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
> > 
> > U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
> > 
> > CPU  : AM335X-GP rev 2.1
> > Model: TI AM335x BeagleBone Black
> > DRAM:  512 MiB
> > WDT:   Started with servicing (60s timeout)
> > NAND:  0 MiB
> > MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
> > Loading Environment from FAT... <ethaddr> not set. Validating first
> > E-fuse MAC
> > Net:   eth2: ethernet@4a100000, eth3: usb_ether
> > Hit any key to stop autoboot:  0
> > => dhcp
> > ethernet@4a100000 Waiting for PHY auto negotiation to complete.........
> > TIMEOUT !
> > using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> > MAC de:ad:be:ef:00:01
> > HOST MAC de:ad:be:ef:00:00
> > RNDIS ready
> > musb-hdrc: peripheral reset irq lost!
> > high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> > USB RNDIS network up!
> > BOOTP broadcast 1
> > BOOTP broadcast 2
> > BOOTP broadcast 3
> > DHCP client bound to address 192.168.200.157 (757 ms)
> > Using usb_ether device
> > TFTP from server 192.168.200.1; our IP address is 192.168.200.157
> > Filename 'u-boot.img'.
> > Load address: 0x82000000
> > Loading: #################################################################
> > #################################################################
> > #################################################################
> >           #########################
> >           2.5 MiB/s
> > done
> > Bytes transferred = 1123888 (112630 hex)
> > =>
> > 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: How to debug u-boot data abort
  2022-03-24  3:18     ` AKASHI Takahiro
@ 2022-03-24  7:38       ` qianfan
  0 siblings, 0 replies; 23+ messages in thread
From: qianfan @ 2022-03-24  7:38 UTC (permalink / raw)
  To: AKASHI Takahiro, Heinrich Schuchardt, u-boot


在 2022/3/24 11:18, AKASHI Takahiro 写道:
> On Wed, Mar 23, 2022 at 09:27:08AM +0100, Heinrich Schuchardt wrote:
>> On 3/23/22 08:45, qianfan wrote:
>>> 在 2022/3/23 10:28, qianfan 写道:
>>>> Hi:
>>>>
>>>> I had a custom AM335X board connected my computer by usbnet. It always
>>>> report data abort when 'dhcp':
>>>>
>>>> Next it the log:
>>>>
>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>>>>
>>>> CPU  : AM335X-GP rev 2.1
>>>> Model: WISDOM AM335X CCT
>>>> DRAM:  512 MiB
>>>> NAND:  256 MiB
>>>> MMC:   OMAP SD/MMC: 0
>>>> Loading Environment from NAND... *** Warning - bad CRC, using default
>>>> environment
>>>>
>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>> Hit any key to stop autoboot:  0
>>>> => setenv autoload no
>>>> => dhcp
>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>> MAC de:ad:be:ef:00:01
>>>> HOST MAC de:ad:be:ef:00:00
>>>> RNDIS ready
>>>> musb-hdrc: peripheral reset irq lost!
>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>> USB RNDIS network up!
>>>> BOOTP broadcast 1
>>>> BOOTP broadcast 2
>>>> BOOTP broadcast 3
>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>> data abort
>> This could be an alignment error.
>>
>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>> You can use these addresses together with the u-boot.map file to figure
>> out in which function the abort occurs and from where it was called.
>>
>> Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.
>>
>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>> Code: f023 0303 60ca 4403 (6091) 685a
>> This is how to find the exact instruction causing the problem:
>>
>> $ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
>>>        ARCH=arm scripts/decodecode
>> Code: f023 0303 60ca 4403 (6091) 685a
>> All code
>> ========
>>     0:   23 f0                   and    %eax,%esi
>>     2:   03 03                   add    (%rbx),%eax
>>     4:   ca 60 03                lret   $0x360
>>     7:*  44 91                   rex.R xchg %eax,%ecx            <--
>> trapping instruction
>>     9:   60                      (bad)
>>     a:   5a                      pop    %rdx
>>     b:   68                      .byte 0x68
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0:   91                      xchg   %eax,%ecx
>>     1:   60                      (bad)
>>     2:   5a                      pop    %rdx
>>     3:   68                      .byte 0x68
> The code looks like x86 instructions.
> Please don't forget to add "CROSS_COMPILE=..." :)
>
> Code: f023 0303 60ca 4403 (6091) 685a
> All code
> ========
>     0:	f023 0303 	bic.w	r3, r3, #3
>     4:	60ca      	str	r2, [r1, #12]
>     6:	4403      	add	r3, r0
>     8:*	6091      	str	r1, [r2, #8]		<-- trapping instruction
>     a:	685a      	ldr	r2, [r3, #4]
>
> Code starting with the faulting instruction
> ===========================================
>     0:	6091      	str	r1, [r2, #8]
>     2:	685a      	ldr	r2, [r3, #4]
>
> Then,
> ${CROSS_COMPILE}objdump --disassemble=malloc -lS ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${PATTERN}
> # Here, PATTERN may be the instruction ("6091") or the location ("8081496c" in your case?)
>
> or similarly
>
> ${CROSS_COMPILE}gdb --batch -ex "disas/m ${LOC}" ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${LOC}
> # Here, LOC is your "reloc pc" (0x80817586)
>
> gives you some hint about the exact location.
>
> -Takahiro Akashi

Hi:

Thanks for your's guide. I know the pc in malloc and lr is env_attr_walk. But 
can't get the full stack or malloc.

I can't understand dlmalloc's logic and it's hard to me to solve this problem.

>
>
>> I hope this helps to figure out, where exactly the problem occurs
>>
>> Best regards
>>
>> Heinrich
>>
>>>> Resetting CPU ...
>>>>
>>>> resetting ...
>>>>
>>>>
>>>> It's there has any doc about how to debug data abort? Or is the bug is
>>>> already fixed?
>>>>
>>>> Thanks
>>>>
>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>> v2021.04-rc2 is bad.
>>>
>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>> has the simliar problem.
>>>
>>> find the first bug commit via 'git bisect': it told me that commit
>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>> strange due to this commit doesn't touch any dhcp or network code.
>>>
>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>
>>>       fs: fat: consistent error handling for flush_dir()
>>>
>>>       Provide function description for flush_dir().
>>>       Move all error messages for flush_dir() from the callers to the
>>> function.
>>>       Move mapping of errors to -EIO to the function.
>>>       Always check return value of flush_dir() (Coverity CID 316362).
>>>
>>>       In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>
>>>       Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>
>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>
>>>
>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>
>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>> handling for flush_dir()
>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>> 'u-boot-rockchip-20210121' of
>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>> |\
>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based
>>> PCIe controller driver
>>>
>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>
>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>
>>> CPU  : AM335X-GP rev 2.1
>>> Model: TI AM335x BeagleBone Black
>>> DRAM:  512 MiB
>>> WDT:   Started with servicing (60s timeout)
>>> NAND:  0 MiB
>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>> E-fuse MAC
>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>> Hit any key to stop autoboot:  0
>>> => dhcp
>>> ethernet@4a100000 Waiting for PHY auto negotiation to complete.........
>>> TIMEOUT !
>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>> MAC de:ad:be:ef:00:01
>>> HOST MAC de:ad:be:ef:00:00
>>> RNDIS ready
>>> musb-hdrc: peripheral reset irq lost!
>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>> USB RNDIS network up!
>>> BOOTP broadcast 1
>>> BOOTP broadcast 2
>>> BOOTP broadcast 3
>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>> Using usb_ether device
>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>> Filename 'u-boot.img'.
>>> Load address: 0x82000000
>>> Loading: #################################################################
>>> #################################################################
>>> #################################################################
>>>            #########################
>>>            2.5 MiB/s
>>> done
>>> Bytes transferred = 1123888 (112630 hex)
>>> =>
>>>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-23 10:12           ` Heinrich Schuchardt
  2022-03-23 11:54             ` qianfanguijin
  2022-03-24  1:23             ` qianfan
@ 2022-03-24  9:33             ` qianfan
  2022-03-25 10:04               ` qianfan
  2 siblings, 1 reply; 23+ messages in thread
From: qianfan @ 2022-03-24  9:33 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: u-boot


在 2022/3/23 18:12, Heinrich Schuchardt 写道:
> On 3/23/22 11:07, qianfan wrote:
>>
>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:
>>> On 3/23/22 10:13, qianfan wrote:
>>>>
>>>> 在 2022/3/23 16:02, qianfan 写道:
>>>>>
>>>>>
>>>>> 在 2022/3/23 15:45, qianfan 写道:
>>>>>>
>>>>>>
>>>>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>>>>
>>>>>>> Hi:
>>>>>>>
>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>>>> always report data abort when 'dhcp':
>>>>>>>
>>>>>>> Next it the log:
>>>>>>>
>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>>> +0800)
>>>>>>>
>>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>>> Model: WISDOM AM335X CCT
>>>>>>> DRAM:  512 MiB
>>>>>>> NAND:  256 MiB
>>>>>>> MMC:   OMAP SD/MMC: 0
>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>>> default environment
>>>>>>>
>>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>> Hit any key to stop autoboot:  0
>>>>>>> => setenv autoload no
>>>>>>> => dhcp
>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>> RNDIS ready
>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>> USB RNDIS network up!
>>>>>>> BOOTP broadcast 1
>>>>>>> BOOTP broadcast 2
>>>>>>> BOOTP broadcast 3
>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>>> data abort
>>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>>> Resetting CPU ...
>>>>>>>
>>>>>>> resetting ...
>>>>>>>
>>>>>>>
>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>>>> is already fixed?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>>>> v2021.04-rc2 is bad.
>>>>>>
>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>>>> has the simliar problem.
>>>>>>
>>>>>> find the first bug commit via 'git bisect': it told me that commit
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>>>
>>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>>>>
>>>>>>     fs: fat: consistent error handling for flush_dir()
>>>>>>
>>>>>>     Provide function description for flush_dir().
>>>>>>     Move all error messages for flush_dir() from the callers to the
>>>>>> function.
>>>>>>     Move mapping of errors to -EIO to the function.
>>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
>>>>>>
>>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>>
>>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>
>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>>>>
>>>>>>
>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>>
>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>>> handling for flush_dir()
>>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>>> 'u-boot-rockchip-20210121' of
>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>>> |\
>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>>> based PCIe controller driver
>>>>>>
>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>>>
>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>>
>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>> Model: TI AM335x BeagleBone Black
>>>>>> DRAM:  512 MiB
>>>>>> WDT:   Started with servicing (60s timeout)
>>>>>> NAND:  0 MiB
>>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>>> E-fuse MAC
>>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>>>>> Hit any key to stop autoboot:  0
>>>>>> => dhcp
>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>>>> complete......... TIMEOUT !
>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>> MAC de:ad:be:ef:00:01
>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>> RNDIS ready
>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>> USB RNDIS network up!
>>>>>> BOOTP broadcast 1
>>>>>> BOOTP broadcast 2
>>>>>> BOOTP broadcast 3
>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>>>> Using usb_ether device
>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>>>> Filename 'u-boot.img'.
>>>>>> Load address: 0x82000000
>>>>>> Loading:
>>>>>> #################################################################
>>>>>> #################################################################
>>>>>> #################################################################
>>>>>>          #########################
>>>>>>          2.5 MiB/s
>>>>>> done
>>>>>> Bytes transferred = 1123888 (112630 hex)
>>>>>> =>
>>>>>>
>>>> "data abort" messages:
>>>>
>>>> data abort
>>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
>>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
>>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
>>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
>>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
>>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>> Code: 0303 60ca 4403 6091 (685a) f042
>>>> Resetting CPU ...
>>>>
>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>>>
>>>>        unlink(victim, bck, fwd);
>>>> 80814966:    60ca          str    r2, [r1, #12]
>>>>        set_inuse_bit_at_offset(victim, victim_size);
>>>> 80814968:    4403          add    r3, r0
>>>>        unlink(victim, bck, fwd);
>>>> 8081496a:    6091          str    r1, [r2, #8]
>>>>      set_inuse_bit_at_offset(victim, victim_size);
>>>> 8081496c:    685a          ldr    r2, [r3, #4]
>>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
>>>> 80814972:    605a          str    r2, [r3, #4]
>>>>
>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>>>
>>>>
>>>
>>> I have seen crashes in common/dlmalloc.c before after double free() or
>>> free() with an incorrect pointer.
>>>
>>> The assert() statements in do_check_inuse_chunk() are meant to catch
>>> this but assert() as defined in include/log.h does not stop the code and
>>> even does not print without _DEBUG=1.
>>>
>>> You should be able to get the assert output with
>>>
>>> #include <common.h>
>>> #define _DEBUG 1
>>> #include <log.h>
>>>
>>> at the top of common/dlmalloc.c.
>>>
>>> You should get full malloc debug output with
>>
>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message
>
> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> define _DEBUG.

Finally I got a malloc error message on console:

TFTP from server 192.168.200.1; our IP address is 192.168.200.39
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
#################################################################
#################################################################
          ######################################################  0 Bytes
          1.9 MiB/s
done
Bytes transferred = 1274816 (1373c0 hex)
common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' 
failed.

I had tried many times, do_check_chunk not always failed, and sometimes it 
report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' 
failed. The situation is not the same.

I got a bt stack when malloc failed:

(gdb) bt
#0  0x9ffb5684 in panic_finish () at lib/panic.c:23
#1  panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
#2  0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized 
out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
#3  0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at 
common/dlmalloc.c:866
#4  0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) 
at common/dlmalloc.c:900
#5  0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
#6  0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 
<regex_callback>, priv=0x9df28dc8) at env/attr.c:70
#7  0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized 
out>, attributes=0x9df28dec "") at env/attr.c:184
#8  0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
#9  0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, 
htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
#10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, 
env_flag=512, flag=0) at cmd/nvedit.c:296
#11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at 
cmd/nvedit.c:318
#12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
#13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, 
argv=0x9df442c8) at cmd/net.c:268
#14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, 
flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
#15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, 
repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
#16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
#17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
#18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
#19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at 
common/cli_hush.c:3206
#20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
#21 0x9ff77c1a in cli_loop () at common/cli.c:229
#22 0x9ff70d3e in main_loop () at common/main.c:66
#23 0x9ff72672 in run_main_loop () at common/board_r.c:584
#24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at 
include/initcall.h:46
#25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at 
common/board_r.c:822
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

>
> Best regards
>
> Heinrich
>
>> printed.
>>
>>>
>>> #define DEBUG 1
>>> #include <common.h>
>>> #include <log.h>
>>>
>>> Best regards
>>>
>>> Heinrich
>>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-24  9:33             ` qianfan
@ 2022-03-25 10:04               ` qianfan
  2023-07-20 16:39                 ` Miquel Raynal
  0 siblings, 1 reply; 23+ messages in thread
From: qianfan @ 2022-03-25 10:04 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: u-boot

It's very strange. And I can't detect it's a bug of usb or dlmalloc.

1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.

2. Starting u-boot and dhcp via am335x's usb net, data abort.

3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.

在 2022/3/24 17:33, qianfan 写道:
>
> 在 2022/3/23 18:12, Heinrich Schuchardt 写道:
>> On 3/23/22 11:07, qianfan wrote:
>>>
>>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:
>>>> On 3/23/22 10:13, qianfan wrote:
>>>>>
>>>>> 在 2022/3/23 16:02, qianfan 写道:
>>>>>>
>>>>>>
>>>>>> 在 2022/3/23 15:45, qianfan 写道:
>>>>>>>
>>>>>>>
>>>>>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>>>>>
>>>>>>>> Hi:
>>>>>>>>
>>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>>>>> always report data abort when 'dhcp':
>>>>>>>>
>>>>>>>> Next it the log:
>>>>>>>>
>>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>>>> +0800)
>>>>>>>>
>>>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>>>> Model: WISDOM AM335X CCT
>>>>>>>> DRAM:  512 MiB
>>>>>>>> NAND:  256 MiB
>>>>>>>> MMC:   OMAP SD/MMC: 0
>>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>>>> default environment
>>>>>>>>
>>>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>>> Hit any key to stop autoboot:  0
>>>>>>>> => setenv autoload no
>>>>>>>> => dhcp
>>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>>> RNDIS ready
>>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>>> USB RNDIS network up!
>>>>>>>> BOOTP broadcast 1
>>>>>>>> BOOTP broadcast 2
>>>>>>>> BOOTP broadcast 3
>>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>>>> data abort
>>>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>>>> Resetting CPU ...
>>>>>>>>
>>>>>>>> resetting ...
>>>>>>>>
>>>>>>>>
>>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>>>>> is already fixed?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>>>>> v2021.04-rc2 is bad.
>>>>>>>
>>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>>>>> has the simliar problem.
>>>>>>>
>>>>>>> find the first bug commit via 'git bisect': it told me that commit
>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>>>>
>>>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>>>>>
>>>>>>>     fs: fat: consistent error handling for flush_dir()
>>>>>>>
>>>>>>>     Provide function description for flush_dir().
>>>>>>>     Move all error messages for flush_dir() from the callers to the
>>>>>>> function.
>>>>>>>     Move mapping of errors to -EIO to the function.
>>>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
>>>>>>>
>>>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>>>
>>>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>>
>>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>>>>>
>>>>>>>
>>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>>>
>>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>>>> handling for flush_dir()
>>>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>>>> 'u-boot-rockchip-20210121' of
>>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>>>> |\
>>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>>>> based PCIe controller driver
>>>>>>>
>>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>>>>
>>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>>>
>>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>>> Model: TI AM335x BeagleBone Black
>>>>>>> DRAM:  512 MiB
>>>>>>> WDT:   Started with servicing (60s timeout)
>>>>>>> NAND:  0 MiB
>>>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>>>> E-fuse MAC
>>>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>> Hit any key to stop autoboot:  0
>>>>>>> => dhcp
>>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>>>>> complete......... TIMEOUT !
>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>> RNDIS ready
>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>> USB RNDIS network up!
>>>>>>> BOOTP broadcast 1
>>>>>>> BOOTP broadcast 2
>>>>>>> BOOTP broadcast 3
>>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>>>>> Using usb_ether device
>>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>>>>> Filename 'u-boot.img'.
>>>>>>> Load address: 0x82000000
>>>>>>> Loading:
>>>>>>> #################################################################
>>>>>>> #################################################################
>>>>>>> #################################################################
>>>>>>>          #########################
>>>>>>>          2.5 MiB/s
>>>>>>> done
>>>>>>> Bytes transferred = 1123888 (112630 hex)
>>>>>>> =>
>>>>>>>
>>>>> "data abort" messages:
>>>>>
>>>>> data abort
>>>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
>>>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
>>>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
>>>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
>>>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
>>>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>> Code: 0303 60ca 4403 6091 (685a) f042
>>>>> Resetting CPU ...
>>>>>
>>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>>>>
>>>>>        unlink(victim, bck, fwd);
>>>>> 80814966:    60ca          str    r2, [r1, #12]
>>>>>        set_inuse_bit_at_offset(victim, victim_size);
>>>>> 80814968:    4403          add    r3, r0
>>>>>        unlink(victim, bck, fwd);
>>>>> 8081496a:    6091          str    r1, [r2, #8]
>>>>>      set_inuse_bit_at_offset(victim, victim_size);
>>>>> 8081496c:    685a          ldr    r2, [r3, #4]
>>>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
>>>>> 80814972:    605a          str    r2, [r3, #4]
>>>>>
>>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>>>>
>>>>>
>>>>
>>>> I have seen crashes in common/dlmalloc.c before after double free() or
>>>> free() with an incorrect pointer.
>>>>
>>>> The assert() statements in do_check_inuse_chunk() are meant to catch
>>>> this but assert() as defined in include/log.h does not stop the code and
>>>> even does not print without _DEBUG=1.
>>>>
>>>> You should be able to get the assert output with
>>>>
>>>> #include <common.h>
>>>> #define _DEBUG 1
>>>> #include <log.h>
>>>>
>>>> at the top of common/dlmalloc.c.
>>>>
>>>> You should get full malloc debug output with
>>>
>>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message
>>
>> assert() checks for _DEBUG. Defining DEBUG after common.h will not
>> define _DEBUG.
>
> Finally I got a malloc error message on console:
>
> TFTP from server 192.168.200.1; our IP address is 192.168.200.39
> Filename 'u-boot.img'.
> Load address: 0x82000000
> Loading: #################################################################
> #################################################################
> #################################################################
>          ######################################################  0 Bytes
>          1.9 MiB/s
> done
> Bytes transferred = 1274816 (1373c0 hex)
> common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' 
> failed.
>
> I had tried many times, do_check_chunk not always failed, and sometimes it 
> report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' 
> failed. The situation is not the same.
>
> I got a bt stack when malloc failed:
>
> (gdb) bt
> #0  0x9ffb5684 in panic_finish () at lib/panic.c:23
> #1  panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
> #2  0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized 
> out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
> #3  0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at 
> common/dlmalloc.c:866
> #4  0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) 
> at common/dlmalloc.c:900
> #5  0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
> #6  0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, 
> callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
> #7  0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized 
> out>, attributes=0x9df28dec "") at env/attr.c:184
> #8  0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
> #9  0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, 
> htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
> #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, 
> env_flag=512, flag=0) at cmd/nvedit.c:296
> #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) 
> at cmd/nvedit.c:318
> #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
> #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, 
> argv=0x9df442c8) at cmd/net.c:268
> #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, 
> flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
> #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, 
> repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
> #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
> #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
> #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
> #19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at 
> common/cli_hush.c:3206
> #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
> #21 0x9ff77c1a in cli_loop () at common/cli.c:229
> #22 0x9ff70d3e in main_loop () at common/main.c:66
> #23 0x9ff72672 in run_main_loop () at common/board_r.c:584
> #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at 
> include/initcall.h:46
> #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at 
> common/board_r.c:822
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
>>
>> Best regards
>>
>> Heinrich
>>
>>> printed.
>>>
>>>>
>>>> #define DEBUG 1
>>>> #include <common.h>
>>>> #include <log.h>
>>>>
>>>> Best regards
>>>>
>>>> Heinrich
>>>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2022-03-25 10:04               ` qianfan
@ 2023-07-20 16:39                 ` Miquel Raynal
  2023-07-20 17:55                   ` Heinrich Schuchardt
                                     ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Miquel Raynal @ 2023-07-20 16:39 UTC (permalink / raw)
  To: qianfan; +Cc: Heinrich Schuchardt, u-boot, Tom Rini

Hello,

qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:

> It's very strange. And I can't detect it's a bug of usb or dlmalloc.
> 
> 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
> 
> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> 
> 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.

I am sorry to re-open a thread that is one year old but this is
still an open bug. The BBB is affected. In particular the BBBW
because there is no Ethernet connector, which makes the Eth-over-USB
emulation even more important. All U-Boots since 2021 are affected:
spurious data aborts, usually at the end of network interactions (tftp,
ping). I could not bisect it because the boot was deeply broken as
well on a significant range of commits :-/.

On my side I narrowed it down to an env update which fails in malloc as
well. If I comment the env update, it fails a bit later. It really
looks like a stack corruption which is either related to the Ethernet
USB gadget or the USB controller driver itself. Network transfers on
the BBBW using regular Ethernet does not trigger any error.

I also observe the very strange "fix" mentioned above: starting and
killing fastboot makes all tftp pass... If anyone has more details to
share, or perhaps a subsequent thread giving more details, I would
really like to see this fixed upstream, I suppose I am not the only one
:-)

Thanks,
Miquèl

> 
> 在 2022/3/24 17:33, qianfan 写道:
> >
> > 在 2022/3/23 18:12, Heinrich Schuchardt 写道:  
> >> On 3/23/22 11:07, qianfan wrote:  
> >>>
> >>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:  
> >>>> On 3/23/22 10:13, qianfan wrote:  
> >>>>>
> >>>>> 在 2022/3/23 16:02, qianfan 写道:  
> >>>>>>
> >>>>>>
> >>>>>> 在 2022/3/23 15:45, qianfan 写道:  
> >>>>>>>
> >>>>>>>
> >>>>>>> 在 2022/3/23 10:28, qianfan 写道:  
> >>>>>>>>
> >>>>>>>> Hi:
> >>>>>>>>
> >>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
> >>>>>>>> always report data abort when 'dhcp':
> >>>>>>>>
> >>>>>>>> Next it the log:
> >>>>>>>>
> >>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
> >>>>>>>> +0800)
> >>>>>>>>
> >>>>>>>> CPU  : AM335X-GP rev 2.1
> >>>>>>>> Model: WISDOM AM335X CCT
> >>>>>>>> DRAM:  512 MiB
> >>>>>>>> NAND:  256 MiB
> >>>>>>>> MMC:   OMAP SD/MMC: 0
> >>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
> >>>>>>>> default environment
> >>>>>>>>
> >>>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
> >>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
> >>>>>>>> Hit any key to stop autoboot:  0  
> >>>>>>>> => setenv autoload no
> >>>>>>>> => dhcp  
> >>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> >>>>>>>> MAC de:ad:be:ef:00:01
> >>>>>>>> HOST MAC de:ad:be:ef:00:00
> >>>>>>>> RNDIS ready
> >>>>>>>> musb-hdrc: peripheral reset irq lost!
> >>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> >>>>>>>> USB RNDIS network up!
> >>>>>>>> BOOTP broadcast 1
> >>>>>>>> BOOTP broadcast 2
> >>>>>>>> BOOTP broadcast 3
> >>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
> >>>>>>>> data abort
> >>>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
> >>>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
> >>>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
> >>>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
> >>>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
> >>>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
> >>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
> >>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
> >>>>>>>> Resetting CPU ...
> >>>>>>>>
> >>>>>>>> resetting ...
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
> >>>>>>>> is already fixed?
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>  
> >>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
> >>>>>>> v2021.04-rc2 is bad.
> >>>>>>>
> >>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
> >>>>>>> has the simliar problem.
> >>>>>>>
> >>>>>>> find the first bug commit via 'git bisect': it told me that commit
> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
> >>>>>>> strange due to this commit doesn't touch any dhcp or network code.
> >>>>>>>
> >>>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
> >>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
> >>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
> >>>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
> >>>>>>>
> >>>>>>>     fs: fat: consistent error handling for flush_dir()
> >>>>>>>
> >>>>>>>     Provide function description for flush_dir().
> >>>>>>>     Move all error messages for flush_dir() from the callers to the
> >>>>>>> function.
> >>>>>>>     Move mapping of errors to -EIO to the function.
> >>>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
> >>>>>>>
> >>>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
> >>>>>>>
> >>>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> >>>>>>>
> >>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
> >>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
> >>>>>>>
> >>>>>>>
> >>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
> >>>>>>>
> >>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
> >>>>>>> handling for flush_dir()
> >>>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
> >>>>>>> 'u-boot-rockchip-20210121' of
> >>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
> >>>>>>> |\
> >>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
> >>>>>>> based PCIe controller driver
> >>>>>>>
> >>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
> >>>>>>>
> >>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
> >>>>>>>
> >>>>>>> CPU  : AM335X-GP rev 2.1
> >>>>>>> Model: TI AM335x BeagleBone Black
> >>>>>>> DRAM:  512 MiB
> >>>>>>> WDT:   Started with servicing (60s timeout)
> >>>>>>> NAND:  0 MiB
> >>>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
> >>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
> >>>>>>> E-fuse MAC
> >>>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
> >>>>>>> Hit any key to stop autoboot:  0  
> >>>>>>> => dhcp  
> >>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
> >>>>>>> complete......... TIMEOUT !
> >>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> >>>>>>> MAC de:ad:be:ef:00:01
> >>>>>>> HOST MAC de:ad:be:ef:00:00
> >>>>>>> RNDIS ready
> >>>>>>> musb-hdrc: peripheral reset irq lost!
> >>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> >>>>>>> USB RNDIS network up!
> >>>>>>> BOOTP broadcast 1
> >>>>>>> BOOTP broadcast 2
> >>>>>>> BOOTP broadcast 3
> >>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
> >>>>>>> Using usb_ether device
> >>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
> >>>>>>> Filename 'u-boot.img'.
> >>>>>>> Load address: 0x82000000
> >>>>>>> Loading:
> >>>>>>> #################################################################
> >>>>>>> #################################################################
> >>>>>>> #################################################################
> >>>>>>>          #########################
> >>>>>>>          2.5 MiB/s
> >>>>>>> done
> >>>>>>> Bytes transferred = 1123888 (112630 hex)  
> >>>>>>> =>  
> >>>>>>>  
> >>>>> "data abort" messages:
> >>>>>
> >>>>> data abort
> >>>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
> >>>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
> >>>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
> >>>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
> >>>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
> >>>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
> >>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
> >>>>> Code: 0303 60ca 4403 6091 (685a) f042
> >>>>> Resetting CPU ...
> >>>>>
> >>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
> >>>>>
> >>>>>        unlink(victim, bck, fwd);
> >>>>> 80814966:    60ca          str    r2, [r1, #12]
> >>>>>        set_inuse_bit_at_offset(victim, victim_size);
> >>>>> 80814968:    4403          add    r3, r0
> >>>>>        unlink(victim, bck, fwd);
> >>>>> 8081496a:    6091          str    r1, [r2, #8]
> >>>>>      set_inuse_bit_at_offset(victim, victim_size);
> >>>>> 8081496c:    685a          ldr    r2, [r3, #4]
> >>>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
> >>>>> 80814972:    605a          str    r2, [r3, #4]
> >>>>>
> >>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
> >>>>>
> >>>>>  
> >>>>
> >>>> I have seen crashes in common/dlmalloc.c before after double free() or
> >>>> free() with an incorrect pointer.
> >>>>
> >>>> The assert() statements in do_check_inuse_chunk() are meant to catch
> >>>> this but assert() as defined in include/log.h does not stop the code and
> >>>> even does not print without _DEBUG=1.
> >>>>
> >>>> You should be able to get the assert output with
> >>>>
> >>>> #include <common.h>
> >>>> #define _DEBUG 1
> >>>> #include <log.h>
> >>>>
> >>>> at the top of common/dlmalloc.c.
> >>>>
> >>>> You should get full malloc debug output with  
> >>>
> >>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message  
> >>
> >> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> >> define _DEBUG.  
> >
> > Finally I got a malloc error message on console:
> >
> > TFTP from server 192.168.200.1; our IP address is 192.168.200.39
> > Filename 'u-boot.img'.
> > Load address: 0x82000000
> > Loading: #################################################################
> > #################################################################
> > #################################################################
> >          ######################################################  0 Bytes
> >          1.9 MiB/s
> > done
> > Bytes transferred = 1274816 (1373c0 hex)
> > common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' > failed.
> >
> > I had tried many times, do_check_chunk not always failed, and sometimes it > report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' > failed. The situation is not the same.
> >
> > I got a bt stack when malloc failed:
> >
> > (gdb) bt
> > #0  0x9ffb5684 in panic_finish () at lib/panic.c:23
> > #1  panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
> > #2  0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized > out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
> > #3  0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at > common/dlmalloc.c:866
> > #4  0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) > at common/dlmalloc.c:900
> > #5  0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
> > #6  0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, > callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
> > #7  0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized > out>, attributes=0x9df28dec "") at env/attr.c:184
> > #8  0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
> > #9  0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, > htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
> > #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, > env_flag=512, flag=0) at cmd/nvedit.c:296
> > #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) > at cmd/nvedit.c:318
> > #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
> > #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, > argv=0x9df442c8) at cmd/net.c:268
> > #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, > flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
> > #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, > repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
> > #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
> > #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
> > #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
> > #19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at > common/cli_hush.c:3206
> > #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
> > #21 0x9ff77c1a in cli_loop () at common/cli.c:229
> > #22 0x9ff70d3e in main_loop () at common/main.c:66
> > #23 0x9ff72672 in run_main_loop () at common/board_r.c:584
> > #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at > include/initcall.h:46
> > #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at > common/board_r.c:822
> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> >  
> >>
> >> Best regards
> >>
> >> Heinrich
> >>  
> >>> printed.
> >>>  
> >>>>
> >>>> #define DEBUG 1
> >>>> #include <common.h>
> >>>> #include <log.h>
> >>>>
> >>>> Best regards
> >>>>
> >>>> Heinrich  
> >>>  

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2023-07-20 16:39                 ` Miquel Raynal
@ 2023-07-20 17:55                   ` Heinrich Schuchardt
  2023-07-21 11:54                     ` Miquel Raynal
  2023-07-20 18:34                   ` Tom Rini
  2023-07-21  0:31                   ` qianfan
  2 siblings, 1 reply; 23+ messages in thread
From: Heinrich Schuchardt @ 2023-07-20 17:55 UTC (permalink / raw)
  To: Miquel Raynal, qianfan; +Cc: u-boot, Tom Rini



Am 20. Juli 2023 18:39:17 MESZ schrieb Miquel Raynal <miquel.raynal@bootlin.com>:
>Hello,
>
>qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
>
>> It's very strange. And I can't detect it's a bug of usb or dlmalloc.
>> 
>> 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
>> 
>> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
>> 
>> 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.
>
>I am sorry to re-open a thread that is one year old but this is
>still an open bug. The BBB is affected. In particular the BBBW
>because there is no Ethernet connector, which makes the Eth-over-USB
>emulation even more important. All U-Boots since 2021 are affected:
>spurious data aborts, usually at the end of network interactions (tftp,
>ping). I could not bisect it because the boot was deeply broken as
>well on a significant range of commits :-/.
>
>On my side I narrowed it down to an env update which fails in malloc as
>well. If I comment the env update, it fails a bit later. It really
>looks like a stack corruption which is either related to the Ethernet
>USB gadget or the USB controller driver itself. Network transfers on
>the BBBW using regular Ethernet does not trigger any error.
>
>I also observe the very strange "fix" mentioned above: starting and
>killing fastboot makes all tftp pass... If anyone has more details to
>share, or perhaps a subsequent thread giving more details, I would
>really like to see this fixed upstream, I suppose I am not the only one
>:-)
>
>Thanks,
>Miquèl


Can this problem be reproduced on QEMU?

Best regards

Heinrich 

>
>> 
>> 在 2022/3/24 17:33, qianfan 写道:
>> >
>> > 在 2022/3/23 18:12, Heinrich Schuchardt 写道:  
>> >> On 3/23/22 11:07, qianfan wrote:  
>> >>>
>> >>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:  
>> >>>> On 3/23/22 10:13, qianfan wrote:  
>> >>>>>
>> >>>>> 在 2022/3/23 16:02, qianfan 写道:  
>> >>>>>>
>> >>>>>>
>> >>>>>> 在 2022/3/23 15:45, qianfan 写道:  
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> 在 2022/3/23 10:28, qianfan 写道:  
>> >>>>>>>>
>> >>>>>>>> Hi:
>> >>>>>>>>
>> >>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>> >>>>>>>> always report data abort when 'dhcp':
>> >>>>>>>>
>> >>>>>>>> Next it the log:
>> >>>>>>>>
>> >>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>> >>>>>>>> +0800)
>> >>>>>>>>
>> >>>>>>>> CPU  : AM335X-GP rev 2.1
>> >>>>>>>> Model: WISDOM AM335X CCT
>> >>>>>>>> DRAM:  512 MiB
>> >>>>>>>> NAND:  256 MiB
>> >>>>>>>> MMC:   OMAP SD/MMC: 0
>> >>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>> >>>>>>>> default environment
>> >>>>>>>>
>> >>>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>> >>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>> >>>>>>>> Hit any key to stop autoboot:  0  
>> >>>>>>>> => setenv autoload no
>> >>>>>>>> => dhcp  
>> >>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> >>>>>>>> MAC de:ad:be:ef:00:01
>> >>>>>>>> HOST MAC de:ad:be:ef:00:00
>> >>>>>>>> RNDIS ready
>> >>>>>>>> musb-hdrc: peripheral reset irq lost!
>> >>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> >>>>>>>> USB RNDIS network up!
>> >>>>>>>> BOOTP broadcast 1
>> >>>>>>>> BOOTP broadcast 2
>> >>>>>>>> BOOTP broadcast 3
>> >>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>> >>>>>>>> data abort
>> >>>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>> >>>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>> >>>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>> >>>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>> >>>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>> >>>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>> >>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>> >>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>> >>>>>>>> Resetting CPU ...
>> >>>>>>>>
>> >>>>>>>> resetting ...
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>> >>>>>>>> is already fixed?
>> >>>>>>>>
>> >>>>>>>> Thanks
>> >>>>>>>>  
>> >>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>> >>>>>>> v2021.04-rc2 is bad.
>> >>>>>>>
>> >>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>> >>>>>>> has the simliar problem.
>> >>>>>>>
>> >>>>>>> find the first bug commit via 'git bisect': it told me that commit
>> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>> >>>>>>> strange due to this commit doesn't touch any dhcp or network code.
>> >>>>>>>
>> >>>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>> >>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>> >>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>> >>>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>> >>>>>>>
>> >>>>>>>     fs: fat: consistent error handling for flush_dir()
>> >>>>>>>
>> >>>>>>>     Provide function description for flush_dir().
>> >>>>>>>     Move all error messages for flush_dir() from the callers to the
>> >>>>>>> function.
>> >>>>>>>     Move mapping of errors to -EIO to the function.
>> >>>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
>> >>>>>>>
>> >>>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>> >>>>>>>
>> >>>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>> >>>>>>>
>> >>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>> >>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>> >>>>>>>
>> >>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>> >>>>>>> handling for flush_dir()
>> >>>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>> >>>>>>> 'u-boot-rockchip-20210121' of
>> >>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>> >>>>>>> |\
>> >>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>> >>>>>>> based PCIe controller driver
>> >>>>>>>
>> >>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>> >>>>>>>
>> >>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>> >>>>>>>
>> >>>>>>> CPU  : AM335X-GP rev 2.1
>> >>>>>>> Model: TI AM335x BeagleBone Black
>> >>>>>>> DRAM:  512 MiB
>> >>>>>>> WDT:   Started with servicing (60s timeout)
>> >>>>>>> NAND:  0 MiB
>> >>>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>> >>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>> >>>>>>> E-fuse MAC
>> >>>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>> >>>>>>> Hit any key to stop autoboot:  0  
>> >>>>>>> => dhcp  
>> >>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>> >>>>>>> complete......... TIMEOUT !
>> >>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> >>>>>>> MAC de:ad:be:ef:00:01
>> >>>>>>> HOST MAC de:ad:be:ef:00:00
>> >>>>>>> RNDIS ready
>> >>>>>>> musb-hdrc: peripheral reset irq lost!
>> >>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> >>>>>>> USB RNDIS network up!
>> >>>>>>> BOOTP broadcast 1
>> >>>>>>> BOOTP broadcast 2
>> >>>>>>> BOOTP broadcast 3
>> >>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>> >>>>>>> Using usb_ether device
>> >>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>> >>>>>>> Filename 'u-boot.img'.
>> >>>>>>> Load address: 0x82000000
>> >>>>>>> Loading:
>> >>>>>>> #################################################################
>> >>>>>>> #################################################################
>> >>>>>>> #################################################################
>> >>>>>>>          #########################
>> >>>>>>>          2.5 MiB/s
>> >>>>>>> done
>> >>>>>>> Bytes transferred = 1123888 (112630 hex)  
>> >>>>>>> =>  
>> >>>>>>>  
>> >>>>> "data abort" messages:
>> >>>>>
>> >>>>> data abort
>> >>>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
>> >>>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
>> >>>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
>> >>>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
>> >>>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
>> >>>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
>> >>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>> >>>>> Code: 0303 60ca 4403 6091 (685a) f042
>> >>>>> Resetting CPU ...
>> >>>>>
>> >>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>> >>>>>
>> >>>>>        unlink(victim, bck, fwd);
>> >>>>> 80814966:    60ca          str    r2, [r1, #12]
>> >>>>>        set_inuse_bit_at_offset(victim, victim_size);
>> >>>>> 80814968:    4403          add    r3, r0
>> >>>>>        unlink(victim, bck, fwd);
>> >>>>> 8081496a:    6091          str    r1, [r2, #8]
>> >>>>>      set_inuse_bit_at_offset(victim, victim_size);
>> >>>>> 8081496c:    685a          ldr    r2, [r3, #4]
>> >>>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
>> >>>>> 80814972:    605a          str    r2, [r3, #4]
>> >>>>>
>> >>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>> >>>>>
>> >>>>>  
>> >>>>
>> >>>> I have seen crashes in common/dlmalloc.c before after double free() or
>> >>>> free() with an incorrect pointer.
>> >>>>
>> >>>> The assert() statements in do_check_inuse_chunk() are meant to catch
>> >>>> this but assert() as defined in include/log.h does not stop the code and
>> >>>> even does not print without _DEBUG=1.
>> >>>>
>> >>>> You should be able to get the assert output with
>> >>>>
>> >>>> #include <common.h>
>> >>>> #define _DEBUG 1
>> >>>> #include <log.h>
>> >>>>
>> >>>> at the top of common/dlmalloc.c.
>> >>>>
>> >>>> You should get full malloc debug output with  
>> >>>
>> >>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message  
>> >>
>> >> assert() checks for _DEBUG. Defining DEBUG after common.h will not
>> >> define _DEBUG.  
>> >
>> > Finally I got a malloc error message on console:
>> >
>> > TFTP from server 192.168.200.1; our IP address is 192.168.200.39
>> > Filename 'u-boot.img'.
>> > Load address: 0x82000000
>> > Loading: #################################################################
>> > #################################################################
>> > #################################################################
>> >          ######################################################  0 Bytes
>> >          1.9 MiB/s
>> > done
>> > Bytes transferred = 1274816 (1373c0 hex)
>> > common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' > failed.
>> >
>> > I had tried many times, do_check_chunk not always failed, and sometimes it > report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' > failed. The situation is not the same.
>> >
>> > I got a bt stack when malloc failed:
>> >
>> > (gdb) bt
>> > #0  0x9ffb5684 in panic_finish () at lib/panic.c:23
>> > #1  panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
>> > #2  0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized > out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
>> > #3  0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at > common/dlmalloc.c:866
>> > #4  0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) > at common/dlmalloc.c:900
>> > #5  0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
>> > #6  0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, > callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
>> > #7  0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized > out>, attributes=0x9df28dec "") at env/attr.c:184
>> > #8  0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
>> > #9  0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, > htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
>> > #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, > env_flag=512, flag=0) at cmd/nvedit.c:296
>> > #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) > at cmd/nvedit.c:318
>> > #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
>> > #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, > argv=0x9df442c8) at cmd/net.c:268
>> > #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, > flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
>> > #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, > repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
>> > #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
>> > #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
>> > #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
>> > #19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at > common/cli_hush.c:3206
>> > #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
>> > #21 0x9ff77c1a in cli_loop () at common/cli.c:229
>> > #22 0x9ff70d3e in main_loop () at common/main.c:66
>> > #23 0x9ff72672 in run_main_loop () at common/board_r.c:584
>> > #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at > include/initcall.h:46
>> > #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at > common/board_r.c:822
>> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> >  
>> >>
>> >> Best regards
>> >>
>> >> Heinrich
>> >>  
>> >>> printed.
>> >>>  
>> >>>>
>> >>>> #define DEBUG 1
>> >>>> #include <common.h>
>> >>>> #include <log.h>
>> >>>>
>> >>>> Best regards
>> >>>>
>> >>>> Heinrich  
>> >>>  

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2023-07-20 16:39                 ` Miquel Raynal
  2023-07-20 17:55                   ` Heinrich Schuchardt
@ 2023-07-20 18:34                   ` Tom Rini
  2023-07-21 11:55                     ` Miquel Raynal
  2023-07-21  0:31                   ` qianfan
  2 siblings, 1 reply; 23+ messages in thread
From: Tom Rini @ 2023-07-20 18:34 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: qianfan, Heinrich Schuchardt, u-boot

[-- Attachment #1: Type: text/plain, Size: 1724 bytes --]

On Thu, Jul 20, 2023 at 06:39:17PM +0200, Miquel Raynal wrote:
> Hello,
> 
> qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
> 
> > It's very strange. And I can't detect it's a bug of usb or dlmalloc.
> > 
> > 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
> > 
> > 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> > 
> > 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.
> 
> I am sorry to re-open a thread that is one year old but this is
> still an open bug. The BBB is affected. In particular the BBBW
> because there is no Ethernet connector, which makes the Eth-over-USB
> emulation even more important. All U-Boots since 2021 are affected:
> spurious data aborts, usually at the end of network interactions (tftp,
> ping). I could not bisect it because the boot was deeply broken as
> well on a significant range of commits :-/.
> 
> On my side I narrowed it down to an env update which fails in malloc as
> well. If I comment the env update, it fails a bit later. It really
> looks like a stack corruption which is either related to the Ethernet
> USB gadget or the USB controller driver itself. Network transfers on
> the BBBW using regular Ethernet does not trigger any error.
> 
> I also observe the very strange "fix" mentioned above: starting and
> killing fastboot makes all tftp pass... If anyone has more details to
> share, or perhaps a subsequent thread giving more details, I would
> really like to see this fixed upstream, I suppose I am not the only one
> :-)

What happens if you increase the malloc pool from say 32MB (current
value, 0x2000000) to 64MB (so 0x4000000) ?

-- 
Tom

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2023-07-20 16:39                 ` Miquel Raynal
  2023-07-20 17:55                   ` Heinrich Schuchardt
  2023-07-20 18:34                   ` Tom Rini
@ 2023-07-21  0:31                   ` qianfan
  2023-07-21 22:26                     ` Miquel Raynal
  2 siblings, 1 reply; 23+ messages in thread
From: qianfan @ 2023-07-21  0:31 UTC (permalink / raw)
  To: Miquel Raynal; +Cc: Heinrich Schuchardt, u-boot, Tom Rini



在 2023/7/21 0:39, Miquel Raynal 写道:
> Hello,
>
> qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
>
>> It's very strange. And I can't detect it's a bug of usb or dlmalloc.
>>
>> 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
>>
>> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
>>
>> 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.
> I am sorry to re-open a thread that is one year old but this is
> still an open bug. The BBB is affected. In particular the BBBW
> because there is no Ethernet connector, which makes the Eth-over-USB
> emulation even more important. All U-Boots since 2021 are affected:
> spurious data aborts, usually at the end of network interactions (tftp,
> ping). I could not bisect it because the boot was deeply broken as
> well on a significant range of commits :-/.
>
> On my side I narrowed it down to an env update which fails in malloc as
> well. If I comment the env update, it fails a bit later. It really
> looks like a stack corruption which is either related to the Ethernet
> USB gadget or the USB controller driver itself. Network transfers on
> the BBBW using regular Ethernet does not trigger any error.
>
> I also observe the very strange "fix" mentioned above: starting and
> killing fastboot makes all tftp pass... If anyone has more details to
> share, or perhaps a subsequent thread giving more details, I would
> really like to see this fixed upstream, I suppose I am not the only one
> :-)
Hi:

Could you please try this two patches?

http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-1-qianfanguijin@163.com/

http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-2-qianfanguijin@163.com/

Thanks
>
> Thanks,
> Miquèl
>
>> 在 2022/3/24 17:33, qianfan 写道:
>>> 在 2022/3/23 18:12, Heinrich Schuchardt 写道:
>>>> On 3/23/22 11:07, qianfan wrote:
>>>>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:
>>>>>> On 3/23/22 10:13, qianfan wrote:
>>>>>>> 在 2022/3/23 16:02, qianfan 写道:
>>>>>>>>
>>>>>>>> 在 2022/3/23 15:45, qianfan 写道:
>>>>>>>>>
>>>>>>>>> 在 2022/3/23 10:28, qianfan 写道:
>>>>>>>>>> Hi:
>>>>>>>>>>
>>>>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>>>>>>> always report data abort when 'dhcp':
>>>>>>>>>>
>>>>>>>>>> Next it the log:
>>>>>>>>>>
>>>>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>>>>>> +0800)
>>>>>>>>>>
>>>>>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>>>>>> Model: WISDOM AM335X CCT
>>>>>>>>>> DRAM:  512 MiB
>>>>>>>>>> NAND:  256 MiB
>>>>>>>>>> MMC:   OMAP SD/MMC: 0
>>>>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>>>>>> default environment
>>>>>>>>>>
>>>>>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
>>>>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>>>>> Hit any key to stop autoboot:  0
>>>>>>>>>> => setenv autoload no
>>>>>>>>>> => dhcp
>>>>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>>>>> RNDIS ready
>>>>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>>>>> USB RNDIS network up!
>>>>>>>>>> BOOTP broadcast 1
>>>>>>>>>> BOOTP broadcast 2
>>>>>>>>>> BOOTP broadcast 3
>>>>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>>>>>> data abort
>>>>>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
>>>>>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
>>>>>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
>>>>>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
>>>>>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
>>>>>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
>>>>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>>>>>> Resetting CPU ...
>>>>>>>>>>
>>>>>>>>>> resetting ...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>>>>>>> is already fixed?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>   
>>>>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>>>>>>> v2021.04-rc2 is bad.
>>>>>>>>>
>>>>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>>>>>>> has the simliar problem.
>>>>>>>>>
>>>>>>>>> find the first bug commit via 'git bisect': it told me that commit
>>>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>>>>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>>>>>>
>>>>>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
>>>>>>>>>
>>>>>>>>>      fs: fat: consistent error handling for flush_dir()
>>>>>>>>>
>>>>>>>>>      Provide function description for flush_dir().
>>>>>>>>>      Move all error messages for flush_dir() from the callers to the
>>>>>>>>> function.
>>>>>>>>>      Move mapping of errors to -EIO to the function.
>>>>>>>>>      Always check return value of flush_dir() (Coverity CID 316362).
>>>>>>>>>
>>>>>>>>>      In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>>>>>
>>>>>>>>>      Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>>>>
>>>>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>>>>>
>>>>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>>>>>> handling for flush_dir()
>>>>>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>>>>>> 'u-boot-rockchip-20210121' of
>>>>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>>>>>> |\
>>>>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>>>>>> based PCIe controller driver
>>>>>>>>>
>>>>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>>>>>>
>>>>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>>>>>
>>>>>>>>> CPU  : AM335X-GP rev 2.1
>>>>>>>>> Model: TI AM335x BeagleBone Black
>>>>>>>>> DRAM:  512 MiB
>>>>>>>>> WDT:   Started with servicing (60s timeout)
>>>>>>>>> NAND:  0 MiB
>>>>>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>>>>>> E-fuse MAC
>>>>>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>>>> Hit any key to stop autoboot:  0
>>>>>>>>> => dhcp
>>>>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>>>>>>> complete......... TIMEOUT !
>>>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>>>> RNDIS ready
>>>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>>>> USB RNDIS network up!
>>>>>>>>> BOOTP broadcast 1
>>>>>>>>> BOOTP broadcast 2
>>>>>>>>> BOOTP broadcast 3
>>>>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>>>>>>> Using usb_ether device
>>>>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>>>>>>> Filename 'u-boot.img'.
>>>>>>>>> Load address: 0x82000000
>>>>>>>>> Loading:
>>>>>>>>> #################################################################
>>>>>>>>> #################################################################
>>>>>>>>> #################################################################
>>>>>>>>>           #########################
>>>>>>>>>           2.5 MiB/s
>>>>>>>>> done
>>>>>>>>> Bytes transferred = 1123888 (112630 hex)
>>>>>>>>> =>
>>>>>>>>>   
>>>>>>> "data abort" messages:
>>>>>>>
>>>>>>> data abort
>>>>>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
>>>>>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
>>>>>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
>>>>>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
>>>>>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
>>>>>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
>>>>>>> Code: 0303 60ca 4403 6091 (685a) f042
>>>>>>> Resetting CPU ...
>>>>>>>
>>>>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>>>>>>
>>>>>>>         unlink(victim, bck, fwd);
>>>>>>> 80814966:    60ca          str    r2, [r1, #12]
>>>>>>>         set_inuse_bit_at_offset(victim, victim_size);
>>>>>>> 80814968:    4403          add    r3, r0
>>>>>>>         unlink(victim, bck, fwd);
>>>>>>> 8081496a:    6091          str    r1, [r2, #8]
>>>>>>>       set_inuse_bit_at_offset(victim, victim_size);
>>>>>>> 8081496c:    685a          ldr    r2, [r3, #4]
>>>>>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
>>>>>>> 80814972:    605a          str    r2, [r3, #4]
>>>>>>>
>>>>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>>>>>>
>>>>>>>   
>>>>>> I have seen crashes in common/dlmalloc.c before after double free() or
>>>>>> free() with an incorrect pointer.
>>>>>>
>>>>>> The assert() statements in do_check_inuse_chunk() are meant to catch
>>>>>> this but assert() as defined in include/log.h does not stop the code and
>>>>>> even does not print without _DEBUG=1.
>>>>>>
>>>>>> You should be able to get the assert output with
>>>>>>
>>>>>> #include <common.h>
>>>>>> #define _DEBUG 1
>>>>>> #include <log.h>
>>>>>>
>>>>>> at the top of common/dlmalloc.c.
>>>>>>
>>>>>> You should get full malloc debug output with
>>>>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message
>>>> assert() checks for _DEBUG. Defining DEBUG after common.h will not
>>>> define _DEBUG.
>>> Finally I got a malloc error message on console:
>>>
>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.39
>>> Filename 'u-boot.img'.
>>> Load address: 0x82000000
>>> Loading: #################################################################
>>> #################################################################
>>> #################################################################
>>>           ######################################################  0 Bytes
>>>           1.9 MiB/s
>>> done
>>> Bytes transferred = 1274816 (1373c0 hex)
>>> common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' > failed.
>>>
>>> I had tried many times, do_check_chunk not always failed, and sometimes it > report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' > failed. The situation is not the same.
>>>
>>> I got a bt stack when malloc failed:
>>>
>>> (gdb) bt
>>> #0  0x9ffb5684 in panic_finish () at lib/panic.c:23
>>> #1  panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
>>> #2  0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized > out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
>>> #3  0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at > common/dlmalloc.c:866
>>> #4  0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) > at common/dlmalloc.c:900
>>> #5  0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
>>> #6  0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, > callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
>>> #7  0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized > out>, attributes=0x9df28dec "") at env/attr.c:184
>>> #8  0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
>>> #9  0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, > htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
>>> #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, > env_flag=512, flag=0) at cmd/nvedit.c:296
>>> #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) > at cmd/nvedit.c:318
>>> #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
>>> #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, > argv=0x9df442c8) at cmd/net.c:268
>>> #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, > flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
>>> #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, > repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
>>> #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
>>> #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
>>> #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
>>> #19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at > common/cli_hush.c:3206
>>> #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
>>> #21 0x9ff77c1a in cli_loop () at common/cli.c:229
>>> #22 0x9ff70d3e in main_loop () at common/main.c:66
>>> #23 0x9ff72672 in run_main_loop () at common/board_r.c:584
>>> #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at > include/initcall.h:46
>>> #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at > common/board_r.c:822
>>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>>>   
>>>> Best regards
>>>>
>>>> Heinrich
>>>>   
>>>>> printed.
>>>>>   
>>>>>> #define DEBUG 1
>>>>>> #include <common.h>
>>>>>> #include <log.h>
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Heinrich
>>>>>   


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2023-07-20 17:55                   ` Heinrich Schuchardt
@ 2023-07-21 11:54                     ` Miquel Raynal
  0 siblings, 0 replies; 23+ messages in thread
From: Miquel Raynal @ 2023-07-21 11:54 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: qianfan, u-boot, Tom Rini

Hi Heinrich,

xypron.glpk@gmx.de wrote on Thu, 20 Jul 2023 19:55:39 +0200:

> Am 20. Juli 2023 18:39:17 MESZ schrieb Miquel Raynal <miquel.raynal@bootlin.com>:
> >Hello,
> >
> >qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
> >  
> >> It's very strange. And I can't detect it's a bug of usb or dlmalloc.
> >> 
> >> 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
> >> 
> >> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> >> 
> >> 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.  
> >
> >I am sorry to re-open a thread that is one year old but this is
> >still an open bug. The BBB is affected. In particular the BBBW
> >because there is no Ethernet connector, which makes the Eth-over-USB
> >emulation even more important. All U-Boots since 2021 are affected:
> >spurious data aborts, usually at the end of network interactions (tftp,
> >ping). I could not bisect it because the boot was deeply broken as
> >well on a significant range of commits :-/.
> >
> >On my side I narrowed it down to an env update which fails in malloc as
> >well. If I comment the env update, it fails a bit later. It really
> >looks like a stack corruption which is either related to the Ethernet
> >USB gadget or the USB controller driver itself. Network transfers on
> >the BBBW using regular Ethernet does not trigger any error.
> >
> >I also observe the very strange "fix" mentioned above: starting and
> >killing fastboot makes all tftp pass... If anyone has more details to
> >share, or perhaps a subsequent thread giving more details, I would
> >really like to see this fixed upstream, I suppose I am not the only one
> >:-)
> >
> >Thanks,
> >Miquèl  
> 
> 
> Can this problem be reproduced on QEMU?

I haven't tried on QEMU, what do you have in mind? What should we try
to do?

Thanks,
Miquèl

> 
> Best regards
> 
> Heinrich 
> 
> >  
> >> 
> >> 在 2022/3/24 17:33, qianfan 写道:  
> >> >
> >> > 在 2022/3/23 18:12, Heinrich Schuchardt 写道:    
> >> >> On 3/23/22 11:07, qianfan wrote:    
> >> >>>
> >> >>> 在 2022/3/23 17:51, Heinrich Schuchardt 写道:    
> >> >>>> On 3/23/22 10:13, qianfan wrote:    
> >> >>>>>
> >> >>>>> 在 2022/3/23 16:02, qianfan 写道:    
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> 在 2022/3/23 15:45, qianfan 写道:    
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> 在 2022/3/23 10:28, qianfan 写道:    
> >> >>>>>>>>
> >> >>>>>>>> Hi:
> >> >>>>>>>>
> >> >>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
> >> >>>>>>>> always report data abort when 'dhcp':
> >> >>>>>>>>
> >> >>>>>>>> Next it the log:
> >> >>>>>>>>
> >> >>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
> >> >>>>>>>> +0800)
> >> >>>>>>>>
> >> >>>>>>>> CPU  : AM335X-GP rev 2.1
> >> >>>>>>>> Model: WISDOM AM335X CCT
> >> >>>>>>>> DRAM:  512 MiB
> >> >>>>>>>> NAND:  256 MiB
> >> >>>>>>>> MMC:   OMAP SD/MMC: 0
> >> >>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
> >> >>>>>>>> default environment
> >> >>>>>>>>
> >> >>>>>>>> Net:   Could not get PHY for ethernet@4a100000: addr 0
> >> >>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
> >> >>>>>>>> Hit any key to stop autoboot:  0    
> >> >>>>>>>> => setenv autoload no
> >> >>>>>>>> => dhcp    
> >> >>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> >> >>>>>>>> MAC de:ad:be:ef:00:01
> >> >>>>>>>> HOST MAC de:ad:be:ef:00:00
> >> >>>>>>>> RNDIS ready
> >> >>>>>>>> musb-hdrc: peripheral reset irq lost!
> >> >>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> >> >>>>>>>> USB RNDIS network up!
> >> >>>>>>>> BOOTP broadcast 1
> >> >>>>>>>> BOOTP broadcast 2
> >> >>>>>>>> BOOTP broadcast 3
> >> >>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
> >> >>>>>>>> data abort
> >> >>>>>>>> pc : [<9fe9b0a2>]          lr : [<9febbc3f>]
> >> >>>>>>>> reloc pc : [<808130a2>]    lr : [<80833c3f>]
> >> >>>>>>>> sp : 9de53410  ip : 9de53578     fp : 00000001
> >> >>>>>>>> r10: 9de5345c  r9 : 9de67e80     r8 : 9febbae5
> >> >>>>>>>> r7 : 9de72c30  r6 : 9feec710     r5 : 0000000d  r4 : 00000018
> >> >>>>>>>> r3 : 3fdd8e04  r2 : 00000002     r1 : 9feec728  r0 : 9feec700
> >> >>>>>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
> >> >>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
> >> >>>>>>>> Resetting CPU ...
> >> >>>>>>>>
> >> >>>>>>>> resetting ...
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> It's there has any doc about how to debug data abort? Or is the bug
> >> >>>>>>>> is already fixed?
> >> >>>>>>>>
> >> >>>>>>>> Thanks
> >> >>>>>>>>    
> >> >>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
> >> >>>>>>> v2021.04-rc2 is bad.
> >> >>>>>>>
> >> >>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
> >> >>>>>>> has the simliar problem.
> >> >>>>>>>
> >> >>>>>>> find the first bug commit via 'git bisect': it told me that commit
> >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
> >> >>>>>>> strange due to this commit doesn't touch any dhcp or network code.
> >> >>>>>>>
> >> >>>>>>> ➜  u-boot-main git:(e97eb638de) ✗ git bisect bug
> >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
> >> >>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
> >> >>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
> >> >>>>>>> Date:   Wed Jan 20 22:21:53 2021 +0100
> >> >>>>>>>
> >> >>>>>>>     fs: fat: consistent error handling for flush_dir()
> >> >>>>>>>
> >> >>>>>>>     Provide function description for flush_dir().
> >> >>>>>>>     Move all error messages for flush_dir() from the callers to the
> >> >>>>>>> function.
> >> >>>>>>>     Move mapping of errors to -EIO to the function.
> >> >>>>>>>     Always check return value of flush_dir() (Coverity CID 316362).
> >> >>>>>>>
> >> >>>>>>>     In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
> >> >>>>>>>
> >> >>>>>>>     Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
> >> >>>>>>>
> >> >>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
> >> >>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M      fs
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
> >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
> >> >>>>>>>
> >> >>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
> >> >>>>>>> handling for flush_dir()
> >> >>>>>>> *   184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
> >> >>>>>>> 'u-boot-rockchip-20210121' of
> >> >>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
> >> >>>>>>> |\
> >> >>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
> >> >>>>>>> based PCIe controller driver
> >> >>>>>>>
> >> >>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
> >> >>>>>>>
> >> >>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
> >> >>>>>>>
> >> >>>>>>> CPU  : AM335X-GP rev 2.1
> >> >>>>>>> Model: TI AM335x BeagleBone Black
> >> >>>>>>> DRAM:  512 MiB
> >> >>>>>>> WDT:   Started with servicing (60s timeout)
> >> >>>>>>> NAND:  0 MiB
> >> >>>>>>> MMC:   OMAP SD/MMC: 0, OMAP SD/MMC: 1
> >> >>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
> >> >>>>>>> E-fuse MAC
> >> >>>>>>> Net:   eth2: ethernet@4a100000, eth3: usb_ether
> >> >>>>>>> Hit any key to stop autoboot:  0    
> >> >>>>>>> => dhcp    
> >> >>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
> >> >>>>>>> complete......... TIMEOUT !
> >> >>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
> >> >>>>>>> MAC de:ad:be:ef:00:01
> >> >>>>>>> HOST MAC de:ad:be:ef:00:00
> >> >>>>>>> RNDIS ready
> >> >>>>>>> musb-hdrc: peripheral reset irq lost!
> >> >>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
> >> >>>>>>> USB RNDIS network up!
> >> >>>>>>> BOOTP broadcast 1
> >> >>>>>>> BOOTP broadcast 2
> >> >>>>>>> BOOTP broadcast 3
> >> >>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
> >> >>>>>>> Using usb_ether device
> >> >>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
> >> >>>>>>> Filename 'u-boot.img'.
> >> >>>>>>> Load address: 0x82000000
> >> >>>>>>> Loading:
> >> >>>>>>> #################################################################
> >> >>>>>>> #################################################################
> >> >>>>>>> #################################################################
> >> >>>>>>>          #########################
> >> >>>>>>>          2.5 MiB/s
> >> >>>>>>> done
> >> >>>>>>> Bytes transferred = 1123888 (112630 hex)    
> >> >>>>>>> =>    
> >> >>>>>>>    
> >> >>>>> "data abort" messages:
> >> >>>>>
> >> >>>>> data abort
> >> >>>>> pc : [<9ff8196c>]          lr : [<9ffa1cd7>]
> >> >>>>> reloc pc : [<8081496c>]    lr : [<80834cd7>]
> >> >>>>> sp : 9df38e60  ip : 9df38fc8     fp : 00000001
> >> >>>>> r10: 9df38eac  r9 : 9df4ceb0     r8 : 9ffa1b7d
> >> >>>>> r7 : 9df52fd0  r6 : 9ffdbba8     r5 : 0000000d  r4 : 00000018
> >> >>>>> r3 : 3ff589e0  r2 : 9ffafa11     r1 : 9ffdbbc0  r0 : 9ffdbb00
> >> >>>>> Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32 (T)
> >> >>>>> Code: 0303 60ca 4403 6091 (685a) f042
> >> >>>>> Resetting CPU ...
> >> >>>>>
> >> >>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
> >> >>>>>
> >> >>>>>        unlink(victim, bck, fwd);
> >> >>>>> 80814966:    60ca          str    r2, [r1, #12]
> >> >>>>>        set_inuse_bit_at_offset(victim, victim_size);
> >> >>>>> 80814968:    4403          add    r3, r0
> >> >>>>>        unlink(victim, bck, fwd);
> >> >>>>> 8081496a:    6091          str    r1, [r2, #8]
> >> >>>>>      set_inuse_bit_at_offset(victim, victim_size);
> >> >>>>> 8081496c:    685a          ldr    r2, [r3, #4]
> >> >>>>> 8081496e:    f042 0201     orr.w    r2, r2, #1
> >> >>>>> 80814972:    605a          str    r2, [r3, #4]
> >> >>>>>
> >> >>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
> >> >>>>>
> >> >>>>>    
> >> >>>>
> >> >>>> I have seen crashes in common/dlmalloc.c before after double free() or
> >> >>>> free() with an incorrect pointer.
> >> >>>>
> >> >>>> The assert() statements in do_check_inuse_chunk() are meant to catch
> >> >>>> this but assert() as defined in include/log.h does not stop the code and
> >> >>>> even does not print without _DEBUG=1.
> >> >>>>
> >> >>>> You should be able to get the assert output with
> >> >>>>
> >> >>>> #include <common.h>
> >> >>>> #define _DEBUG 1
> >> >>>> #include <log.h>
> >> >>>>
> >> >>>> at the top of common/dlmalloc.c.
> >> >>>>
> >> >>>> You should get full malloc debug output with    
> >> >>>
> >> >>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message    
> >> >>
> >> >> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> >> >> define _DEBUG.    
> >> >
> >> > Finally I got a malloc error message on console:
> >> >
> >> > TFTP from server 192.168.200.1; our IP address is 192.168.200.39
> >> > Filename 'u-boot.img'.
> >> > Load address: 0x82000000
> >> > Loading: #################################################################
> >> > #################################################################
> >> > #################################################################
> >> >          ######################################################  0 Bytes
> >> >          1.9 MiB/s
> >> > done
> >> > Bytes transferred = 1274816 (1373c0 hex)
> >> > common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' > failed.
> >> >
> >> > I had tried many times, do_check_chunk not always failed, and sometimes it > report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' > failed. The situation is not the same.
> >> >
> >> > I got a bt stack when malloc failed:
> >> >
> >> > (gdb) bt
> >> > #0  0x9ffb5684 in panic_finish () at lib/panic.c:23
> >> > #1  panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
> >> > #2  0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized > out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
> >> > #3  0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at > common/dlmalloc.c:866
> >> > #4  0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) > at common/dlmalloc.c:900
> >> > #5  0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
> >> > #6  0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, > callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
> >> > #7  0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized > out>, attributes=0x9df28dec "") at env/attr.c:184
> >> > #8  0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
> >> > #9  0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, > htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
> >> > #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, > env_flag=512, flag=0) at cmd/nvedit.c:296
> >> > #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) > at cmd/nvedit.c:318
> >> > #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
> >> > #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, > argv=0x9df442c8) at cmd/net.c:268
> >> > #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, > flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
> >> > #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, > repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
> >> > #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
> >> > #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
> >> > #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
> >> > #19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at > common/cli_hush.c:3206
> >> > #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
> >> > #21 0x9ff77c1a in cli_loop () at common/cli.c:229
> >> > #22 0x9ff70d3e in main_loop () at common/main.c:66
> >> > #23 0x9ff72672 in run_main_loop () at common/board_r.c:584
> >> > #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at > include/initcall.h:46
> >> > #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at > common/board_r.c:822
> >> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> >> >    
> >> >>
> >> >> Best regards
> >> >>
> >> >> Heinrich
> >> >>    
> >> >>> printed.
> >> >>>    
> >> >>>>
> >> >>>> #define DEBUG 1
> >> >>>> #include <common.h>
> >> >>>> #include <log.h>
> >> >>>>
> >> >>>> Best regards
> >> >>>>
> >> >>>> Heinrich    
> >> >>>    


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2023-07-20 18:34                   ` Tom Rini
@ 2023-07-21 11:55                     ` Miquel Raynal
  0 siblings, 0 replies; 23+ messages in thread
From: Miquel Raynal @ 2023-07-21 11:55 UTC (permalink / raw)
  To: Tom Rini; +Cc: qianfan, Heinrich Schuchardt, u-boot

Hi Tom,

trini@konsulko.com wrote on Thu, 20 Jul 2023 14:34:52 -0400:

> On Thu, Jul 20, 2023 at 06:39:17PM +0200, Miquel Raynal wrote:
> > Hello,
> > 
> > qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
> >   
> > > It's very strange. And I can't detect it's a bug of usb or dlmalloc.
> > > 
> > > 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
> > > 
> > > 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> > > 
> > > 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.  
> > 
> > I am sorry to re-open a thread that is one year old but this is
> > still an open bug. The BBB is affected. In particular the BBBW
> > because there is no Ethernet connector, which makes the Eth-over-USB
> > emulation even more important. All U-Boots since 2021 are affected:
> > spurious data aborts, usually at the end of network interactions (tftp,
> > ping). I could not bisect it because the boot was deeply broken as
> > well on a significant range of commits :-/.
> > 
> > On my side I narrowed it down to an env update which fails in malloc as
> > well. If I comment the env update, it fails a bit later. It really
> > looks like a stack corruption which is either related to the Ethernet
> > USB gadget or the USB controller driver itself. Network transfers on
> > the BBBW using regular Ethernet does not trigger any error.
> > 
> > I also observe the very strange "fix" mentioned above: starting and
> > killing fastboot makes all tftp pass... If anyone has more details to
> > share, or perhaps a subsequent thread giving more details, I would
> > really like to see this fixed upstream, I suppose I am not the only one
> > :-)  
> 
> What happens if you increase the malloc pool from say 32MB (current
> value, 0x2000000) to 64MB (so 0x4000000) ?

Same result. I tried to increment the heap size to 64MB as well as the
stack size (16 -> 64MB), same behavior.

Thanks,
Miquèl

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: data abort when run 'dhcp'
  2023-07-21  0:31                   ` qianfan
@ 2023-07-21 22:26                     ` Miquel Raynal
  0 siblings, 0 replies; 23+ messages in thread
From: Miquel Raynal @ 2023-07-21 22:26 UTC (permalink / raw)
  To: qianfan; +Cc: Heinrich Schuchardt, u-boot, Tom Rini

Hi qianfan,

qianfanguijin@163.com wrote on Fri, 21 Jul 2023 08:31:17 +0800:

> 在 2023/7/21 0:39, Miquel Raynal 写道:
> > Hello,
> >
> > qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
> >  
> >> It's very strange. And I can't detect it's a bug of usb or dlmalloc.
> >>
> >> 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
> >>
> >> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> >>
> >> 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.  
> > I am sorry to re-open a thread that is one year old but this is
> > still an open bug. The BBB is affected. In particular the BBBW
> > because there is no Ethernet connector, which makes the Eth-over-USB
> > emulation even more important. All U-Boots since 2021 are affected:
> > spurious data aborts, usually at the end of network interactions (tftp,
> > ping). I could not bisect it because the boot was deeply broken as
> > well on a significant range of commits :-/.
> >
> > On my side I narrowed it down to an env update which fails in malloc as
> > well. If I comment the env update, it fails a bit later. It really
> > looks like a stack corruption which is either related to the Ethernet
> > USB gadget or the USB controller driver itself. Network transfers on
> > the BBBW using regular Ethernet does not trigger any error.
> >
> > I also observe the very strange "fix" mentioned above: starting and
> > killing fastboot makes all tftp pass... If anyone has more details to
> > share, or perhaps a subsequent thread giving more details, I would
> > really like to see this fixed upstream, I suppose I am not the only one
> > :-)  
> Hi:
> 
> Could you please try this two patches?
> 
> http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-1-qianfanguijin@163.com/
> 
> http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-2-qianfanguijin@163.com/

Indeed these patches work. I ended up rewriting one of them to propose
a different approach. I also found two other proposals for the same
issue which are still pending around. I hope this submission will make
it to avoid more time to be spent on this :-)

Thanks a lot for the pointers, I've Cc'ed you on the submissions.

Kind regards,
Miquèl

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-07-21 22:27 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-23  2:28 How to debug u-boot data abort qianfan
2022-03-23  7:45 ` qianfan
2022-03-23  8:02   ` data abort when run 'dhcp' qianfan
2022-03-23  9:13     ` qianfan
2022-03-23  9:51       ` Heinrich Schuchardt
2022-03-23 10:07         ` qianfan
2022-03-23 10:12           ` Heinrich Schuchardt
2022-03-23 11:54             ` qianfanguijin
2022-03-24  1:23             ` qianfan
2022-03-24  9:33             ` qianfan
2022-03-25 10:04               ` qianfan
2023-07-20 16:39                 ` Miquel Raynal
2023-07-20 17:55                   ` Heinrich Schuchardt
2023-07-21 11:54                     ` Miquel Raynal
2023-07-20 18:34                   ` Tom Rini
2023-07-21 11:55                     ` Miquel Raynal
2023-07-21  0:31                   ` qianfan
2023-07-21 22:26                     ` Miquel Raynal
2022-03-23  8:27   ` How to debug u-boot data abort Heinrich Schuchardt
2022-03-24  3:18     ` AKASHI Takahiro
2022-03-24  7:38       ` qianfan
2022-03-23  7:51 ` Abder
2022-03-23  7:59   ` qianfan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.