* How to debug u-boot data abort
@ 2022-03-23 2:28 qianfan
  2022-03-23 7:45 ` qianfan
  2022-03-23 7:51 ` Abder
  0 siblings, 2 replies; 23+ messages in thread
From: qianfan @ 2022-03-23 2:28 UTC
To: u-boot

Hi:

I have a custom AM335X board connected to my computer via usbnet. It always reports a data abort when I run 'dhcp'.

Here is the log:

U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)

CPU : AM335X-GP rev 2.1
Model: WISDOM AM335X CCT
DRAM: 512 MiB
NAND: 256 MiB
MMC: OMAP SD/MMC: 0
Loading Environment from NAND... *** Warning - bad CRC, using default environment

Net: Could not get PHY for ethernet@4a100000: addr 0
eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> setenv autoload no
=> dhcp
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.4 (757 ms)
data abort
pc : [<9fe9b0a2>] lr : [<9febbc3f>]
reloc pc : [<808130a2>] lr : [<80833c3f>]
sp : 9de53410 ip : 9de53578 fp : 00000001
r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: f023 0303 60ca 4403 (6091) 685a
Resetting CPU ...

resetting ...

Is there any documentation on how to debug a data abort? Or has this bug already been fixed?

Thanks
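A note on reading the dump above: U-Boot prints the run-time pc/lr and, on the "reloc" line, the same addresses with the relocation offset subtracted, which are the ones to look up in the linked u-boot ELF. The following is a minimal sketch in Python, using only values from the log; the helper name reloc_offset is hypothetical. It recovers the offset and checks that pc and lr agree. Because the dump reports Thumb state (the "(T)" flag), bit 0 of lr is set, so it is cleared before symbol lookup.

```python
def reloc_offset(runtime_addr: int, reloc_addr: int) -> int:
    """Relocation offset implied by one (run-time, pre-relocation) pair."""
    return (runtime_addr - reloc_addr) & 0xFFFFFFFF

# Values copied from the register dump in the first message.
pc, reloc_pc = 0x9FE9B0A2, 0x808130A2
lr, reloc_lr = 0x9FEBBC3F, 0x80833C3F

off_pc = reloc_offset(pc, reloc_pc)
off_lr = reloc_offset(lr, reloc_lr)

# Both pairs should imply the same offset if the dump is self-consistent.
consistent = off_pc == off_lr

# In Thumb state the saved lr has bit 0 set; clear it for symbol lookup.
return_addr = reloc_lr & ~1

print(hex(off_pc), hex(off_lr), consistent, hex(return_addr))
```

The pre-relocation addresses (here 808130a2 and 80833c3e) are the ones that match an objdump or addr2line run on the u-boot ELF built from the same source.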
* Re: How to debug u-boot data abort
  2022-03-23 2:28 How to debug u-boot data abort qianfan
@ 2022-03-23 7:45 ` qianfan
  2022-03-23 8:02   ` data abort when run 'dhcp' qianfan
  2022-03-23 8:27   ` How to debug u-boot data abort Heinrich Schuchardt
  2022-03-23 7:51 ` Abder
  1 sibling, 2 replies; 23+ messages in thread
From: qianfan @ 2022-03-23 7:45 UTC
To: u-boot; +Cc: Heinrich Schuchardt

On 2022/3/23 10:28, qianfan wrote:
> [...]

This bug is not fixed in the current master code. I found that v2021.01 is good and v2021.04-rc2 is bad.

I also tested this on a BeagleBone Black with am335x_evm_defconfig, and it shows a similar problem.

I searched for the first bad commit with 'git bisect', which reported that commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. That is very strange, because this commit doesn't touch any dhcp or network code.

➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
Date: Wed Jan 20 22:21:53 2021 +0100

    fs: fat: consistent error handling for flush_dir()

    Provide function description for flush_dir().
    Move all error messages for flush_dir() from the callers to the function.
    Move mapping of errors to -EIO to the function.
    Always check return value of flush_dir() (Coverity CID 316362).

    In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.

    Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>

:040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e 77d188b1c99181fd71f2167fdeee3434a09db209 M fs

184aa6504143b452132e28cd3ebecc7b941cdfa1 is the commit immediately before e97eb638de0dc8f6e989e20eaeb0342f103cb917:

* e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error handling for flush_dir()
* 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag 'u-boot-rockchip-20210121' of https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
|\
| * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc based PCIe controller driver

I checked that 184aa6504143b452132e28cd3ebecc7b941cdfa1 works fine:

U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)

CPU : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM: 512 MiB
WDT: Started with servicing (60s timeout)
NAND: 0 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... <ethaddr> not set. Validating first E-fuse MAC
Net: eth2: ethernet@4a100000, eth3: usb_ether
Hit any key to stop autoboot: 0
=> dhcp
ethernet@4a100000 Waiting for PHY auto negotiation to complete......... TIMEOUT !
using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
MAC de:ad:be:ef:00:01
HOST MAC de:ad:be:ef:00:00
RNDIS ready
musb-hdrc: peripheral reset irq lost!
high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
USB RNDIS network up!
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
DHCP client bound to address 192.168.200.157 (757 ms)
Using usb_ether device
TFTP from server 192.168.200.1; our IP address is 192.168.200.157
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
 #################################################################
 #################################################################
 #########################
 2.5 MiB/s
done
Bytes transferred = 1123888 (112630 hex)
=>
* data abort when run 'dhcp'
  2022-03-23 7:45 ` qianfan
@ 2022-03-23 8:02 ` qianfan
  2022-03-23 9:13   ` qianfan
  2022-03-23 8:27 ` How to debug u-boot data abort Heinrich Schuchardt
  1 sibling, 1 reply; 23+ messages in thread
From: qianfan @ 2022-03-23 8:02 UTC
To: u-boot; +Cc: Heinrich Schuchardt

On 2022/3/23 15:45, qianfan wrote:
> [...]
* Re: data abort when run 'dhcp'
  2022-03-23 8:02 ` data abort when run 'dhcp' qianfan
@ 2022-03-23 9:13 ` qianfan
  2022-03-23 9:51   ` Heinrich Schuchardt
  0 siblings, 1 reply; 23+ messages in thread
From: qianfan @ 2022-03-23 9:13 UTC
To: u-boot; +Cc: Heinrich Schuchardt, Heinrich Schuchardt

On 2022/3/23 16:02, qianfan wrote:
> [...]

The "data abort" message:

data abort
pc : [<9ff8196c>] lr : [<9ffa1cd7>]
reloc pc : [<8081496c>] lr : [<80834cd7>]
sp : 9df38e60 ip : 9df38fc8 fp : 00000001
r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d
r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018
r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00
Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
Code: 0303 60ca 4403 6091 (685a) f042
Resetting CPU ...

objdump of u-boot shows that pc is in malloc and lr is in env_attr_walk:

    unlink(victim, bck, fwd);
80814966: 60ca       str r2, [r1, #12]
    set_inuse_bit_at_offset(victim, victim_size);
80814968: 4403       add r3, r0
    unlink(victim, bck, fwd);
8081496a: 6091       str r1, [r2, #8]
    set_inuse_bit_at_offset(victim, victim_size);
8081496c: 685a       ldr r2, [r3, #4]
8081496e: f042 0201  orr.w r2, r2, #1
80814972: 605a       str r2, [r3, #4]

r3 is 3ff589e0, which is not a valid RAM address on AM335x.
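To make the last observation concrete, here is a minimal sketch in Python. It assumes DRAM starts at 0x80000000 (an assumption, though consistent with the 0x82000000 load address in the earlier log) and spans the 512 MiB the boot log reports. It classifies the registers involved and recovers the value that "add r3, r0" added to r0. Reading that value as the chunk size the allocator fetched is an interpretation of the disassembly, not something the log states.

```python
DRAM_BASE = 0x80000000         # assumption: AM335x DDR base address
DRAM_SIZE = 512 * 1024 * 1024  # "DRAM: 512 MiB" from the boot log

def in_dram(addr: int) -> bool:
    """True if addr lies inside the assumed DRAM window."""
    return DRAM_BASE <= addr < DRAM_BASE + DRAM_SIZE

# Register values copied from the second data abort dump.
regs = {"r0": 0x9FFDBB00, "r1": 0x9FFDBBC0, "r2": 0x9FFAFA11, "r3": 0x3FF589E0}
for name, value in regs.items():
    where = "DRAM" if in_dram(value) else "outside DRAM"
    print(f"{name} = {value:#010x} ({where})")

# r3 is the result of "add r3, r0", so the value that was added to r0
# can be recovered with 32-bit wrap-around arithmetic.
implied_addend = (regs["r3"] - regs["r0"]) & 0xFFFFFFFF
print(f"value added to r0: {implied_addend:#010x}")
```

If the added value really is the size field of a free chunk, it looks like a pointer rather than a plausible size, which would be consistent with corrupted allocator metadata.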
* Re: data abort when run 'dhcp'
  2022-03-23 9:13 ` qianfan
@ 2022-03-23 9:51 ` Heinrich Schuchardt
  2022-03-23 10:07   ` qianfan
  0 siblings, 1 reply; 23+ messages in thread
From: Heinrich Schuchardt @ 2022-03-23 9:51 UTC
To: qianfan; +Cc: u-boot

On 3/23/22 10:13, qianfan wrote:
> [...]
> r3 is 3ff589e0 and it's not a valid ram address on am335x.

I have seen crashes in common/dlmalloc.c before, after a double free() or a free() with an incorrect pointer.

The assert() statements in do_check_inuse_chunk() are meant to catch this, but assert() as defined in include/log.h does not stop the code and does not even print anything without _DEBUG=1.

You should be able to get the assert output with

#include <common.h>
#define _DEBUG 1
#include <log.h>

at the top of common/dlmalloc.c.

You should get full malloc debug output with

#define DEBUG 1
#include <common.h>
#include <log.h>

Best regards

Heinrich
* Re: data abort when run 'dhcp'
  2022-03-23 9:51 ` Heinrich Schuchardt
@ 2022-03-23 10:07 ` qianfan
  2022-03-23 10:12   ` Heinrich Schuchardt
  0 siblings, 1 reply; 23+ messages in thread
From: qianfan @ 2022-03-23 10:07 UTC
To: Heinrich Schuchardt; +Cc: u-boot

On 2022/3/23 17:51, Heinrich Schuchardt wrote:
> [...]
> You should be able to get the assert output with
>
> #include <common.h>
> #define _DEBUG 1
> #include <log.h>
>
> at the top of common/dlmalloc.c.
>
> You should get full malloc debug output with

Hi:

I tried adding the DEBUG macro before <log.h>, but no additional malloc messages were printed.

> #define DEBUG 1
> #include <common.h>
> #include <log.h>
>
> Best regards
>
> Heinrich
* Re: data abort when run 'dhcp'
From: Heinrich Schuchardt @ 2022-03-23 10:12 UTC
To: qianfan; +Cc: u-boot

On 3/23/22 11:07, qianfan wrote:
>
> On 2022/3/23 17:51, Heinrich Schuchardt wrote:
>> On 3/23/22 10:13, qianfan wrote:
>>>
>>> On 2022/3/23 16:02, qianfan wrote:
>>>>
>>>> On 2022/3/23 15:45, qianfan wrote:
>>>>>
>>>>> On 2022/3/23 10:28, qianfan wrote:
>>>>>>
>>>>>> Hi:
>>>>>>
>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>>>>>> always report data abort when 'dhcp':
>>>>>>
>>>>>> Next it the log:
>>>>>>
>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>> +0800)
>>>>>>
>>>>>> CPU : AM335X-GP rev 2.1
>>>>>> Model: WISDOM AM335X CCT
>>>>>> DRAM: 512 MiB
>>>>>> NAND: 256 MiB
>>>>>> MMC: OMAP SD/MMC: 0
>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>> default environment
>>>>>>
>>>>>> Net: Could not get PHY for ethernet@4a100000: addr 0
>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>>> Hit any key to stop autoboot: 0
>>>>>> => setenv autoload no
>>>>>> => dhcp
>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>> MAC de:ad:be:ef:00:01
>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>> RNDIS ready
>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>> USB RNDIS network up!
>>>>>> BOOTP broadcast 1
>>>>>> BOOTP broadcast 2
>>>>>> BOOTP broadcast 3
>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>> data abort
>>>>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
>>>>>> reloc pc : [<808130a2>] lr : [<80833c3f>]
>>>>>> sp : 9de53410 ip : 9de53578 fp : 00000001
>>>>>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
>>>>>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
>>>>>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
>>>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>> Resetting CPU ...
>>>>>>
>>>>>> resetting ...
>>>>>>
>>>>>> It's there has any doc about how to debug data abort? Or is the bug
>>>>>> is already fixed?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and
>>>>> v2021.04-rc2 is bad.
>>>>>
>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig,
>>>>> has the simliar problem.
>>>>>
>>>>> find the first bug commit via 'git bisect': it told me that commit
>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very
>>>>> strange due to this commit doesn't touch any dhcp or network code.
>>>>>
>>>>> ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>> Date: Wed Jan 20 22:21:53 2021 +0100
>>>>>
>>>>> fs: fat: consistent error handling for flush_dir()
>>>>>
>>>>> Provide function description for flush_dir().
>>>>> Move all error messages for flush_dir() from the callers to the
>>>>> function.
>>>>> Move mapping of errors to -EIO to the function.
>>>>> Always check return value of flush_dir() (Coverity CID 316362).
>>>>>
>>>>> In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>
>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>
>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
>>>>>
>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before
>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>
>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>> handling for flush_dir()
>>>>> * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>> 'u-boot-rockchip-20210121' of
>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>> |\
>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>> based PCIe controller driver
>>>>>
>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine.
>>>>>
>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>
>>>>> CPU : AM335X-GP rev 2.1
>>>>> Model: TI AM335x BeagleBone Black
>>>>> DRAM: 512 MiB
>>>>> WDT: Started with servicing (60s timeout)
>>>>> NAND: 0 MiB
>>>>> MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>> E-fuse MAC
>>>>> Net: eth2: ethernet@4a100000, eth3: usb_ether
>>>>> Hit any key to stop autoboot: 0
>>>>> => dhcp
>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>>> complete......... TIMEOUT !
>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>> MAC de:ad:be:ef:00:01
>>>>> HOST MAC de:ad:be:ef:00:00
>>>>> RNDIS ready
>>>>> musb-hdrc: peripheral reset irq lost!
>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>> USB RNDIS network up!
>>>>> BOOTP broadcast 1
>>>>> BOOTP broadcast 2
>>>>> BOOTP broadcast 3
>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>>>>> Using usb_ether device
>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>>>>> Filename 'u-boot.img'.
>>>>> Load address: 0x82000000
>>>>> Loading:
>>>>> #################################################################
>>>>> #################################################################
>>>>> #################################################################
>>>>> #########################
>>>>> 2.5 MiB/s
>>>>> done
>>>>> Bytes transferred = 1123888 (112630 hex)
>>>>> =>
>>>>>
>>> "data abort" messages:
>>>
>>> data abort
>>> pc : [<9ff8196c>] lr : [<9ffa1cd7>]
>>> reloc pc : [<8081496c>] lr : [<80834cd7>]
>>> sp : 9df38e60 ip : 9df38fc8 fp : 00000001
>>> r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d
>>> r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018
>>> r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00
>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>>> Code: 0303 60ca 4403 6091 (685a) f042
>>> Resetting CPU ...
>>>
>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk
>>>
>>> unlink(victim, bck, fwd);
>>> 80814966: 60ca str r2, [r1, #12]
>>> set_inuse_bit_at_offset(victim, victim_size);
>>> 80814968: 4403 add r3, r0
>>> unlink(victim, bck, fwd);
>>> 8081496a: 6091 str r1, [r2, #8]
>>> set_inuse_bit_at_offset(victim, victim_size);
>>> 8081496c: 685a ldr r2, [r3, #4]
>>> 8081496e: f042 0201 orr.w r2, r2, #1
>>> 80814972: 605a str r2, [r3, #4]
>>>
>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>>>
>>
>> I have seen crashes in common/dlmalloc.c before after double free() or
>> free() with an incorrect pointer.
>>
>> The assert() statements in do_check_inuse_chunk() are meant to catch
>> this but assert() as defined in include/log.h does not stop the code and
>> even does not print without _DEBUG=1.
>>
>> You should be able to get the assert output with
>>
>> #include <common.h>
>> #define _DEBUG 1
>> #include <log.h>
>>
>> at the top of common/dlmalloc.c.
>>
>> You should get full malloc debug output with
>
> Hi: I had try add DEBUG marco before <log.h> and no other malloc message

assert() checks for _DEBUG. Defining DEBUG after common.h will not define _DEBUG.

Best regards

Heinrich

> printed.
>
>>
>> #define DEBUG 1
>> #include <common.h>
>> #include <log.h>
>>
>> Best regards
>>
>> Heinrich
>
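The include-order point above can be illustrated with a small, self-contained sketch. It is not the U-Boot code: the names sketch_assert, sketch_assert_fail and SKETCH_DEBUG are hypothetical stand-ins for the assert()/_DEBUG pair described in this thread, and the real definitions in include/log.h may differ between releases. The only property it tries to show is that the header samples DEBUG once, at the point of inclusion, so DEBUG has to be defined before that point.

```c
#include <stdio.h>

/* Defined BEFORE the "header" section below, as the advice in the thread requires. */
#define DEBUG 1

/* ---- stand-in for the header: DEBUG is latched here into a 0/1 constant ---- */
#ifdef DEBUG
#define SKETCH_DEBUG 1
#else
#define SKETCH_DEBUG 0
#endif

int sketch_fail_count;

/* Report a failed assertion; unlike the standard assert(), it does not abort. */
void sketch_assert_fail(const char *expr, const char *file,
			unsigned int line, const char *func)
{
	sketch_fail_count++;
	printf("%s:%u: %s: Assertion `%s' failed.\n", file, line, func, expr);
}

/* The check only fires when the latched debug constant is non-zero. */
#define sketch_assert(x) \
	do { \
		if (!(x) && SKETCH_DEBUG) \
			sketch_assert_fail(#x, __FILE__, __LINE__, __func__); \
	} while (0)
/* ---- end of stand-in header ---- */

/* Run one check and return the number of failures reported so far. */
int sketch_check_positive(int value)
{
	sketch_assert(value > 0);
	return sketch_fail_count;
}
```

If the "#define DEBUG 1" line is moved below the stand-in header section, SKETCH_DEBUG becomes 0 and sketch_check_positive(-1) reports nothing, which matches the behaviour described in the reply above.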
* Re: data abort when run 'dhcp'
From: qianfanguijin @ 2022-03-23 11:54 UTC
To: Heinrich Schuchardt; +Cc: u-boot

No malloc messages are printed even if I remove the _DEBUG macro check in assert(). Maybe this problem can't be detected by do_check_inuse_chunk().

> On 2022-03-23 18:12, Heinrich Schuchardt <xypron.glpk@gmx.de> wrote:
>
> On 3/23/22 11:07, qianfan wrote:
>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message
>> printed.
>
> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> define _DEBUG.
>
> Best regards
>
> Heinrich
* Re: data abort when run 'dhcp'
From: qianfan @ 2022-03-24 1:23 UTC
To: Heinrich Schuchardt, Sean Anderson; +Cc: u-boot

I added 'while (1) ;' before bad_mode in the data_abort function so that I can attach gdb to u-boot when the data abort happens. I can connect with gdb, but bt does not show the full stack:

(gdb) add-symbol-file u-boot 0x9ff66000
add symbol table from file "u-boot" at
    .text_addr = 0x9ff66000
(y or n) y
Reading symbols from u-boot...done.
(gdb) bt
#0 do_data_abort (pt_regs=0x9df30eb8) at arch/arm/lib/interrupts.c:169
#1 0x9ff661c8 in data_abort () at arch/arm/lib/vectors.S:271
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Hi @Sean Anderson, I noticed that you recently submitted some patches for malloc. Could you please also take a look at this?

On 2022/3/23 18:12, Heinrich Schuchardt wrote:
> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> define _DEBUG.
>
> Best regards
>
> Heinrich
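The address given to add-symbol-file has to be the relocated one, because the running image no longer sits at its link address. The crash dumps quoted in this thread print both the runtime pc and the "reloc pc", so the offset between them can be computed directly. The helpers below are only a sketch with hypothetical names; they do plain 32-bit arithmetic and make no claim about how U-Boot itself derives these values. Note that the gdb session above may come from a different build than the dump, so its base address need not match the offset computed here.

```c
#include <stdint.h>

/* Offset between a runtime address and its link-time ("reloc") counterpart. */
uint32_t sketch_reloc_offset(uint32_t runtime_addr, uint32_t link_addr)
{
	return runtime_addr - link_addr;
}

/* Map a runtime address back to the address used in the u-boot ELF file. */
uint32_t sketch_to_link_addr(uint32_t runtime_addr, uint32_t offset)
{
	return runtime_addr - offset;
}
```

For the second dump in the thread, pc 0x9ff8196c and reloc pc 0x8081496c give an offset of 0x1f76d000, and applying that offset to lr 0x9ffa1cd7 gives 0x80834cd7, which agrees with the "reloc pc ... lr" line of the same dump.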
* Re: data abort when run 'dhcp'
From: qianfan @ 2022-03-24 9:33 UTC
To: Heinrich Schuchardt; +Cc: u-boot

On 2022/3/23 18:12, Heinrich Schuchardt wrote:
> On 3/23/22 11:07, qianfan wrote:
>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message
>> printed.
>
> assert() checks for _DEBUG. Defining DEBUG after common.h will not
> define _DEBUG.

Finally I got a malloc error message on the console:

TFTP from server 192.168.200.1; our IP address is 192.168.200.39
Filename 'u-boot.img'.
Load address: 0x82000000
Loading: #################################################################
         #################################################################
         #################################################################
         ######################################################
         0 Bytes 1.9 MiB/s
done
Bytes transferred = 1274816 (1373c0 hex)
common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.

I tried many times. do_check_chunk does not always fail, and sometimes it reports:

common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' failed.

The situation is not the same each time. I got a backtrace when malloc failed:

(gdb) bt
#0 0x9ffb5684 in panic_finish () at lib/panic.c:23
#1 panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
#2 0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
#3 0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at common/dlmalloc.c:866
#4 0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) at common/dlmalloc.c:900
#5 0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
#6 0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
#7 0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized out>, attributes=0x9df28dec "") at env/attr.c:184
#8 0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
#9 0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
#10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, env_flag=512, flag=0) at cmd/nvedit.c:296
#11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at cmd/nvedit.c:318
#12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
#13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, argv=0x9df442c8) at cmd/net.c:268
#14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
#15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
#16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
#17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
#18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
#19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at common/cli_hush.c:3206
#20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
#21 0x9ff77c1a in cli_loop () at common/cli.c:229
#22 0x9ff70d3e in main_loop () at common/main.c:66
#23 0x9ff72672 in run_main_loop () at common/board_r.c:584
#24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at include/initcall.h:46
#25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at common/board_r.c:822
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

>
> Best regards
>
> Heinrich
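The two assertions reported above both test the chunk header that malloc is about to hand out. As a rough illustration of what they check, here is a sketch under stated assumptions: it assumes the classic dlmalloc convention that the two low bits of a chunk's size field are flag bits (previous-in-use and mmapped), and every name in it is hypothetical. It is not the code in common/dlmalloc.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: low bits of the size field are flags, as in classic dlmalloc. */
#define SKETCH_PREV_INUSE 0x1u
#define SKETCH_IS_MMAPPED 0x2u
#define SKETCH_SIZE_BITS  (SKETCH_PREV_INUSE | SKETCH_IS_MMAPPED)

/*
 * Return true if a chunk header looks plausible:
 *  - the mmapped flag is clear (mirrors `!chunk_is_mmapped(p)`)
 *  - the chunk ends at or before top (mirrors `(char*)p + sz <= (char*)top`)
 */
bool sketch_chunk_plausible(uintptr_t chunk, size_t size_field, uintptr_t top)
{
	size_t sz = size_field & ~(size_t)SKETCH_SIZE_BITS;

	if (size_field & SKETCH_IS_MMAPPED)
		return false;
	/* Compare against the remaining space so that chunk + sz cannot overflow. */
	if (chunk > top || sz > top - chunk)
		return false;
	return true;
}
```

A size field that has been overwritten with unrelated data can trip either test depending on which bits happen to be set, which would be consistent with the observation that the failing assertion changes from run to run. That reading is an interpretation, not something the thread has confirmed.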
* Re: data abort when run 'dhcp'
From: qianfan @ 2022-03-25 10:04 UTC
To: Heinrich Schuchardt; +Cc: u-boot

It's very strange, and I can't tell whether this is a bug in the USB code or in dlmalloc.

1. Start U-Boot and run dhcp via the AM335x Ethernet (cpsw driver): it's OK.

2. Start U-Boot and run dhcp via the AM335x USB net: data abort.

3. Start fastboot, press Ctrl-C right away, then run dhcp via the AM335x USB net: it's OK.
* Re: data abort when run 'dhcp'
From: Miquel Raynal @ 2023-07-20 16:39 UTC
To: qianfan; +Cc: Heinrich Schuchardt, u-boot, Tom Rini

Hello,

qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:

> It's very strange. And I can't detect it's a bug of usb or dlmalloc.
>
> 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
>
> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
>
> 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.

I am sorry to re-open a thread that is one year old but this is
still an open bug. The BBB is affected. In particular the BBBW
because there is no Ethernet connector, which makes the Eth-over-USB
emulation even more important. All U-Boots since 2021 are affected:
spurious data aborts, usually at the end of network interactions (tftp,
ping). I could not bisect it because the boot was deeply broken as
well on a significant range of commits :-/.

On my side I narrowed it down to an env update which fails in malloc as
well. If I comment out the env update, it fails a bit later. It really
looks like a stack corruption which is either related to the Ethernet
USB gadget or the USB controller driver itself. Network transfers on
the BBBW using regular Ethernet do not trigger any error.

I also observe the very strange "fix" mentioned above: starting and
killing fastboot makes all tftp pass... If anyone has more details to
share, or perhaps a subsequent thread giving more details, I would
really like to see this fixed upstream, I suppose I am not the only one
:-)

Thanks,
Miquèl
* Re: data abort when run 'dhcp'
From: Heinrich Schuchardt @ 2023-07-20 17:55 UTC
To: Miquel Raynal, qianfan; +Cc: u-boot, Tom Rini

On 20 July 2023 18:39:17 CEST, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
>Hello,
>
>qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
>
>> It's very strange. And I can't detect it's a bug of usb or dlmalloc.
>>
>> 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
>>
>> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
>>
>> 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.
>
>I am sorry to re-open a thread that is one year old but this is
>still an open bug. The BBB is affected. In particular the BBBW
>because there is no Ethernet connector, which makes the Eth-over-USB
>emulation even more important. All U-Boots since 2021 are affected:
>spurious data aborts, usually at the end of network interactions (tftp,
>ping). I could not bisect it because the boot was deeply broken as
>well on a significant range of commits :-/.
>
>On my side I narrowed it down to an env update which fails in malloc as
>well. If I comment out the env update, it fails a bit later. It really
>looks like a stack corruption which is either related to the Ethernet
>USB gadget or the USB controller driver itself. Network transfers on
>the BBBW using regular Ethernet do not trigger any error.
>
>I also observe the very strange "fix" mentioned above: starting and
>killing fastboot makes all tftp pass... If anyone has more details to
>share, or perhaps a subsequent thread giving more details, I would
>really like to see this fixed upstream, I suppose I am not the only one
>:-)
>
>Thanks,
>Miquèl

Can this problem be reproduced on QEMU?

Best regards

Heinrich

>
>>
>> On 2022/3/24 17:33, qianfan wrote:
>> >
>> > On 2022/3/23 18:12, Heinrich Schuchardt wrote:
>> >> On 3/23/22 11:07, qianfan wrote:
>> >>>
>> >>> On 2022/3/23 17:51, Heinrich Schuchardt wrote:
>> >>>> On 3/23/22 10:13, qianfan wrote:
>> >>>>>
>> >>>>> On 2022/3/23 16:02, qianfan wrote:
>> >>>>>>
>> >>>>>>
>> >>>>>> On 2022/3/23 15:45, qianfan wrote:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On 2022/3/23 10:28, qianfan wrote:
>> >>>>>>>>
>> >>>>>>>> Hi:
>> >>>>>>>>
>> >>>>>>>> I had a custom AM335X board connected my computer by usbnet. It
>> >>>>>>>> always report data abort when 'dhcp':
>> >>>>>>>>
>> >>>>>>>> Next it the log:
>> >>>>>>>>
>> >>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>> >>>>>>>> +0800)
>> >>>>>>>>
>> >>>>>>>> CPU : AM335X-GP rev 2.1
>> >>>>>>>> Model: WISDOM AM335X CCT
>> >>>>>>>> DRAM: 512 MiB
>> >>>>>>>> NAND: 256 MiB
>> >>>>>>>> MMC: OMAP SD/MMC: 0
>> >>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>> >>>>>>>> default environment
>> >>>>>>>>
>> >>>>>>>> Net: Could not get PHY for ethernet@4a100000: addr 0
>> >>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>> >>>>>>>> Hit any key to stop autoboot: 0
>> >>>>>>>> => setenv autoload no
>> >>>>>>>> => dhcp
>> >>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> >>>>>>>> MAC de:ad:be:ef:00:01
>> >>>>>>>> HOST MAC de:ad:be:ef:00:00
>> >>>>>>>> RNDIS ready
>> >>>>>>>> musb-hdrc: peripheral reset irq lost!
>> >>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> >>>>>>>> USB RNDIS network up!
>> >>>>>>>> BOOTP broadcast 1 >> >>>>>>>> BOOTP broadcast 2 >> >>>>>>>> BOOTP broadcast 3 >> >>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms) >> >>>>>>>> data abort >> >>>>>>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>] >> >>>>>>>> reloc pc : [<808130a2>] lr : [<80833c3f>] >> >>>>>>>> sp : 9de53410 ip : 9de53578 fp : 00000001 >> >>>>>>>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5 >> >>>>>>>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018 >> >>>>>>>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700 >> >>>>>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) >> >>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a >> >>>>>>>> Resetting CPU ... >> >>>>>>>> >> >>>>>>>> resetting ... >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> It's there has any doc about how to debug data abort? Or is the bug >> >>>>>>>> is already fixed? >> >>>>>>>> >> >>>>>>>> Thanks >> >>>>>>>> >> >>>>>>> This bug doesn't fixed on master code. I found v2021.01 is good and >> >>>>>>> v2021.04-rc2 is bad. >> >>>>>>> >> >>>>>>> Also I had tested this on beaglebone black with am335x_evm_defconfig, >> >>>>>>> has the simliar problem. >> >>>>>>> >> >>>>>>> find the first bug commit via 'git bisect': it told me that commit >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But it is very >> >>>>>>> strange due to this commit doesn't touch any dhcp or network code. >> >>>>>>> >> >>>>>>> ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit >> >>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917 >> >>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de> >> >>>>>>> Date: Wed Jan 20 22:21:53 2021 +0100 >> >>>>>>> >> >>>>>>> fs: fat: consistent error handling for flush_dir() >> >>>>>>> >> >>>>>>> Provide function description for flush_dir(). >> >>>>>>> Move all error messages for flush_dir() from the callers to the >> >>>>>>> function. >> >>>>>>> Move mapping of errors to -EIO to the function. 
>> >>>>>>> Always check return value of flush_dir() (Coverity CID 316362). >> >>>>>>> >> >>>>>>> In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails. >> >>>>>>> >> >>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> >> >>>>>>> >> >>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e >> >>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M fs >> >>>>>>> >> >>>>>>> >> >>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the first commit before >> >>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917: >> >>>>>>> >> >>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error >> >>>>>>> handling for flush_dir() >> >>>>>>> * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag >> >>>>>>> 'u-boot-rockchip-20210121' of >> >>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip >> >>>>>>> |\ >> >>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc >> >>>>>>> based PCIe controller driver >> >>>>>>> >> >>>>>>> I checked 184aa6504143b452132e28cd3ebecc7b941cdfa1 can work fine. >> >>>>>>> >> >>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800) >> >>>>>>> >> >>>>>>> CPU : AM335X-GP rev 2.1 >> >>>>>>> Model: TI AM335x BeagleBone Black >> >>>>>>> DRAM: 512 MiB >> >>>>>>> WDT: Started with servicing (60s timeout) >> >>>>>>> NAND: 0 MiB >> >>>>>>> MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1 >> >>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first >> >>>>>>> E-fuse MAC >> >>>>>>> Net: eth2: ethernet@4a100000, eth3: usb_ether >> >>>>>>> Hit any key to stop autoboot: 0 >> >>>>>>> => dhcp >> >>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to >> >>>>>>> complete......... TIMEOUT ! >> >>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in >> >>>>>>> MAC de:ad:be:ef:00:01 >> >>>>>>> HOST MAC de:ad:be:ef:00:00 >> >>>>>>> RNDIS ready >> >>>>>>> musb-hdrc: peripheral reset irq lost! 
>> >>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> >>>>>>> USB RNDIS network up!
>> >>>>>>> BOOTP broadcast 1
>> >>>>>>> BOOTP broadcast 2
>> >>>>>>> BOOTP broadcast 3
>> >>>>>>> DHCP client bound to address 192.168.200.157 (757 ms)
>> >>>>>>> Using usb_ether device
>> >>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157
>> >>>>>>> Filename 'u-boot.img'.
>> >>>>>>> Load address: 0x82000000
>> >>>>>>> Loading:
>> >>>>>>> #################################################################
>> >>>>>>> #################################################################
>> >>>>>>> #################################################################
>> >>>>>>> #########################
>> >>>>>>> 2.5 MiB/s
>> >>>>>>> done
>> >>>>>>> Bytes transferred = 1123888 (112630 hex)
>> >>>>>>> =>
>> >>>>>>>
>> >>>>> "data abort" messages:
>> >>>>>
>> >>>>> data abort
>> >>>>> pc : [<9ff8196c>] lr : [<9ffa1cd7>]
>> >>>>> reloc pc : [<8081496c>] lr : [<80834cd7>]
>> >>>>> sp : 9df38e60 ip : 9df38fc8 fp : 00000001
>> >>>>> r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d
>> >>>>> r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018
>> >>>>> r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00
>> >>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>> >>>>> Code: 0303 60ca 4403 6091 (685a) f042
>> >>>>> Resetting CPU ...
>> >>>>>
>> >>>>> objdump of u-boot: pc is in malloc and lr is in env_attr_walk
>> >>>>>
>> >>>>> unlink(victim, bck, fwd);
>> >>>>> 80814966: 60ca str r2, [r1, #12]
>> >>>>> set_inuse_bit_at_offset(victim, victim_size);
>> >>>>> 80814968: 4403 add r3, r0
>> >>>>> unlink(victim, bck, fwd);
>> >>>>> 8081496a: 6091 str r1, [r2, #8]
>> >>>>> set_inuse_bit_at_offset(victim, victim_size);
>> >>>>> 8081496c: 685a ldr r2, [r3, #4]
>> >>>>> 8081496e: f042 0201 orr.w r2, r2, #1
>> >>>>> 80814972: 605a str r2, [r3, #4]
>> >>>>>
>> >>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x.
>> >>>>>
>> >>>>>
>> >>>>
>> >>>> I have seen crashes in common/dlmalloc.c before after double free() or
>> >>>> free() with an incorrect pointer.
>> >>>>
>> >>>> The assert() statements in do_check_inuse_chunk() are meant to catch
>> >>>> this but assert() as defined in include/log.h does not stop the code and
>> >>>> even does not print without _DEBUG=1.
>> >>>>
>> >>>> You should be able to get the assert output with
>> >>>>
>> >>>> #include <common.h>
>> >>>> #define _DEBUG 1
>> >>>> #include <log.h>
>> >>>>
>> >>>> at the top of common/dlmalloc.c.
>> >>>>
>> >>>> You should get full malloc debug output with
>> >>>
>> >>> Hi: I tried adding the DEBUG macro before <log.h> and no other malloc message
>> >>
>> >> assert() checks for _DEBUG. Defining DEBUG after common.h will not
>> >> define _DEBUG.
>> >
>> > Finally I got a malloc error message on console:
>> >
>> > TFTP from server 192.168.200.1; our IP address is 192.168.200.39
>> > Filename 'u-boot.img'.
>> > Load address: 0x82000000
>> > Loading: #################################################################
>> > #################################################################
>> > #################################################################
>> > ###################################################### 0 Bytes
>> > 1.9 MiB/s
>> > done
>> > Bytes transferred = 1274816 (1373c0 hex)
>> > common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' failed.
>> >
>> > I tried many times. do_check_chunk does not always fail at the same place;
>> > sometimes it reports common/dlmalloc.c:802: do_check_chunk: Assertion
>> > `!chunk_is_mmapped(p)' failed. The situation is not always the same.
>> >
>> > I got a backtrace when malloc failed:
>> >
>> > (gdb) bt
>> > #0 0x9ffb5684 in panic_finish () at lib/panic.c:23
>> > #1 panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49
>> > #2 0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56
>> > #3 0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at common/dlmalloc.c:866
>> > #4 0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) at common/dlmalloc.c:900
>> > #5 0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552
>> > #6 0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70
>> > #7 0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized out>, attributes=0x9df28dec "") at env/attr.c:184
>> > #8 0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67
>> > #9 0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403
>> > #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, env_flag=512, flag=0) at cmd/nvedit.c:296
>> > #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) at cmd/nvedit.c:318
>> > #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133
>> > #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, argv=0x9df442c8) at cmd/net.c:268
>> > #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580
>> > #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635
>> > #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676
>> > #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873
>> > #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022
>> > #19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at common/cli_hush.c:3206
>> > #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289
>> > #21 0x9ff77c1a in cli_loop () at common/cli.c:229
>> > #22 0x9ff70d3e in main_loop () at common/main.c:66
>> > #23 0x9ff72672 in run_main_loop () at common/board_r.c:584
>> > #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at include/initcall.h:46
>> > #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at common/board_r.c:822
>> > Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> >
>> >>
>> >> Best regards
>> >>
>> >> Heinrich
>> >>
>> >>> printed.
>> >>>
>> >>>>
>> >>>> #define DEBUG 1
>> >>>> #include <common.h>
>> >>>> #include <log.h>
>> >>>>
>> >>>> Best regards
>> >>>>
>> >>>> Heinrich
>> >>>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: data abort when run 'dhcp' 2023-07-20 17:55 ` Heinrich Schuchardt @ 2023-07-21 11:54 ` Miquel Raynal 0 siblings, 0 replies; 23+ messages in thread From: Miquel Raynal @ 2023-07-21 11:54 UTC (permalink / raw) To: Heinrich Schuchardt; +Cc: qianfan, u-boot, Tom Rini

Hi Heinrich,

xypron.glpk@gmx.de wrote on Thu, 20 Jul 2023 19:55:39 +0200:

> On 20 July 2023 18:39:17 CEST, Miquel Raynal <miquel.raynal@bootlin.com> wrote:
> >Hello,
> >
> >qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
> >
> >> It's very strange. And I can't tell whether it's a bug in usb or dlmalloc.
> >>
> >> 1. Starting u-boot and dhcp via am335x's ethernet (cpsw driver), it's ok.
> >>
> >> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> >>
> >> 3. Start fastboot and press CTRL-C right away, then dhcp via am335x's usb net, it's ok.
> >
> >I am sorry to re-open a thread that is one year old but this is
> >still an open bug. The BBB is affected. In particular the BBBW
> >because there is no Ethernet connector, which makes the Eth-over-USB
> >emulation even more important. All U-Boots since 2021 are affected:
> >spurious data aborts, usually at the end of network interactions (tftp,
> >ping). I could not bisect it because the boot was deeply broken as
> >well on a significant range of commits :-/.
> >
> >On my side I narrowed it down to an env update which fails in malloc as
> >well. If I comment out the env update, it fails a bit later. It really
> >looks like a stack corruption which is either related to the Ethernet
> >USB gadget or the USB controller driver itself. Network transfers on
> >the BBBW using regular Ethernet do not trigger any error.
> >
> >I also observe the very strange "fix" mentioned above: starting and
> >killing fastboot makes all tftp pass... If anyone has more details to
> >share, or perhaps a subsequent thread giving more details, I would
> >really like to see this fixed upstream, I suppose I am not the only one
> >:-)
> >
> >Thanks,
> >Miquèl
> >
>
> Can this problem be reproduced on QEMU?

I haven't tried on QEMU, what do you have in mind? What should we try
to do?

Thanks,
Miquèl

>
> Best regards
>
> Heinrich

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: data abort when run 'dhcp' 2023-07-20 16:39 ` Miquel Raynal 2023-07-20 17:55 ` Heinrich Schuchardt @ 2023-07-20 18:34 ` Tom Rini 2023-07-21 11:55 ` Miquel Raynal 2023-07-21 0:31 ` qianfan 2 siblings, 1 reply; 23+ messages in thread From: Tom Rini @ 2023-07-20 18:34 UTC (permalink / raw) To: Miquel Raynal; +Cc: qianfan, Heinrich Schuchardt, u-boot

On Thu, Jul 20, 2023 at 06:39:17PM +0200, Miquel Raynal wrote:
> Hello,
>
> qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
>
> > It's very strange. And I can't tell whether it's a bug in usb or dlmalloc.
> >
> > 1. Starting u-boot and dhcp via am335x's ethernet (cpsw driver), it's ok.
> >
> > 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> >
> > 3. Start fastboot and press CTRL-C right away, then dhcp via am335x's usb net, it's ok.
>
> I am sorry to re-open a thread that is one year old but this is
> still an open bug. The BBB is affected. In particular the BBBW
> because there is no Ethernet connector, which makes the Eth-over-USB
> emulation even more important. All U-Boots since 2021 are affected:
> spurious data aborts, usually at the end of network interactions (tftp,
> ping). I could not bisect it because the boot was deeply broken as
> well on a significant range of commits :-/.
>
> On my side I narrowed it down to an env update which fails in malloc as
> well. If I comment out the env update, it fails a bit later. It really
> looks like a stack corruption which is either related to the Ethernet
> USB gadget or the USB controller driver itself. Network transfers on
> the BBBW using regular Ethernet do not trigger any error.
>
> I also observe the very strange "fix" mentioned above: starting and
> killing fastboot makes all tftp pass... If anyone has more details to
> share, or perhaps a subsequent thread giving more details, I would
> really like to see this fixed upstream, I suppose I am not the only one
> :-)

What happens if you increase the malloc pool from, say, 32MB (current
value, 0x2000000) to 64MB (so 0x4000000)?

-- 
Tom

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: data abort when run 'dhcp' 2023-07-20 18:34 ` Tom Rini @ 2023-07-21 11:55 ` Miquel Raynal 0 siblings, 0 replies; 23+ messages in thread From: Miquel Raynal @ 2023-07-21 11:55 UTC (permalink / raw) To: Tom Rini; +Cc: qianfan, Heinrich Schuchardt, u-boot

Hi Tom,

trini@konsulko.com wrote on Thu, 20 Jul 2023 14:34:52 -0400:

> On Thu, Jul 20, 2023 at 06:39:17PM +0200, Miquel Raynal wrote:
> > Hello,
> >
> > qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
> >
> > > It's very strange. And I can't tell whether it's a bug in usb or dlmalloc.
> > >
> > > 1. Starting u-boot and dhcp via am335x's ethernet (cpsw driver), it's ok.
> > >
> > > 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> > >
> > > 3. Start fastboot and press CTRL-C right away, then dhcp via am335x's usb net, it's ok.
> >
> > I am sorry to re-open a thread that is one year old but this is
> > still an open bug. The BBB is affected. In particular the BBBW
> > because there is no Ethernet connector, which makes the Eth-over-USB
> > emulation even more important. All U-Boots since 2021 are affected:
> > spurious data aborts, usually at the end of network interactions (tftp,
> > ping). I could not bisect it because the boot was deeply broken as
> > well on a significant range of commits :-/.
> >
> > On my side I narrowed it down to an env update which fails in malloc as
> > well. If I comment out the env update, it fails a bit later. It really
> > looks like a stack corruption which is either related to the Ethernet
> > USB gadget or the USB controller driver itself. Network transfers on
> > the BBBW using regular Ethernet do not trigger any error.
> >
> > I also observe the very strange "fix" mentioned above: starting and
> > killing fastboot makes all tftp pass... If anyone has more details to
> > share, or perhaps a subsequent thread giving more details, I would
> > really like to see this fixed upstream, I suppose I am not the only one
> > :-)
>
> What happens if you increase the malloc pool from say 32MB (current
> value, 0x2000000) to 64MB (so 0x4000000) ?

Same result. I tried increasing the heap size to 64MB as well as the
stack size (16 -> 64MB): same behavior.

Thanks,
Miquèl

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: data abort when run 'dhcp' 2023-07-20 16:39 ` Miquel Raynal 2023-07-20 17:55 ` Heinrich Schuchardt 2023-07-20 18:34 ` Tom Rini @ 2023-07-21 0:31 ` qianfan 2023-07-21 22:26 ` Miquel Raynal 2 siblings, 1 reply; 23+ messages in thread From: qianfan @ 2023-07-21 0:31 UTC (permalink / raw) To: Miquel Raynal; +Cc: Heinrich Schuchardt, u-boot, Tom Rini

On 2023/7/21 0:39, Miquel Raynal wrote:
> Hello,
>
> qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
>
>> It's very strange. And I can't tell whether it's a bug in usb or dlmalloc.
>>
>> 1. Starting u-boot and dhcp via am335x's ethernet (cpsw driver), it's ok.
>>
>> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
>>
>> 3. Start fastboot and press CTRL-C right away, then dhcp via am335x's usb net, it's ok.
> I am sorry to re-open a thread that is one year old but this is
> still an open bug. The BBB is affected. In particular the BBBW
> because there is no Ethernet connector, which makes the Eth-over-USB
> emulation even more important. All U-Boots since 2021 are affected:
> spurious data aborts, usually at the end of network interactions (tftp,
> ping). I could not bisect it because the boot was deeply broken as
> well on a significant range of commits :-/.
>
> On my side I narrowed it down to an env update which fails in malloc as
> well. If I comment out the env update, it fails a bit later. It really
> looks like a stack corruption which is either related to the Ethernet
> USB gadget or the USB controller driver itself. Network transfers on
> the BBBW using regular Ethernet do not trigger any error.
>
> I also observe the very strange "fix" mentioned above: starting and
> killing fastboot makes all tftp pass... If anyone has more details to
> share, or perhaps a subsequent thread giving more details, I would
> really like to see this fixed upstream, I suppose I am not the only one
> :-)

Hi:

Could you please try these two patches?
http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-1-qianfanguijin@163.com/
http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-2-qianfanguijin@163.com/

Thanks

>
> Thanks,
> Miquèl
>
>> On 2022/3/24 17:33, qianfan wrote:
>>> On 2022/3/23 18:12, Heinrich Schuchardt wrote:
>>>> On 3/23/22 11:07, qianfan wrote:
>>>>> On 2022/3/23 17:51, Heinrich Schuchardt wrote:
>>>>>> On 3/23/22 10:13, qianfan wrote:
>>>>>>> On 2022/3/23 16:02, qianfan wrote:
>>>>>>>>
>>>>>>>> On 2022/3/23 15:45, qianfan wrote:
>>>>>>>>>
>>>>>>>>> On 2022/3/23 10:28, qianfan wrote:
>>>>>>>>>> Hi:
>>>>>>>>>>
>>>>>>>>>> I have a custom AM335X board connected to my computer by usbnet. It
>>>>>>>>>> always reports a data abort when running 'dhcp':
>>>>>>>>>>
>>>>>>>>>> Here is the log:
>>>>>>>>>>
>>>>>>>>>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02
>>>>>>>>>> +0800)
>>>>>>>>>>
>>>>>>>>>> CPU : AM335X-GP rev 2.1
>>>>>>>>>> Model: WISDOM AM335X CCT
>>>>>>>>>> DRAM: 512 MiB
>>>>>>>>>> NAND: 256 MiB
>>>>>>>>>> MMC: OMAP SD/MMC: 0
>>>>>>>>>> Loading Environment from NAND... *** Warning - bad CRC, using
>>>>>>>>>> default environment
>>>>>>>>>>
>>>>>>>>>> Net: Could not get PHY for ethernet@4a100000: addr 0
>>>>>>>>>> eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>>>>> Hit any key to stop autoboot: 0
>>>>>>>>>> => setenv autoload no
>>>>>>>>>> => dhcp
>>>>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>>>>> RNDIS ready
>>>>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>>>>> USB RNDIS network up!
>>>>>>>>>> BOOTP broadcast 1
>>>>>>>>>> BOOTP broadcast 2
>>>>>>>>>> BOOTP broadcast 3
>>>>>>>>>> DHCP client bound to address 192.168.200.4 (757 ms)
>>>>>>>>>> data abort
>>>>>>>>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
>>>>>>>>>> reloc pc : [<808130a2>] lr : [<80833c3f>]
>>>>>>>>>> sp : 9de53410 ip : 9de53578 fp : 00000001
>>>>>>>>>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
>>>>>>>>>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
>>>>>>>>>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
>>>>>>>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>>>>>>>>>> Code: f023 0303 60ca 4403 (6091) 685a
>>>>>>>>>> Resetting CPU ...
>>>>>>>>>>
>>>>>>>>>> resetting ...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Is there any doc about how to debug a data abort? Or is the bug
>>>>>>>>>> already fixed?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>> This bug is not fixed on master. I found that v2021.01 is good and
>>>>>>>>> v2021.04-rc2 is bad.
>>>>>>>>>
>>>>>>>>> I also tested this on a BeagleBone Black with am335x_evm_defconfig;
>>>>>>>>> it has a similar problem.
>>>>>>>>>
>>>>>>>>> I found the first bad commit via 'git bisect': it told me that commit
>>>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 broke it. But this is very
>>>>>>>>> strange, because the commit doesn't touch any dhcp or network code.
>>>>>>>>>
>>>>>>>>> ➜ u-boot-main git:(e97eb638de) ✗ git bisect bug
>>>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917 is the first bug commit
>>>>>>>>> commit e97eb638de0dc8f6e989e20eaeb0342f103cb917
>>>>>>>>> Author: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>>>> Date: Wed Jan 20 22:21:53 2021 +0100
>>>>>>>>>
>>>>>>>>> fs: fat: consistent error handling for flush_dir()
>>>>>>>>>
>>>>>>>>> Provide function description for flush_dir().
>>>>>>>>> Move all error messages for flush_dir() from the callers to the
>>>>>>>>> function.
>>>>>>>>> Move mapping of errors to -EIO to the function.
>>>>>>>>> Always check return value of flush_dir() (Coverity CID 316362).
>>>>>>>>>
>>>>>>>>> In fat_unlink() return -EIO if flush_dirty_fat_buffer() fails.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
>>>>>>>>>
>>>>>>>>> :040000 040000 2281a449f2d134078d7faa1ee735a367b55aad7e
>>>>>>>>> 77d188b1c99181fd71f2167fdeee3434a09db209 M fs
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 184aa6504143b452132e28cd3ebecc7b941cdfa1 is the commit just before
>>>>>>>>> e97eb638de0dc8f6e989e20eaeb0342f103cb917:
>>>>>>>>>
>>>>>>>>> * e97eb638de0dc8f6e989e20eaeb0342f103cb917 fs: fat: consistent error
>>>>>>>>> handling for flush_dir()
>>>>>>>>> * 184aa6504143b452132e28cd3ebecc7b941cdfa1 Merge tag
>>>>>>>>> 'u-boot-rockchip-20210121' of
>>>>>>>>> https://gitlab.denx.de/u-boot/custodians/u-boot-rockchip
>>>>>>>>> |\
>>>>>>>>> | * 9ddc0787bd660214366e386ce689dd78299ac9d0 pci: Add Rockchip dwc
>>>>>>>>> based PCIe controller driver
>>>>>>>>>
>>>>>>>>> I checked that 184aa6504143b452132e28cd3ebecc7b941cdfa1 works fine.
>>>>>>>>>
>>>>>>>>> U-Boot 2021.01-00688-g184aa65041-dirty (Mar 23 2022 - 15:07:56 +0800)
>>>>>>>>>
>>>>>>>>> CPU : AM335X-GP rev 2.1
>>>>>>>>> Model: TI AM335x BeagleBone Black
>>>>>>>>> DRAM: 512 MiB
>>>>>>>>> WDT: Started with servicing (60s timeout)
>>>>>>>>> NAND: 0 MiB
>>>>>>>>> MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
>>>>>>>>> Loading Environment from FAT... <ethaddr> not set. Validating first
>>>>>>>>> E-fuse MAC
>>>>>>>>> Net: eth2: ethernet@4a100000, eth3: usb_ether
>>>>>>>>> Hit any key to stop autoboot: 0
>>>>>>>>> => dhcp
>>>>>>>>> ethernet@4a100000 Waiting for PHY auto negotiation to
>>>>>>>>> complete......... TIMEOUT !
>>>>>>>>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>>>>>>>>> MAC de:ad:be:ef:00:01
>>>>>>>>> HOST MAC de:ad:be:ef:00:00
>>>>>>>>> RNDIS ready
>>>>>>>>> musb-hdrc: peripheral reset irq lost!
>>>>>>>>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>>>>>>>>> USB RNDIS network up!
>>>>>>>>> BOOTP broadcast 1 >>>>>>>>> BOOTP broadcast 2 >>>>>>>>> BOOTP broadcast 3 >>>>>>>>> DHCP client bound to address 192.168.200.157 (757 ms) >>>>>>>>> Using usb_ether device >>>>>>>>> TFTP from server 192.168.200.1; our IP address is 192.168.200.157 >>>>>>>>> Filename 'u-boot.img'. >>>>>>>>> Load address: 0x82000000 >>>>>>>>> Loading: >>>>>>>>> ################################################################# >>>>>>>>> ################################################################# >>>>>>>>> ################################################################# >>>>>>>>> ######################### >>>>>>>>> 2.5 MiB/s >>>>>>>>> done >>>>>>>>> Bytes transferred = 1123888 (112630 hex) >>>>>>>>> => >>>>>>>>> >>>>>>> "data abort" messages: >>>>>>> >>>>>>> data abort >>>>>>> pc : [<9ff8196c>] lr : [<9ffa1cd7>] >>>>>>> reloc pc : [<8081496c>] lr : [<80834cd7>] >>>>>>> sp : 9df38e60 ip : 9df38fc8 fp : 00000001 >>>>>>> r10: 9df38eac r9 : 9df4ceb0 r8 : 9ffa1b7d >>>>>>> r7 : 9df52fd0 r6 : 9ffdbba8 r5 : 0000000d r4 : 00000018 >>>>>>> r3 : 3ff589e0 r2 : 9ffafa11 r1 : 9ffdbbc0 r0 : 9ffdbb00 >>>>>>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T) >>>>>>> Code: 0303 60ca 4403 6091 (685a) f042 >>>>>>> Resetting CPU ... >>>>>>> >>>>>>> objdump u-boot:pc is in malloc and lr is in env_attr_walk >>>>>>> >>>>>>> unlink(victim, bck, fwd); >>>>>>> 80814966: 60ca str r2, [r1, #12] >>>>>>> set_inuse_bit_at_offset(victim, victim_size); >>>>>>> 80814968: 4403 add r3, r0 >>>>>>> unlink(victim, bck, fwd); >>>>>>> 8081496a: 6091 str r1, [r2, #8] >>>>>>> set_inuse_bit_at_offset(victim, victim_size); >>>>>>> 8081496c: 685a ldr r2, [r3, #4] >>>>>>> 8081496e: f042 0201 orr.w r2, r2, #1 >>>>>>> 80814972: 605a str r2, [r3, #4] >>>>>>> >>>>>>> r3 is 3ff589e0 and it's not a valid ram address on am335x. >>>>>>> >>>>>>> >>>>>> I have seen crashes in common/dlmalloc.c before after double free() or >>>>>> free() with an incorrect pointer. 
>>>>>> >>>>>> The assert() statements in do_check_inuse_chunk() are meant to catch >>>>>> this but assert() as defined in include/log.h does not stop the code and >>>>>> even does not print without _DEBUG=1. >>>>>> >>>>>> You should be able to get the assert output with >>>>>> >>>>>> #include <common.h> >>>>>> #define _DEBUG 1 >>>>>> #include <log.h> >>>>>> >>>>>> at the top of common/dlmalloc.c. >>>>>> >>>>>> You should get full malloc debug output with >>>>> Hi: I had try add DEBUG marco before <log.h> and no other malloc message >>>> assert() checks for _DEBUG. Defining DEBUG after common.h will not >>>> define _DEBUG. >>> Finally I got a malloc error message on console: >>> >>> TFTP from server 192.168.200.1; our IP address is 192.168.200.39 >>> Filename 'u-boot.img'. >>> Load address: 0x82000000 >>> Loading: ################################################################# >>> ################################################################# >>> ################################################################# >>> ###################################################### 0 Bytes >>> 1.9 MiB/s >>> done >>> Bytes transferred = 1274816 (1373c0 hex) >>> common/dlmalloc.c:819: do_check_chunk: Assertion `(char*)p + sz <= (char*)top' > failed. >>> >>> I had tried many times, do_check_chunk not always failed, and sometimes it > report common/dlmalloc.c:802: do_check_chunk: Assertion `!chunk_is_mmapped(p)' > failed. The situation is not the same. 
>>> >>> I got a bt stack when malloc failed: >>> >>> (gdb) bt >>> #0 0x9ffb5684 in panic_finish () at lib/panic.c:23 >>> #1 panic (fmt=0x9ffbd96b "%s:%u: %s: Assertion `%s' failed.") at lib/panic.c:49 >>> #2 0x9ffb5696 in __assert_fail (assertion=<optimized out>, file=<optimized > out>, line=<optimized out>, function=<optimized out>) at lib/panic.c:56 >>> #3 0x9ff76910 in do_check_inuse_chunk (p=p@entry=0x9ffd7200) at > common/dlmalloc.c:866 >>> #4 0x9ff769d6 in do_check_malloced_chunk (p=p@entry=0x9ffd7200, s=s@entry=24) > at common/dlmalloc.c:900 >>> #5 0x9ff76da6 in malloc (bytes=<optimized out>) at common/dlmalloc.c:1552 >>> #6 0x9ff96b72 in env_attr_walk (attr_list=<optimized out>, > callback=0x9ff969f9 <regex_callback>, priv=0x9df28dc8) at env/attr.c:70 >>> #7 0x9ff96bc2 in env_attr_lookup (attr_list=<optimized out>, name=<optimized > out>, attributes=0x9df28dec "") at env/attr.c:184 >>> #8 0x9ff97146 in env_callback_init (var_entry=0x9df46f60) at env/callback.c:67 >>> #9 0x9ffb36fc in hsearch_r (item=..., action=ENV_ENTER, retval=0x9df28f60, > htab=0x9ffdbce8, flag=512) at lib/hashtable.c:403 >>> #10 0x9ff7090e in _do_env_set (argc=<optimized out>, argv=<optimized out>, > env_flag=512, flag=0) at cmd/nvedit.c:296 >>> #11 0x9ff70b64 in env_set (varname=<optimized out>, varvalue=<optimized out>) > at cmd/nvedit.c:318 >>> #12 0x9ff6d522 in netboot_update_env () at cmd/net.c:133 >>> #13 netboot_common (proto=DHCP, cmdtp=0x9ffdd0e8, argc=<optimized out>, > argv=0x9df442c8) at cmd/net.c:268 >>> #14 0x9ff783a4 in cmd_call (repeatable=0x9df29008, argv=0x9df442c8, argc=1, > flag=0, cmdtp=0x9ffdd0e8) at common/command.c:580 >>> #15 cmd_process (flag=<optimized out>, argc=1, argv=0x9df442c8, > repeatable=0x9ffdf6a0, ticks=0x0) at common/command.c:635 >>> #16 0x9ff71d16 in run_pipe_real (pi=0x9df44220) at common/cli_hush.c:1676 >>> #17 run_list_real (pi=<optimized out>) at common/cli_hush.c:1873 >>> #18 0x9ff71e28 in run_list (pi=0x9df44220) at common/cli_hush.c:2022 
>>> #19 parse_stream_outer (inp=inp@entry=0x9df290e8, flag=flag@entry=2) at > common/cli_hush.c:3206 >>> #20 0x9ff721ba in parse_file_outer () at common/cli_hush.c:3289 >>> #21 0x9ff77c1a in cli_loop () at common/cli.c:229 >>> #22 0x9ff70d3e in main_loop () at common/main.c:66 >>> #23 0x9ff72672 in run_main_loop () at common/board_r.c:584 >>> #24 0x9ff72830 in initcall_run_list (init_sequence=0x9ffd7224) at > include/initcall.h:46 >>> #25 board_init_r (new_gd=<optimized out>, dest_addr=<optimized out>) at > common/board_r.c:822 >>> Backtrace stopped: previous frame identical to this frame (corrupt stack?) >>> >>>> Best regards >>>> >>>> Heinrich >>>> >>>>> printed. >>>>> >>>>>> #define DEBUG 1 >>>>>> #include <common.h> >>>>>> #include <log.h> >>>>>> >>>>>> Best regards >>>>>> >>>>>> Heinrich >>>>> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: data abort when run 'dhcp'
  2023-07-21  0:31 ` qianfan
@ 2023-07-21 22:26 ` Miquel Raynal
  0 siblings, 0 replies; 23+ messages in thread
From: Miquel Raynal @ 2023-07-21 22:26 UTC (permalink / raw)
  To: qianfan; +Cc: Heinrich Schuchardt, u-boot, Tom Rini

Hi qianfan,

qianfanguijin@163.com wrote on Fri, 21 Jul 2023 08:31:17 +0800:

> On 2023/7/21 0:39, Miquel Raynal wrote:
> > Hello,
> >
> > qianfanguijin@163.com wrote on Fri, 25 Mar 2022 18:04:46 +0800:
> >
> >> It's very strange. And I can't detect it's a bug of usb or dlmalloc.
> >>
> >> 1. Starting u-boot and dhcp via am335x's ethernet(cpsw driver), it's ok.
> >>
> >> 2. Starting u-boot and dhcp via am335x's usb net, data abort.
> >>
> >> 3. start fastboot, and CTRL C right now, dhcp via am335x's usb net, it's ok.
> > I am sorry to re-open a thread that is one year old but this is
> > still an open bug. The BBB is affected. In particular the BBBW
> > because there is no Ethernet connector, which makes the Eth-over-USB
> > emulation even more important. All U-Boots since 2021 are affected:
> > spurious data aborts, usually at the end of network interactions (tftp,
> > ping). I could not bisect it because the boot was deeply broken as
> > well on a significant range of commits :-/.
> >
> > On my side I narrowed it down to an env update which fails in malloc as
> > well. If I comment the env update, it fails a bit later. It really
> > looks like a stack corruption which is either related to the Ethernet
> > USB gadget or the USB controller driver itself. Network transfers on
> > the BBBW using regular Ethernet does not trigger any error.
> >
> > I also observe the very strange "fix" mentioned above: starting and
> > killing fastboot makes all tftp pass... If anyone has more details to
> > share, or perhaps a subsequent thread giving more details, I would
> > really like to see this fixed upstream, I suppose I am not the only one
> > :-)
> Hi:
>
> Could you please try this two patches?
>
> http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-1-qianfanguijin@163.com/
>
> http://patchwork.ozlabs.org/project/uboot/patch/20220402025836.19374-2-qianfanguijin@163.com/

Indeed, these patches work. I ended up rewriting one of them to propose
a different approach. I also found two other proposals for the same
issue which are still pending. I hope this submission will make it in,
so that no more time needs to be spent on this :-)

Thanks a lot for the pointers, I've Cc'ed you on the submissions.

Kind regards,
Miquèl

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: How to debug u-boot data abort
  2022-03-23  7:45 ` qianfan
  2022-03-23  8:02   ` data abort when run 'dhcp' qianfan
@ 2022-03-23  8:27   ` Heinrich Schuchardt
  2022-03-24  3:18     ` AKASHI Takahiro
  1 sibling, 1 reply; 23+ messages in thread
From: Heinrich Schuchardt @ 2022-03-23  8:27 UTC (permalink / raw)
  To: qianfan; +Cc: u-boot

On 3/23/22 08:45, qianfan wrote:
>
> On 2022/3/23 10:28, qianfan wrote:
>>
>> Hi:
>>
>> I had a custom AM335X board connected my computer by usbnet. It always
>> report data abort when 'dhcp':
>>
>> Next it the log:
>>
>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>>
>> CPU : AM335X-GP rev 2.1
>> Model: WISDOM AM335X CCT
>> DRAM: 512 MiB
>> NAND: 256 MiB
>> MMC: OMAP SD/MMC: 0
>> Loading Environment from NAND... *** Warning - bad CRC, using default
>> environment
>>
>> Net: Could not get PHY for ethernet@4a100000: addr 0
>> eth2: ethernet@4a100000, eth3: usb_ether
>> Hit any key to stop autoboot: 0
>> => setenv autoload no
>> => dhcp
>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> MAC de:ad:be:ef:00:01
>> HOST MAC de:ad:be:ef:00:00
>> RNDIS ready
>> musb-hdrc: peripheral reset irq lost!
>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> USB RNDIS network up!
>> BOOTP broadcast 1
>> BOOTP broadcast 2
>> BOOTP broadcast 3
>> DHCP client bound to address 192.168.200.4 (757 ms)
>> data abort

This could be an alignment error.

>> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
>> reloc pc : [<808130a2>] lr : [<80833c3f>]

You can use these addresses together with the u-boot.map file to figure
out in which function the abort occurs and from where it was called.

Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.

>> sp : 9de53410 ip : 9de53578 fp : 00000001
>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>> Code: f023 0303 60ca 4403 (6091) 685a

This is how to find the exact instruction causing the problem:

$ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
> ARCH=arm scripts/decodecode
Code: f023 0303 60ca 4403 (6091) 685a
All code
========
   0:   23 f0       and    %eax,%esi
   2:   03 03       add    (%rbx),%eax
   4:   ca 60 03    lret   $0x360
   7:*  44 91       rex.R xchg %eax,%ecx    <-- trapping instruction
   9:   60          (bad)
   a:   5a          pop    %rdx
   b:   68          .byte 0x68

Code starting with the faulting instruction
===========================================
   0:   91          xchg   %eax,%ecx
   1:   60          (bad)
   2:   5a          pop    %rdx
   3:   68          .byte 0x68

I hope this helps to figure out, where exactly the problem occurs

Best regards

Heinrich

>> Resetting CPU ...
>>
>> resetting ...
>>
>>
>> It's there has any doc about how to debug data abort? Or is the bug is
>> already fixed?
>>
>> Thanks
>>
> This bug doesn't fixed on master code. I found v2021.01 is good and
> v2021.04-rc2 is bad.
> [...]

^ permalink raw reply	[flat|nested] 23+ messages in thread
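For readers following the advice above: the register dump prints each
address twice, once as the run-time value ("pc", "lr") and once as the
link-time value ("reloc pc", "lr") that matches u-boot.map and objdump
output. Below is a minimal sketch of that arithmetic, using the values from
the dump quoted above. The function names reloc_offset and link_addr are
made up for illustration and are not U-Boot tools, and clearing bit 0
assumes the odd lr value only carries the Thumb state bit.

```shell
#!/bin/sh
# Sketch only: relate run-time addresses from a U-Boot register dump to the
# link-time addresses found in u-boot.map. Helper names are hypothetical.

# reloc_offset RUNTIME_PC RELOC_PC -> how far U-Boot moved itself in RAM
reloc_offset() {
    printf '0x%08x\n' $(( $1 - $2 ))
}

# link_addr RUNTIME_ADDR OFFSET -> address to look up in u-boot.map/objdump.
# Bit 0 is cleared on the assumption that it is only the Thumb marker.
link_addr() {
    printf '0x%08x\n' $(( ($1 - $2) & ~1 ))
}

off=$(reloc_offset 0x9fe9b0a2 0x808130a2)   # "pc" and "reloc pc" of the dump
echo "relocation offset: $off"
echo "lr to look up:     $(link_addr 0x9febbc3f "$off")"
```

The same subtraction can be applied to any other run-time address from the
dump (for example a return address found on the stack) before searching the
map file. The offset is not a constant; it can differ between builds and
boards.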
* Re: How to debug u-boot data abort
  2022-03-23  8:27 ` How to debug u-boot data abort Heinrich Schuchardt
@ 2022-03-24  3:18 ` AKASHI Takahiro
  2022-03-24  7:38   ` qianfan
  0 siblings, 1 reply; 23+ messages in thread
From: AKASHI Takahiro @ 2022-03-24  3:18 UTC (permalink / raw)
  To: Heinrich Schuchardt; +Cc: qianfan, u-boot

On Wed, Mar 23, 2022 at 09:27:08AM +0100, Heinrich Schuchardt wrote:
> On 3/23/22 08:45, qianfan wrote:
> >
> > On 2022/3/23 10:28, qianfan wrote:
> > > [...]
> > > DHCP client bound to address 192.168.200.4 (757 ms)
> > > data abort
>
> This could be an alignment error.
>
> > > pc : [<9fe9b0a2>] lr : [<9febbc3f>]
> > > reloc pc : [<808130a2>] lr : [<80833c3f>]
>
> You can use these addresses together with the u-boot.map file to figure
> out in which function the abort occurs and from where it was called.
>
> Use 'arm-linux-gnueabihf-objdump -S -D' to find the exact code positions.
>
> > > sp : 9de53410 ip : 9de53578 fp : 00000001
> > > r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
> > > r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
> > > r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
> > > Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
> > > Code: f023 0303 60ca 4403 (6091) 685a
>
> This is how to find the exact instruction causing the problem:
>
> $ echo 'Code: f023 0303 60ca 4403 (6091) 685a' | \
> > ARCH=arm scripts/decodecode
> Code: f023 0303 60ca 4403 (6091) 685a
> All code
> ========
>    0:   23 f0       and    %eax,%esi
>    2:   03 03       add    (%rbx),%eax
>    4:   ca 60 03    lret   $0x360
>    7:*  44 91       rex.R xchg %eax,%ecx    <-- trapping instruction
>    9:   60          (bad)
>    a:   5a          pop    %rdx
>    b:   68          .byte 0x68
>
> Code starting with the faulting instruction
> ===========================================
>    0:   91          xchg   %eax,%ecx
>    1:   60          (bad)
>    2:   5a          pop    %rdx
>    3:   68          .byte 0x68

The code looks like x86 instructions.
Please don't forget to add "CROSS_COMPILE=..." :)

Code: f023 0303 60ca 4403 (6091) 685a
All code
========
   0:   f023 0303    bic.w   r3, r3, #3
   4:   60ca         str     r2, [r1, #12]
   6:   4403         add     r3, r0
   8:*  6091         str     r1, [r2, #8]    <-- trapping instruction
   a:   685a         ldr     r2, [r3, #4]

Code starting with the faulting instruction
===========================================
   0:   6091         str     r1, [r2, #8]
   2:   685a         ldr     r2, [r3, #4]

Then,
   ${CROSS_COMPILE}objdump --disassemble=malloc -lS ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${PATTERN}
   # Here, PATTERN may be the instruction ("6091") or the location ("8081496c" in your case?)

or similarly

   ${CROSS_COMPILE}gdb --batch -ex "disas/m ${LOC}" ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${LOC}
   # Here, LOC is your "reloc pc" (0x80817586)

gives you some hint about the exact location.

-Takahiro Akashi

> I hope this helps to figure out, where exactly the problem occurs
>
> Best regards
>
> Heinrich
>
> > > Resetting CPU ...
> > >
> > > resetting ...
> > >
> > >
> > > It's there has any doc about how to debug data abort? Or is the bug is
> > > already fixed?
> > >
> > > Thanks
> > >
> > This bug doesn't fixed on master code. I found v2021.01 is good and
> > v2021.04-rc2 is bad.
> > [...]

^ permalink raw reply	[flat|nested] 23+ messages in thread
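The corrected disassembly makes it possible to work out which address the
trapping instruction touched. Below is a small sketch, assuming AM335x DRAM
starts at 0x80000000 and spans the 512 MiB shown in the boot log (both are
hard-coded as assumptions). check_access is a hypothetical helper, and
"outside-dram" only means the address is not in that window, not that
nothing is mapped there.

```shell
#!/bin/sh
# Sketch only: sanity-check the address used by a trapping 32-bit load/store.
# DRAM_BASE and DRAM_SIZE are assumptions (AM335x DDR at 0x80000000, 512 MiB).
DRAM_BASE=$((0x80000000))
DRAM_SIZE=$((512 * 1024 * 1024))

# check_access BASE_REGISTER_VALUE IMMEDIATE_OFFSET
# Prints the effective address and anything that looks suspicious about it.
check_access() {
    addr=$(( $1 + $2 ))
    problems=""
    if [ $(( addr & 3 )) -ne 0 ]; then
        problems="$problems unaligned"       # not a multiple of 4 bytes
    fi
    if [ "$addr" -lt "$DRAM_BASE" ] || [ "$addr" -ge $(( DRAM_BASE + DRAM_SIZE )) ]; then
        problems="$problems outside-dram"    # not within the assumed DDR window
    fi
    if [ -z "$problems" ]; then
        problems=" ok"
    fi
    printf '0x%08x:%s\n' "$addr" "$problems"
}

check_access 0x00000002 8    # str r1, [r2, #8] with r2 = 00000002
check_access 0x3ff589e0 4    # ldr r2, [r3, #4] with r3 = 3ff589e0 (other dump)
```

For the dump quoted above this reports address 0x0000000a as unaligned and
outside DRAM, which would be consistent with the alignment-error guess,
although it does not by itself explain why r2 held the value 2.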
* Re: How to debug u-boot data abort
  2022-03-24  3:18 ` AKASHI Takahiro
@ 2022-03-24  7:38 ` qianfan
  0 siblings, 0 replies; 23+ messages in thread
From: qianfan @ 2022-03-24  7:38 UTC (permalink / raw)
  To: AKASHI Takahiro, Heinrich Schuchardt, u-boot

On 2022/3/24 11:18, AKASHI Takahiro wrote:
> On Wed, Mar 23, 2022 at 09:27:08AM +0100, Heinrich Schuchardt wrote:
>> On 3/23/22 08:45, qianfan wrote:
>>> On 2022/3/23 10:28, qianfan wrote:
>>>> [...]
>>>> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
>>>> reloc pc : [<808130a2>] lr : [<80833c3f>]
>>>> [...]
>>>> Code: f023 0303 60ca 4403 (6091) 685a
>> This is how to find the exact instruction causing the problem:
>> [...]
> The code looks like x86 instructions.
> Please don't forget to add "CROSS_COMPILE=..." :)
>
> Code: f023 0303 60ca 4403 (6091) 685a
> All code
> ========
>    0:   f023 0303    bic.w   r3, r3, #3
>    4:   60ca         str     r2, [r1, #12]
>    6:   4403         add     r3, r0
>    8:*  6091         str     r1, [r2, #8]    <-- trapping instruction
>    a:   685a         ldr     r2, [r3, #4]
>
> Code starting with the faulting instruction
> ===========================================
>    0:   6091         str     r1, [r2, #8]
>    2:   685a         ldr     r2, [r3, #4]
>
> Then,
>    ${CROSS_COMPILE}objdump --disassemble=malloc -lS ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${PATTERN}
>    # Here, PATTERN may be the instruction ("6091") or the location ("8081496c" in your case?)
>
> or similarly
>
>    ${CROSS_COMPILE}gdb --batch -ex "disas/m ${LOC}" ${BUILDDIR}/u-boot | grep -A 10 -B 20 ${LOC}
>    # Here, LOC is your "reloc pc" (0x80817586)
>
> gives you some hint about the exact location.
>
> -Takahiro Akashi

Hi:

Thanks for your guidance. I know that pc is in malloc and lr is in
env_attr_walk, but I can't get the full call stack of the malloc call.

I don't understand dlmalloc's logic, and it's hard for me to solve this
problem.

>> [...]

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: How to debug u-boot data abort
  2022-03-23  2:28 How to debug u-boot data abort qianfan
  2022-03-23  7:45 ` qianfan
@ 2022-03-23  7:51 ` Abder
  2022-03-23  7:59   ` qianfan
  1 sibling, 1 reply; 23+ messages in thread
From: Abder @ 2022-03-23  7:51 UTC (permalink / raw)
  To: qianfan; +Cc: U-Boot Mailing List

On Wed, 23 Mar 2022 at 03:28, qianfan <qianfanguijin@163.com> wrote:
>
> Hi:
>
> I had a custom AM335X board connected my computer by usbnet. It always report
> data abort when 'dhcp':
>
> Next it the log:
>
> [...]
> DHCP client bound to address 192.168.200.4 (757 ms)
> data abort
> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
> reloc pc : [<808130a2>] lr : [<80833c3f>]
> sp : 9de53410 ip : 9de53578 fp : 00000001
> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
> Code: f023 0303 60ca 4403 (6091) 685a
> Resetting CPU ...
>

I don't have any idea what is causing the crash, but to answer your
question about debugging a data abort:
from the register dump, you can look at the PC and LR registers to see
the function that caused the crash (in PC) and its caller (in LR) by
using the .map file (generated after compilation).
Use the pre-relocation values of pc and lr (the second line in the dump
above: reloc pc ...).

Regards
--
Abder

> resetting ...
>
>
> It's there has any doc about how to debug data abort? Or is the bug is already
> fixed?
>
> Thanks

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: How to debug u-boot data abort
  2022-03-23 7:51 ` Abder
@ 2022-03-23 7:59 ` qianfan
  0 siblings, 0 replies; 23+ messages in thread
From: qianfan @ 2022-03-23 7:59 UTC (permalink / raw)
  To: Abder; +Cc: U-Boot Mailing List

On 2022/3/23 15:51, Abder wrote:
> On Wed, 23 Mar 2022 at 03:28, qianfan <qianfanguijin@163.com> wrote:
>> Hi:
>>
>> I had a custom AM335X board connected my computer by usbnet. It always report
>> data abort when 'dhcp':
>>
>> Next it the log:
>>
>> U-Boot 2022.01-rc1-00183-gfa5b4e2d19-dirty (Feb 25 2022 - 15:45:02 +0800)
>>
>> CPU : AM335X-GP rev 2.1
>> Model: WISDOM AM335X CCT
>> DRAM: 512 MiB
>> NAND: 256 MiB
>> MMC: OMAP SD/MMC: 0
>> Loading Environment from NAND... *** Warning - bad CRC, using default environment
>>
>> Net: Could not get PHY for ethernet@4a100000: addr 0
>> eth2: ethernet@4a100000, eth3: usb_ether
>> Hit any key to stop autoboot: 0
>> => setenv autoload no
>> => dhcp
>> using musb-hdrc, OUT ep1out IN ep1in STATUS ep2in
>> MAC de:ad:be:ef:00:01
>> HOST MAC de:ad:be:ef:00:00
>> RNDIS ready
>> musb-hdrc: peripheral reset irq lost!
>> high speed config #2: 2 mA, Ethernet Gadget, using RNDIS
>> USB RNDIS network up!
>> BOOTP broadcast 1
>> BOOTP broadcast 2
>> BOOTP broadcast 3
>> DHCP client bound to address 192.168.200.4 (757 ms)
>> data abort
>> pc : [<9fe9b0a2>] lr : [<9febbc3f>]
>> reloc pc : [<808130a2>] lr : [<80833c3f>]
>> sp : 9de53410 ip : 9de53578 fp : 00000001
>> r10: 9de5345c r9 : 9de67e80 r8 : 9febbae5
>> r7 : 9de72c30 r6 : 9feec710 r5 : 0000000d r4 : 00000018
>> r3 : 3fdd8e04 r2 : 00000002 r1 : 9feec728 r0 : 9feec700
>> Flags: Nzcv IRQs off FIQs on Mode SVC_32 (T)
>> Code: f023 0303 60ca 4403 (6091) 685a
>> Resetting CPU ...
>>
> I don't have any idea what is causing the crash, but to answer your
> question about debugging a data abort:
> from the register dump, you can look at the PC and LR registers to see the
> function that caused the crash (in PC) and its caller (in LR) by using
> the .map file (generated by the build).
> Use the pre-relocation values of pc and lr (the second line in the dump
> above: reloc pc ...).

Hi:

Thanks for your guidance. I got this data abort message:

data abort
pc : [<9ff8196c>] lr : [<9ffa1cd7>]
reloc pc : [<8081496c>] lr : [<80834cd7>]

and found the pc and lr addresses in u-boot.map:

 .text.env_attr_walk
                0x0000000080834c54       0xb4 env/built-in.o
                0x0000000080834c54                env_attr_walk
 .text.malloc
                0x0000000080814900      0x420 common/built-in.o
                0x0000000080814900                malloc

Does this mean that the data abort happened inside 'malloc', called from
env_attr_walk? Is there a better way to dump the stack?

>
> Regards
> --
> Abder
>
>> resetting ...
>>
>>
>> It's there has any doc about how to debug data abort? Or is the bug is already
>> fixed?
>>
>> Thanks

^ permalink raw reply [flat|nested] 23+ messages in thread
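The manual lookup in u-boot.map described above can be sketched as a small script. This is only an illustration under stated assumptions: it expects per-function sections named .text.<function> (as in the excerpt quoted in the message), the names SECTION_RE, load_sections and resolve are made up for this example, and the map text is the excerpt from the message rather than a full map file.

```python
import re

# Excerpt of u-boot.map as quoted in the message above.
MAP_EXCERPT = """\
 .text.env_attr_walk
                0x0000000080834c54       0xb4 env/built-in.o
                0x0000000080834c54                env_attr_walk
 .text.malloc
                0x0000000080814900      0x420 common/built-in.o
                0x0000000080814900                malloc
"""

# Section name, start address, size, object file. \s+ also spans the
# line break that the linker inserts after long section names.
SECTION_RE = re.compile(
    r"\.text\.(\S+)\s+0x([0-9a-fA-F]+)\s+0x([0-9a-fA-F]+)\s+(\S+)"
)


def load_sections(map_text):
    """Return a list of (start, size, name, object_file) tuples."""
    return [
        (int(start, 16), int(size, 16), name, obj)
        for name, start, size, obj in SECTION_RE.findall(map_text)
    ]


def resolve(addr, sections):
    """Map a link-time ("reloc") address to (function, offset, object_file)."""
    addr &= ~1  # clear bit 0, which marks Thumb state in lr
    for start, size, name, obj in sections:
        if start <= addr < start + size:
            return name, addr - start, obj
    return None


sections = load_sections(MAP_EXCERPT)
for label, addr in (("pc", 0x8081496C), ("lr", 0x80834CD7)):
    name, off, obj = resolve(addr, sections)
    print(f"{label}: {name}+{off:#x} ({obj})")
# pc: malloc+0x6c (common/built-in.o)
# lr: env_attr_walk+0x82 (env/built-in.o)
```

With the values from the message this places pc inside malloc and lr inside env_attr_walk, matching the manual lookup. Note that lr only identifies the caller if the faulting function has not overwritten it with a call of its own, so treat the result as a hint rather than a full backtrace.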
end of thread, other threads:[~2023-07-21 22:27 UTC | newest]

Thread overview: 23+ messages
2022-03-23 2:28 How to debug u-boot data abort qianfan
2022-03-23 7:45 ` qianfan
2022-03-23 8:02 ` data abort when run 'dhcp' qianfan
2022-03-23 9:13 ` qianfan
2022-03-23 9:51 ` Heinrich Schuchardt
2022-03-23 10:07 ` qianfan
2022-03-23 10:12 ` Heinrich Schuchardt
2022-03-23 11:54 ` qianfanguijin
2022-03-24 1:23 ` qianfan
2022-03-24 9:33 ` qianfan
2022-03-25 10:04 ` qianfan
2023-07-20 16:39 ` Miquel Raynal
2023-07-20 17:55 ` Heinrich Schuchardt
2023-07-21 11:54 ` Miquel Raynal
2023-07-20 18:34 ` Tom Rini
2023-07-21 11:55 ` Miquel Raynal
2023-07-21 0:31 ` qianfan
2023-07-21 22:26 ` Miquel Raynal
2022-03-23 8:27 ` How to debug u-boot data abort Heinrich Schuchardt
2022-03-24 3:18 ` AKASHI Takahiro
2022-03-24 7:38 ` qianfan
2022-03-23 7:51 ` Abder
2022-03-23 7:59 ` qianfan