All of lore.kernel.org
 help / color / mirror / Atom feed
* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
@ 2015-02-03 21:44 Jörg Krause
  2015-02-05 11:21 ` Jörg Krause
  2015-02-05 19:20 ` Joe Hershberger
  0 siblings, 2 replies; 15+ messages in thread
From: Jörg Krause @ 2015-02-03 21:44 UTC (permalink / raw)
  To: u-boot

I followed the instructions from Marek in 'M28 U-Boot Single-Wire Debug
preview' to enable NetConsole:
http://www.denx-cs.de/?q=blogm28singlewiredebug

I'm using mxsldr to flash the u-boot.sb image to RAM. This is the dmesg
output from my host:

        [31048.492181] usb 3-13: new high-speed USB device number 24
        using xhci_hcd
        [31048.667617] cdc_ether 3-13:1.0 eth0: register 'cdc_ether' at
        usb-0000:00:14.0-13, CDC Ethernet Device, 00:19:b8:00:00:01
        [31048.669101] cdc_ether 3-13:1.0 enp0s20u13: renamed from eth0
        [31048.696758] cdc_ether 3-13:1.0 enp0s20u13: kevent 12 may have
        been dropped

I noticed the 'kevent 12 may have been dropped' message here.

Nevertheless, NetworkManager shows me a USB Ethernet connection and
using './netconsole 10.0.0.2' I am able to communicate with the device,
e.g. 'nand info' outputs correctly.

But if I use 'ping 10.0.0.1' or 'tftpboot u-boot.sb' the network
connection drops. Both commands work fine if I switch back from
netconsole to serial in-/output.

This is the output from dmesg:
        [31620.215354] usb 3-13: USB disconnect, device number 24
        [31620.215422] cdc_ether 3-13:1.0 enp0s20u13: unregister
        'cdc_ether' usb-0000:00:14.0-13, CDC Ethernet Device

Do I missed something?

Best regards
J?rg Krause

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-03 21:44 [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot Jörg Krause
@ 2015-02-05 11:21 ` Jörg Krause
  2015-02-05 15:33   ` Stephen Warren
  2015-02-05 19:20 ` Joe Hershberger
  1 sibling, 1 reply; 15+ messages in thread
From: Jörg Krause @ 2015-02-05 11:21 UTC (permalink / raw)
  To: u-boot

On Di, 2015-02-03 at 22:44 +0100, J?rg Krause wrote:
> I followed the instructions from Marek in 'M28 U-Boot Single-Wire Debug
> preview' to enable NetConsole:
> http://www.denx-cs.de/?q=blogm28singlewiredebug
> 
> I'm using mxsldr to flash the u-boot.sb image to RAM. This is the dmesg
> output from my host:
> 
>         [31048.492181] usb 3-13: new high-speed USB device number 24
>         using xhci_hcd
>         [31048.667617] cdc_ether 3-13:1.0 eth0: register 'cdc_ether' at
>         usb-0000:00:14.0-13, CDC Ethernet Device, 00:19:b8:00:00:01
>         [31048.669101] cdc_ether 3-13:1.0 enp0s20u13: renamed from eth0
>         [31048.696758] cdc_ether 3-13:1.0 enp0s20u13: kevent 12 may have
>         been dropped
> 
> I noticed the 'kevent 12 may have been dropped' message here.
> 
> Nevertheless, NetworkManager shows me a USB Ethernet connection and
> using './netconsole 10.0.0.2' I am able to communicate with the device,
> e.g. 'nand info' outputs correctly.
> 
> But if I use 'ping 10.0.0.1' or 'tftpboot u-boot.sb' the network
> connection drops. Both commands work fine if I switch back from
> netconsole to serial in-/output.
> 
> This is the output from dmesg:
>         [31620.215354] usb 3-13: USB disconnect, device number 24
>         [31620.215422] cdc_ether 3-13:1.0 enp0s20u13: unregister
>         'cdc_ether' usb-0000:00:14.0-13, CDC Ethernet Device
> 
> Do I missed something?

I managed to get serial console and netconsole muxing to work, so I can
see all the debug messages:

        Hit any key to stop autoboot:  5 using ci_udc, OUT ep- IN ep-
        STATUS ep-
        MAC 00:19:b8:00:00:02
        HOST MAC 00:19:b8:00:00:01
        high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
        USB network up!
         0 
        => tftp u-boot.sb
        using ci_udc, OUT ep- IN ep- STATUS ep-
        timeout sending packets to usb ethernet

This reminded me about an issue I had some months ago:
http://lists.denx.de/pipermail/u-boot/2014-July/182885.html

I enabled debug output in arch/arm/cpu/arm926ejs/cache.c and I get
error:
=> tftp u-boot.sb
using ci_udc, OUT ep- IN ep- STATUS ep-
CACHE: Misaligned operation at range [43f7b010, 43f7b070]

I removed the flush_cache() call in cmd_net.c:netboot_common() as
suggested by Marek in the thread. But the error message is still there.

Any idea?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-05 11:21 ` Jörg Krause
@ 2015-02-05 15:33   ` Stephen Warren
  2015-02-05 22:10     ` Jörg Krause
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-02-05 15:33 UTC (permalink / raw)
  To: u-boot

On 02/05/2015 04:21 AM, J?rg Krause wrote:
> On Di, 2015-02-03 at 22:44 +0100, J?rg Krause wrote:
>> I followed the instructions from Marek in 'M28 U-Boot Single-Wire Debug
>> preview' to enable NetConsole:
>> http://www.denx-cs.de/?q=blogm28singlewiredebug
>>
>> I'm using mxsldr to flash the u-boot.sb image to RAM. This is the dmesg
>> output from my host:
>>
>>          [31048.492181] usb 3-13: new high-speed USB device number 24
>>          using xhci_hcd
>>          [31048.667617] cdc_ether 3-13:1.0 eth0: register 'cdc_ether' at
>>          usb-0000:00:14.0-13, CDC Ethernet Device, 00:19:b8:00:00:01
>>          [31048.669101] cdc_ether 3-13:1.0 enp0s20u13: renamed from eth0
>>          [31048.696758] cdc_ether 3-13:1.0 enp0s20u13: kevent 12 may have
>>          been dropped
>>
>> I noticed the 'kevent 12 may have been dropped' message here.
>>
>> Nevertheless, NetworkManager shows me a USB Ethernet connection and
>> using './netconsole 10.0.0.2' I am able to communicate with the device,
>> e.g. 'nand info' outputs correctly.
>>
>> But if I use 'ping 10.0.0.1' or 'tftpboot u-boot.sb' the network
>> connection drops. Both commands work fine if I switch back from
>> netconsole to serial in-/output.
>>
>> This is the output from dmesg:
>>          [31620.215354] usb 3-13: USB disconnect, device number 24
>>          [31620.215422] cdc_ether 3-13:1.0 enp0s20u13: unregister
>>          'cdc_ether' usb-0000:00:14.0-13, CDC Ethernet Device
>>
>> Do I missed something?
>
> I managed to get serial console and netconsole muxing to work, so I can
> see all the debug messages:
>
>          Hit any key to stop autoboot:  5 using ci_udc, OUT ep- IN ep-
>          STATUS ep-
>          MAC 00:19:b8:00:00:02
>          HOST MAC 00:19:b8:00:00:01
>          high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
>          USB network up!
>           0
>          => tftp u-boot.sb
>          using ci_udc, OUT ep- IN ep- STATUS ep-
>          timeout sending packets to usb ethernet
>
> This reminded me about an issue I had some months ago:
> http://lists.denx.de/pipermail/u-boot/2014-July/182885.html
>
> I enabled debug output in arch/arm/cpu/arm926ejs/cache.c and I get
> error:
> => tftp u-boot.sb
> using ci_udc, OUT ep- IN ep- STATUS ep-
> CACHE: Misaligned operation at range [43f7b010, 43f7b070]
>
> I removed the flush_cache() call in cmd_net.c:netboot_common() as
> suggested by Marek in the thread. But the error message is still there.

Perhaps make flush_cache() a macro that also passes in the file/line 
number where it's called from, and print those out along with he 
"misaligned operation" error message?

(or find some other way to perform a stack dump from within that function).

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-03 21:44 [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot Jörg Krause
  2015-02-05 11:21 ` Jörg Krause
@ 2015-02-05 19:20 ` Joe Hershberger
  2015-02-05 20:39   ` Jörg Krause
  1 sibling, 1 reply; 15+ messages in thread
From: Joe Hershberger @ 2015-02-05 19:20 UTC (permalink / raw)
  To: u-boot

On Tue, Feb 3, 2015 at 3:44 PM, J?rg Krause <jkrause@posteo.de> wrote:
> But if I use 'ping 10.0.0.1' or 'tftpboot u-boot.sb' the network
> connection drops. Both commands work fine if I switch back from
> netconsole to serial in-/output.
>
> This is the output from dmesg:
>         [31620.215354] usb 3-13: USB disconnect, device number 24
>         [31620.215422] cdc_ether 3-13:1.0 enp0s20u13: unregister
>         'cdc_ether' usb-0000:00:14.0-13, CDC Ethernet Device

One thing to note is that the network stack generally is designed to sit in
a state of inactive (i.e. devices halted / inactive). When a network
command is issued, the driver is initialized, then the command runs, then
the driver is halted again.

NetConsole breaks this assumption, since the network stack needs to be up
the whole time it is selected.  Net console used to bring the network
driver up and down for every character it sent.  Naturally this was a huge
problem on USB network adapters that don't come up fast or any other that
doesn't.  As a workaround (and its current state) when net console is used,
the network stack is lied to about the state of the driver (telling it the
driver is halted or inited) when the current and previous packets are net
console packets.  When a different type (ping or tftp, etc) of network
packet is sent, the driver is actually brought down and back up to ensure
the assumptions about the network stack hold true.  With more work we can
potentially make these better when net console is enabled.

Cheers,
-Joe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-05 19:20 ` Joe Hershberger
@ 2015-02-05 20:39   ` Jörg Krause
  2015-02-05 20:48     ` Joe Hershberger
  0 siblings, 1 reply; 15+ messages in thread
From: Jörg Krause @ 2015-02-05 20:39 UTC (permalink / raw)
  To: u-boot

Hi Joe,

On Do, 2015-02-05 at 13:20 -0600, Joe Hershberger wrote:
> On Tue, Feb 3, 2015 at 3:44 PM, J?rg Krause <jkrause@posteo.de> wrote:
> > But if I use 'ping 10.0.0.1' or 'tftpboot u-boot.sb' the network
> > connection drops. Both commands work fine if I switch back from
> > netconsole to serial in-/output.
> >
> > This is the output from dmesg:
> >         [31620.215354] usb 3-13: USB disconnect, device number 24
> >         [31620.215422] cdc_ether 3-13:1.0 enp0s20u13: unregister
> >         'cdc_ether' usb-0000:00:14.0-13, CDC Ethernet Device
> 
> One thing to note is that the network stack generally is designed to
> sit in a state of inactive (i.e. devices halted / inactive). When a
> network command is issued, the driver is initialized, then the command
> runs, then the driver is halted again.
> 
> 
> NetConsole breaks this assumption, since the network stack needs to be
> up the whole time it is selected.  Net console used to bring the
> network driver up and down for every character it sent.  Naturally
> this was a huge problem on USB network adapters that don't come up
> fast or any other that doesn't.  As a workaround (and its current
> state) when net console is used, the network stack is lied to about
> the state of the driver (telling it the driver is halted or inited)
> when the current and previous packets are net console packets.  When a
> different type (ping or tftp, etc) of network packet is sent, the
> driver is actually brought down and back up to ensure the assumptions
> about the network stack hold true.  With more work we can potentially
> make these better when net console is enabled.

Thank you for the explanation! What do you mean with more work? Do you
have already some ideas in mind?

Best regards
J?rg

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-05 20:39   ` Jörg Krause
@ 2015-02-05 20:48     ` Joe Hershberger
  2015-02-05 22:28       ` Jörg Krause
  0 siblings, 1 reply; 15+ messages in thread
From: Joe Hershberger @ 2015-02-05 20:48 UTC (permalink / raw)
  To: u-boot

On Thu, Feb 5, 2015 at 2:39 PM, J?rg Krause <jkrause@posteo.de> wrote:
>
> Hi Joe,
>
> On Do, 2015-02-05 at 13:20 -0600, Joe Hershberger wrote:
> > On Tue, Feb 3, 2015 at 3:44 PM, J?rg Krause <jkrause@posteo.de> wrote:
> > > But if I use 'ping 10.0.0.1' or 'tftpboot u-boot.sb' the network
> > > connection drops. Both commands work fine if I switch back from
> > > netconsole to serial in-/output.
> > >
> > > This is the output from dmesg:
> > >         [31620.215354] usb 3-13: USB disconnect, device number 24
> > >         [31620.215422] cdc_ether 3-13:1.0 enp0s20u13: unregister
> > >         'cdc_ether' usb-0000:00:14.0-13, CDC Ethernet Device
> >
> > One thing to note is that the network stack generally is designed to
> > sit in a state of inactive (i.e. devices halted / inactive). When a
> > network command is issued, the driver is initialized, then the command
> > runs, then the driver is halted again.
> >
> >
> > NetConsole breaks this assumption, since the network stack needs to be
> > up the whole time it is selected.  Net console used to bring the
> > network driver up and down for every character it sent.  Naturally
> > this was a huge problem on USB network adapters that don't come up
> > fast or any other that doesn't.  As a workaround (and its current
> > state) when net console is used, the network stack is lied to about
> > the state of the driver (telling it the driver is halted or inited)
> > when the current and previous packets are net console packets.  When a
> > different type (ping or tftp, etc) of network packet is sent, the
> > driver is actually brought down and back up to ensure the assumptions
> > about the network stack hold true.  With more work we can potentially
> > make these better when net console is enabled.
>
> Thank you for the explanation! What do you mean with more work? Do you
> have already some ideas in mind?

I think it might be possible to work through the reasons that these network
functions assume the network interface should be down when they begin and
for certain special cases not bring them down at all when net console is
enabled.  I think this is likely to be non-trivial, though.  If you look in
include/net.h you'll see eth_is_on_demand_init().  Changing the logic here
would be the first step to testing.

Cheers,
-Joe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-05 15:33   ` Stephen Warren
@ 2015-02-05 22:10     ` Jörg Krause
  2015-02-05 22:23       ` Stephen Warren
  0 siblings, 1 reply; 15+ messages in thread
From: Jörg Krause @ 2015-02-05 22:10 UTC (permalink / raw)
  To: u-boot

On Do, 2015-02-05 at 08:33 -0700, Stephen Warren wrote:
> On 02/05/2015 04:21 AM, J?rg Krause wrote:
> > On Di, 2015-02-03 at 22:44 +0100, J?rg Krause wrote:
> >> I followed the instructions from Marek in 'M28 U-Boot Single-Wire Debug
> >> preview' to enable NetConsole:
> >> http://www.denx-cs.de/?q=blogm28singlewiredebug
> >>
> >> I'm using mxsldr to flash the u-boot.sb image to RAM. This is the dmesg
> >> output from my host:
> >>
> >>          [31048.492181] usb 3-13: new high-speed USB device number 24
> >>          using xhci_hcd
> >>          [31048.667617] cdc_ether 3-13:1.0 eth0: register 'cdc_ether' at
> >>          usb-0000:00:14.0-13, CDC Ethernet Device, 00:19:b8:00:00:01
> >>          [31048.669101] cdc_ether 3-13:1.0 enp0s20u13: renamed from eth0
> >>          [31048.696758] cdc_ether 3-13:1.0 enp0s20u13: kevent 12 may have
> >>          been dropped
> >>
> >> I noticed the 'kevent 12 may have been dropped' message here.
> >>
> >> Nevertheless, NetworkManager shows me a USB Ethernet connection and
> >> using './netconsole 10.0.0.2' I am able to communicate with the device,
> >> e.g. 'nand info' outputs correctly.
> >>
> >> But if I use 'ping 10.0.0.1' or 'tftpboot u-boot.sb' the network
> >> connection drops. Both commands work fine if I switch back from
> >> netconsole to serial in-/output.
> >>
> >> This is the output from dmesg:
> >>          [31620.215354] usb 3-13: USB disconnect, device number 24
> >>          [31620.215422] cdc_ether 3-13:1.0 enp0s20u13: unregister
> >>          'cdc_ether' usb-0000:00:14.0-13, CDC Ethernet Device
> >>
> >> Do I missed something?
> >
> > I managed to get serial console and netconsole muxing to work, so I can
> > see all the debug messages:
> >
> >          Hit any key to stop autoboot:  5 using ci_udc, OUT ep- IN ep-
> >          STATUS ep-
> >          MAC 00:19:b8:00:00:02
> >          HOST MAC 00:19:b8:00:00:01
> >          high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
> >          USB network up!
> >           0
> >          => tftp u-boot.sb
> >          using ci_udc, OUT ep- IN ep- STATUS ep-
> >          timeout sending packets to usb ethernet
> >
> > This reminded me about an issue I had some months ago:
> > http://lists.denx.de/pipermail/u-boot/2014-July/182885.html
> >
> > I enabled debug output in arch/arm/cpu/arm926ejs/cache.c and I get
> > error:
> > => tftp u-boot.sb
> > using ci_udc, OUT ep- IN ep- STATUS ep-
> > CACHE: Misaligned operation at range [43f7b010, 43f7b070]
> >
> > I removed the flush_cache() call in cmd_net.c:netboot_common() as
> > suggested by Marek in the thread. But the error message is still there.
> 
> Perhaps make flush_cache() a macro that also passes in the file/line 
> number where it's called from, and print those out along with he 
> "misaligned operation" error message?
> 
> (or find some other way to perform a stack dump from within that function).

I've added in each function in ci_udc a printf statement before a cache
function is called, eg:

        static void ci_flush_qh(int ep_num)
        {
        	[..]
        	printf("CACHE: flush_dcache_range %s %d
        \n",__FUNCTION__,__LINE__);
        	flush_dcache_range(start, end);
        }

I've also looked at all calling functions of flush_cache or
flush_dcache_range, but I think there is nothing of relevance.

This is a snippet of the output:

        Using usb_ether device
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_bounce 356 
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_bounce 356 
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_flush_qh 166 
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_flush_qh 166 
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_flush_qh 166 
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_flush_qh 166 
        CACHE: flush_dcache_range ci_bounce 356 
        timeout sending packets to usb ethernet
        CACHE: flush_dcache_range ci_flush_qh 166 
        CACHE: flush_dcache_range ci_bounce 356 
        CACHE: Misaligned operation at range [43f7b010, 43f7b070]
        timeout sending packets to usb ethernet
        ping failed; host 10.0.0.1 is not alive
        
I hope this helps.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-05 22:10     ` Jörg Krause
@ 2015-02-05 22:23       ` Stephen Warren
  2015-02-05 23:19         ` Jörg Krause
  2015-02-06  1:06         ` Jörg Krause
  0 siblings, 2 replies; 15+ messages in thread
From: Stephen Warren @ 2015-02-05 22:23 UTC (permalink / raw)
  To: u-boot

On 02/05/2015 03:10 PM, J?rg Krause wrote:
> On Do, 2015-02-05 at 08:33 -0700, Stephen Warren wrote:
>> On 02/05/2015 04:21 AM, J?rg Krause wrote:
...
>>> This reminded me about an issue I had some months ago:
>>> http://lists.denx.de/pipermail/u-boot/2014-July/182885.html
>>>
>>> I enabled debug output in arch/arm/cpu/arm926ejs/cache.c and I get
>>> error:
>>> => tftp u-boot.sb
>>> using ci_udc, OUT ep- IN ep- STATUS ep-
>>> CACHE: Misaligned operation at range [43f7b010, 43f7b070]
>>>
>>> I removed the flush_cache() call in cmd_net.c:netboot_common() as
>>> suggested by Marek in the thread. But the error message is still there.
>>
>> Perhaps make flush_cache() a macro that also passes in the file/line
>> number where it's called from, and print those out along with he
>> "misaligned operation" error message?
>>
>> (or find some other way to perform a stack dump from within that function).
>
> I've added in each function in ci_udc a printf statement before a cache
> function is called, eg:
>
>          static void ci_flush_qh(int ep_num)
>          {
>          	[..]
>          	printf("CACHE: flush_dcache_range %s %d
>          \n",__FUNCTION__,__LINE__);
>          	flush_dcache_range(start, end);
>          }
>
> I've also looked at all calling functions of flush_cache or
> flush_dcache_range, but I think there is nothing of relevance.
>
> This is a snippet of the output:
>
...
>          CACHE: flush_dcache_range ci_bounce 356
>          CACHE: Misaligned operation at range [43f7b010, 43f7b070]
>          timeout sending packets to usb ethernet
>          ping failed; host 10.0.0.1 is not alive

Which git commit did you build, and which board?

I'm curious what values you have for ARCH_DMA_MINALIGN and 
CONFIG_SYS_CACHELINE_SIZE.

Are you sure flush_dcache_range() is the code printing the alignment 
error, not e.g. invalidate_dcache_range()?

The reason I ask most of these questions is that line 356 mentioned in 
your debug spew is in function ci_debounce() in the code I have; no 
doubt I have a different git commit than you have checked out, and the 
debug printfs you added changed the line numbers too.

About the only thing I can think of is that:

a) You're not using an up-to-date ci_udc.c; IIRC I fixed quite a few 
cache issues when I reworked it a while back.

b) In ci_bounce(), the bounce buffer is only allocated if the 
user-buffer is already aligned, and if a large-enough bounce buffer 
wasn't previously allocated. If ci_req->b_buf was uninitialized it could 
be non-zero (thus preventing the expected aligned allocation) yet not 
actually aligned enough.

c) Perhaps ARCH_DMA_MINALIGN or CONFIG_SYS_CACHELINE_SIZE aren't correct 
or are inconsistent.

Ah. I note that check_cache_range() in either arch/arm/cpu/arm1136/cpu.c 
or arch/arm/cpu/arm926ejs/cache.c uses CONFIG_SYS_CACHELINE_SIZE to 
check for alignment, whereas ci_udc.c uses ARCH_DMA_MINALIGN. 
Inconsistency between those two could be at fault.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-05 20:48     ` Joe Hershberger
@ 2015-02-05 22:28       ` Jörg Krause
  0 siblings, 0 replies; 15+ messages in thread
From: Jörg Krause @ 2015-02-05 22:28 UTC (permalink / raw)
  To: u-boot

On Do, 2015-02-05 at 14:48 -0600, Joe Hershberger wrote:
> 
> 
> On Thu, Feb 5, 2015 at 2:39 PM, J?rg Krause <jkrause@posteo.de> wrote:
> >
> > Hi Joe,
> >
> > On Do, 2015-02-05 at 13:20 -0600, Joe Hershberger wrote:
> > > On Tue, Feb 3, 2015 at 3:44 PM, J?rg Krause <jkrause@posteo.de>
> wrote:
> > > > But if I use 'ping 10.0.0.1' or 'tftpboot u-boot.sb' the network
> > > > connection drops. Both commands work fine if I switch back from
> > > > netconsole to serial in-/output.
> > > >
> > > > This is the output from dmesg:
> > > >         [31620.215354] usb 3-13: USB disconnect, device number
> 24
> > > >         [31620.215422] cdc_ether 3-13:1.0 enp0s20u13: unregister
> > > >         'cdc_ether' usb-0000:00:14.0-13, CDC Ethernet Device
> > >
> > > One thing to note is that the network stack generally is designed
> to
> > > sit in a state of inactive (i.e. devices halted / inactive). When
> a
> > > network command is issued, the driver is initialized, then the
> command
> > > runs, then the driver is halted again.
> > >
> > >
> > > NetConsole breaks this assumption, since the network stack needs
> to be
> > > up the whole time it is selected.  Net console used to bring the
> > > network driver up and down for every character it sent.  Naturally
> > > this was a huge problem on USB network adapters that don't come up
> > > fast or any other that doesn't.  As a workaround (and its current
> > > state) when net console is used, the network stack is lied to
> about
> > > the state of the driver (telling it the driver is halted or
> inited)
> > > when the current and previous packets are net console packets.
> When a
> > > different type (ping or tftp, etc) of network packet is sent, the
> > > driver is actually brought down and back up to ensure the
> assumptions
> > > about the network stack hold true.  With more work we can
> potentially
> > > make these better when net console is enabled.
> >
> > Thank you for the explanation! What do you mean with more work? Do
> you
> > have already some ideas in mind?
> 
> I think it might be possible to work through the reasons that these
> network functions assume the network interface should be down when
> they begin and for certain special cases not bring them down at all
> when net console is enabled.  I think this is likely to be
> non-trivial, though.  If you look in include/net.h you'll see
> eth_is_on_demand_init().  Changing the logic here would be the first
> step to testing.

I think I read sth about that the network interface should be down or in
a well-defined state for handling over the control to the Linux kernel.
There where some problems with the Linux drivers, if I remember rigthly.

I will take a closer look at eth_is_on_demand_init. Thank you!

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-05 22:23       ` Stephen Warren
@ 2015-02-05 23:19         ` Jörg Krause
  2015-02-06  1:06         ` Jörg Krause
  1 sibling, 0 replies; 15+ messages in thread
From: Jörg Krause @ 2015-02-05 23:19 UTC (permalink / raw)
  To: u-boot

On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
> On 02/05/2015 03:10 PM, J?rg Krause wrote:
> > On Do, 2015-02-05 at 08:33 -0700, Stephen Warren wrote:
> >> On 02/05/2015 04:21 AM, J?rg Krause wrote:
> ...
> >>> This reminded me about an issue I had some months ago:
> >>> http://lists.denx.de/pipermail/u-boot/2014-July/182885.html
> >>>
> >>> I enabled debug output in arch/arm/cpu/arm926ejs/cache.c and I get
> >>> error:
> >>> => tftp u-boot.sb
> >>> using ci_udc, OUT ep- IN ep- STATUS ep-
> >>> CACHE: Misaligned operation at range [43f7b010, 43f7b070]
> >>>
> >>> I removed the flush_cache() call in cmd_net.c:netboot_common() as
> >>> suggested by Marek in the thread. But the error message is still there.
> >>
> >> Perhaps make flush_cache() a macro that also passes in the file/line
> >> number where it's called from, and print those out along with he
> >> "misaligned operation" error message?
> >>
> >> (or find some other way to perform a stack dump from within that function).
> >
> > I've added in each function in ci_udc a printf statement before a cache
> > function is called, eg:
> >
> >          static void ci_flush_qh(int ep_num)
> >          {
> >          	[..]
> >          	printf("CACHE: flush_dcache_range %s %d
> >          \n",__FUNCTION__,__LINE__);
> >          	flush_dcache_range(start, end);
> >          }
> >
> > I've also looked at all calling functions of flush_cache or
> > flush_dcache_range, but I think there is nothing of relevance.
> >
> > This is a snippet of the output:
> >
> ...
> >          CACHE: flush_dcache_range ci_bounce 356
> >          CACHE: Misaligned operation at range [43f7b010, 43f7b070]
> >          timeout sending packets to usb ethernet
> >          ping failed; host 10.0.0.1 is not alive
> 
> Which git commit did you build, and which board?

Building tag v2015.01 for a custom i.MX28 board which is not upstream.
It's configuration mainly based on mx28evk and m28evk.

> 
> I'm curious what values you have for ARCH_DMA_MINALIGN and 
> CONFIG_SYS_CACHELINE_SIZE.

I defined CONFIG_SYS_CACHELINE_SIZE 32 in the config. ARCH_DMA_MINALIGN
is also 32.

> 
> Are you sure flush_dcache_range() is the code printing the alignment 
> error, not e.g. invalidate_dcache_range()?

I've also added a printf statement to all functions in ci_udc which
calls invalidate_dcache_range(), but it is not logged when the alignment
error occurs. e.g.

        using ci_udc, OUT ep- IN ep- STATUS ep-
        MAC 00:19:b8:00:00:02
        HOST MAC 00:19:b8:00:00:01
        CACHE: flush_dcache_range ci_flush_qh 166 
        CACHE: invalidate_dcache_range ci_invalidate_qh 182 
        CACHE: flush_dcache_range ci_bounce 356 
        CACHE: flush_dcache_range ci_flush_qtd 199 
        CACHE: flush_dcache_range ci_flush_qh 166 

> 
> The reason I ask most of these questions is that line 356 mentioned in 
> your debug spew is in function ci_debounce() in the code I have; no 
> doubt I have a different git commit than you have checked out, and the 
> debug printfs you added changed the line numbers too.

You're right, the debug printfs changed the line number. It's the
flush_dcache_range() call at line 348 in ci_bounce(). I checked this
after removing all printfs.

> About the only thing I can think of is that:
> 
> a) You're not using an up-to-date ci_udc.c; IIRC I fixed quite a few 
> cache issues when I reworked it a while back.

Yes, we had a troubleshooting about this last summer. I had problems
with timeouts using tftp. Marek and you worked on this issue.

> 
> b) In ci_bounce(), the bounce buffer is only allocated if the 
> user-buffer is already aligned, and if a large-enough bounce buffer 
> wasn't previously allocated. If ci_req->b_buf was uninitialized it could 
> be non-zero (thus preventing the expected aligned allocation) yet not 
> actually aligned enough.

Maybe, we should work on this?

> 
> c) Perhaps ARCH_DMA_MINALIGN or CONFIG_SYS_CACHELINE_SIZE aren't correct 
> or are inconsistent.
> 
> Ah. I note that check_cache_range() in either arch/arm/cpu/arm1136/cpu.c 
> or arch/arm/cpu/arm926ejs/cache.c uses CONFIG_SYS_CACHELINE_SIZE to 
> check for alignment, whereas ci_udc.c uses ARCH_DMA_MINALIGN. 
> Inconsistency between those two could be at fault.

Both are set to 32, so this should not be the problem.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-05 22:23       ` Stephen Warren
  2015-02-05 23:19         ` Jörg Krause
@ 2015-02-06  1:06         ` Jörg Krause
  2015-02-06 18:06           ` Stephen Warren
  1 sibling, 1 reply; 15+ messages in thread
From: Jörg Krause @ 2015-02-06  1:06 UTC (permalink / raw)
  To: u-boot

On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
> 
> b) In ci_bounce(), the bounce buffer is only allocated if the 
> user-buffer is already aligned, and if a large-enough bounce buffer 
> wasn't previously allocated. If ci_req->b_buf was uninitialized it could 
> be non-zero (thus preventing the expected aligned allocation) yet not 
> actually aligned enough.

I can reproduce this issue now. After some "timeout sending packets to
usb ethernet" messages, the bounce buffer somehow gets corrupted.
ci_bounce() is called with an unaligned input buffer length
'req->length=66', but the bounce buffer length
'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
bounce buffer length is obviously an address, as the following
misaligned error message shows: "CACHE: Misaligned operation at range
[43f7b010, 43f7b070]".

Both if conditions in 'align:' are not entered.

This is a snippet from my debug output:

        timeout sending packets to usb ethernet
        1: 43b7e180 - 43b7e200.
        req->length: 66
        b_len_1: 96
        b_len_2: 96
        5: 43b7e660 - 43b7e6c0
        
        timeout sending packets to usb ethernet
        1: 43b7e000 - 43b7e080.
        req->length: 66
        b_len_1: 1140305940
        b_len_2: 1140305940
        5: 43f7b010 - 43f7b070
        CACHE: Misaligned operation at range [43f7b010, 43f7b070]
        timeout sending packets to usb ethernet
        ping failed; host 10.0.0.1 is not alive


This is the corresponding code (debug number 1 and 5):

        static void ci_flush_qh(int ep_num)
        {
        	struct ept_queue_head *head = ci_get_qh(ep_num, 0);
        	const uint32_t start = (uint32_t)head;
        	const uint32_t end = start + 2 * sizeof(*head);
        
        	printf("1: %x - %x.\n", start, end);
        	flush_dcache_range(start, end);
        }

[..]

        static int ci_bounce(struct ci_req *ci_req, int in)
        {
        	struct usb_request *req = &ci_req->req;
        	uint32_t addr = (uint32_t)req->buf;
        	uint32_t hwaddr;
        	uint32_t aligned_used_len;
        
        	/* Input buffer address is not aligned. */
        	if (addr & (ARCH_DMA_MINALIGN - 1)) {
        		goto align;
        	}
        
        	/* Input buffer length is not aligned. */
        	if (req->length & (ARCH_DMA_MINALIGN - 1)) {
        		printf("req->length: %d\n", req->length);
        		goto align;
        	}
        
        	/* The buffer is well aligned, only flush cache. */
        	ci_req->hw_len = req->length;
        	ci_req->hw_buf = req->buf;
        	goto flush;
        
        align:
        	printf("b_len_1: %d\n", ci_req->b_len);
        	if (ci_req->b_buf && req->length > ci_req->b_len) {
        		printf("A: %d - %d\n", req->length,  ci_req->b_len);
        		free(ci_req->b_buf);
        		ci_req->b_buf = 0;
        	}
        	if (!ci_req->b_buf) {
        		ci_req->b_len = roundup(req->length, ARCH_DMA_MINALIGN);
        		ci_req->b_buf = memalign(ARCH_DMA_MINALIGN, ci_req->b_len);
        		printf("B: %d - %d\n", req->length,  ci_req->b_len);
        		if (!ci_req->b_buf)
        			return -ENOMEM;
        	}
        	ci_req->hw_len = ci_req->b_len;
        	ci_req->hw_buf = ci_req->b_buf;
        
        	printf("b_len_2: %d\n", ci_req->b_len);
        
        	if (in)
        		memcpy(ci_req->hw_buf, req->buf, req->length);
        
        flush:
        	hwaddr = (uint32_t)ci_req->hw_buf;
        	aligned_used_len = roundup(req->length, ARCH_DMA_MINALIGN);
        	printf("5: %x - %x\n", hwaddr, hwaddr + aligned_used_len);
        	flush_dcache_range(hwaddr, hwaddr + aligned_used_len);
        
        	return 0;
        }

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-06  1:06         ` Jörg Krause
@ 2015-02-06 18:06           ` Stephen Warren
  2015-02-08 21:25             ` Jörg Krause
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-02-06 18:06 UTC (permalink / raw)
  To: u-boot

On 02/05/2015 06:06 PM, J?rg Krause wrote:
> On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
>>
>> b) In ci_bounce(), the bounce buffer is only allocated if the
>> user-buffer is already aligned, and if a large-enough bounce buffer
>> wasn't previously allocated. If ci_req->b_buf was uninitialized it could
>> be non-zero (thus preventing the expected aligned allocation) yet not
>> actually aligned enough.
>
> I can reproduce this issue now. After some "timeout sending packets to
> usb ethernet" messages, the bounce buffer somehow gets corrupted.
> ci_bounce() is called with an unaligned input buffer length
> 'req->length=66', but the bounce buffer length
> 'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
> bounce buffer length is obviously an address, as the following
> misaligned error message shows: "CACHE: Misaligned operation at range
> [43f7b010, 43f7b070]".

Ah, I hadn't realized that was [start, length] rather than [start, end].

The question is: How is ci_req->b_len getting corrupted? Is it simply 
never initialized, or does something trash that value later?

ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I 
imagine an initialization/allocating error isn't happening.

The only issue there might be some code somehow creating its own struct 
usb_request instead of calling into the controller's ->alloc_request() 
function. I vaguely recall fixing some of those, but might have missed 
some in protocols that I didn't test (i.e. anything other than USB Mass 
Storage or DFU, although I might have very briefly tested netconsole once?).

I would suggest adding a whole ton of printfs() to catch where ci_reqs 
are being allocated, and where ci_req->b_len is getting written in which 
ci_req objects, and then mapping that back to the ci_req that the cache 
alignment error message complains about. Sorry, this will be a bit painful.

If the ci_req is always at the same address on different boots of the 
code, that will make it easier, especially if you have a debugger with a 
data watchpoint, or can write some code to use any data watchpoint 
self-hosted debug capability in your CPU.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-06 18:06           ` Stephen Warren
@ 2015-02-08 21:25             ` Jörg Krause
  2015-02-09 17:38               ` Stephen Warren
  0 siblings, 1 reply; 15+ messages in thread
From: Jörg Krause @ 2015-02-08 21:25 UTC (permalink / raw)
  To: u-boot

On Fr, 2015-02-06 at 11:06 -0700, Stephen Warren wrote:
> On 02/05/2015 06:06 PM, J?rg Krause wrote:
> > On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
> >>
> >> b) In ci_bounce(), the bounce buffer is only allocated if the
> >> user-buffer is already aligned, and if a large-enough bounce buffer
> >> wasn't previously allocated. If ci_req->b_buf was uninitialized it could
> >> be non-zero (thus preventing the expected aligned allocation) yet not
> >> actually aligned enough.
> >
> > I can reproduce this issue now. After some "timeout sending packets to
> > usb ethernet" messages, the bounce buffer somehow gets corrupted.
> > ci_bounce() is called with an unaligned input buffer length
> > 'req->length=66', but the bounce buffer length
> > 'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
> > bounce buffer length is obviously an address, as the following
> > misaligned error message shows: "CACHE: Misaligned operation at range
> > [43f7b010, 43f7b070]".
> 
> Ah, I hadn't realized that was [start, length] rather than [start, end].
> 
> The question is: How is ci_req->b_len getting corrupted? Is it simply 
> never initialized, or does something trash that value later?
> 
> ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I 
> imagine an initialization/allocating error isn't happening.
> 
> The only issue there might be some code somehow creating its own struct 
> usb_request instead of calling into the controller's ->alloc_request() 
> function. I vaguely recall fixing some of those, but might have missed 
> some in protocols that I didn't test (i.e. anything other than USB Mass 
> Storage or DFU, although I might have very briefly tested netconsole once?).
> 
> I would suggest adding a whole ton of printfs() to catch where ci_reqs 
> are being allocated, and where ci_req->b_len is getting written in which 
> ci_req objects, and then mapping that back to the ci_req that the cache 
> alignment error message complains about. Sorry, this will be a bit painful.
> 
> If the ci_req is always at the same address on different boots of the 
> code, that will make it easier, especially if you have a debugger with a 
> data watchpoint, or can write some code to use any data watchpoint 
> self-hosted debug capability in your CPU.

I think I found the answer.

I used a lot of debug messages and tried to understand the involved
drivers. I will try to sum up my investigations.

NetLoop is entered first for the ping protocol and then for the
netconsole protocol:
        --- NetLoop Entry (PING)
        --- NetLoop Entry (NETCONSOLE)

ci_udc is probed and ci_ep_alloc_request() is called for EP0:
        ci_ep_alloc_request usb_ep:0x43fd0028 ci_req:0x43b83220

In eth_bind() ci_ep_alloc_request() is called for status EP:
        ci_ep_alloc_request usb_ep:0x43fd00a0 ci_req:0x43b7b4a8

Everything is fine:
        using ci_udc, OUT ep- IN ep- STATUS ep-
        MAC 00:19:b8:00:00:02
        HOST MAC 00:19:b8:00:00:01

In eth_set_config() ci_ep_alloc_request() is invoked again for tx_req
and rx_req:
        ci_ep_alloc_request usb_ep:0x43fd0050 ci_req:0x43b7b568
        ci_ep_alloc_request usb_ep:0x43fd0078 ci_req:0x43b7b5b0

Everything is fine:
        high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
        USB network up!

Now the NetLoop is available for netconsole:
        --- NetState set to 0 (CONTINUE)
        --- NetLoop Init (NETCONSOLE)
        --- NetLoop ARP handler set (43fa3b14)
        [..]
        Got ARP REPLY, set eth addr (00:19:b8:00:00:01)
        --- NetState set to 2 (SUCCESS)
        --- NetLoop UDP handler set (00000000)
        --- NetLoop ARP handler set (00000000)
        --- NetLoop timeout handler cancelled
        --- NetLoop Success! (NETCONSOLE)

Now it's pings turn in the NetLoop. The if (eth_is_on_demand_init())
branch is executed. eth_halt() and later eth_disconnect() and
eth_reset_config() are invoked. There the in and out EPs are freed:
        ci_ep_free_request ci_req:0x43b7b568
        ci_ep_free_request ci_req:0x43b7b5b0

However, netconsole tries to send every printf() as an UDP packet to the
host.
        sending UDP to 10.0.0.1/00:19:b8:00:00:01

But the usb_request [1] pointer in usb_eth_send is NULL after the free
request:
        usb_eth_send usb_request: 00000000 length: 82

usb_eth_send() calls ci_ep_queue() which calls ci_bounce() and because
of the corrupted ci_req pointer I get a misaligned cache.

So, as far as I understand, the main problem is that netconsole does not
knows that the connection is disconnected by the NetLoop?

What are your suggestions to this issue? Should we add an addition check
for an initialized usb_request pointer in usb_eth_send()? Or somewhere
else?

Best regards
J?rg Krause

[1]
http://git.denx.de/?p=u-boot.git;a=blob;f=drivers/usb/gadget/ether.c;h=ba442d5ed529bb04a88a41e630477b4a46b32d46;hb=HEAD#l2377

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-08 21:25             ` Jörg Krause
@ 2015-02-09 17:38               ` Stephen Warren
  2015-02-11 22:08                 ` Jörg Krause
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Warren @ 2015-02-09 17:38 UTC (permalink / raw)
  To: u-boot

On 02/08/2015 02:25 PM, J?rg Krause wrote:
> On Fr, 2015-02-06 at 11:06 -0700, Stephen Warren wrote:
>> On 02/05/2015 06:06 PM, J?rg Krause wrote:
>>> On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
>>>>
>>>> b) In ci_bounce(), the bounce buffer is only allocated if the
>>>> user-buffer is already aligned, and if a large-enough bounce buffer
>>>> wasn't previously allocated. If ci_req->b_buf was uninitialized it could
>>>> be non-zero (thus preventing the expected aligned allocation) yet not
>>>> actually aligned enough.
>>>
>>> I can reproduce this issue now. After some "timeout sending packets to
>>> usb ethernet" messages, the bounce buffer somehow gets corrupted.
>>> ci_bounce() is called with an unaligned input buffer length
>>> 'req->length=66', but the bounce buffer length
>>> 'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
>>> bounce buffer length is obviously an address, as the following
>>> misaligned error message shows: "CACHE: Misaligned operation at range
>>> [43f7b010, 43f7b070]".
>>
>> Ah, I hadn't realized that was [start, length] rather than [start, end].
>>
>> The question is: How is ci_req->b_len getting corrupted? Is it simply
>> never initialized, or does something trash that value later?
>>
>> ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I
>> imagine an initialization/allocating error isn't happening.
>>
>> The only issue there might be some code somehow creating its own struct
>> usb_request instead of calling into the controller's ->alloc_request()
>> function. I vaguely recall fixing some of those, but might have missed
>> some in protocols that I didn't test (i.e. anything other than USB Mass
>> Storage or DFU, although I might have very briefly tested netconsole once?).
>>
>> I would suggest adding a whole ton of printfs() to catch where ci_reqs
>> are being allocated, and where ci_req->b_len is getting written in which
>> ci_req objects, and then mapping that back to the ci_req that the cache
>> alignment error message complains about. Sorry, this will be a bit painful.
>>
>> If the ci_req is always at the same address on different boots of the
>> code, that will make it easier, especially if you have a debugger with a
>> data watchpoint, or can write some code to use any data watchpoint
>> self-hosted debug capability in your CPU.
>
> I think I found the answer.
>
> I used a lot of debug messages and tried to understand the involved
> drivers. I will try to sum up my investigations.
>
> NetLoop is entered first for the ping protocol and then for the
> netconsole protocol:
>          --- NetLoop Entry (PING)
>          --- NetLoop Entry (NETCONSOLE)
>
> ci_udc is probed and ci_ep_alloc_request() is called for EP0:
>          ci_ep_alloc_request usb_ep:0x43fd0028 ci_req:0x43b83220
>
> In eth_bind() ci_ep_alloc_request() is called for status EP:
>          ci_ep_alloc_request usb_ep:0x43fd00a0 ci_req:0x43b7b4a8
>
> Everything is fine:
>          using ci_udc, OUT ep- IN ep- STATUS ep-
>          MAC 00:19:b8:00:00:02
>          HOST MAC 00:19:b8:00:00:01
>
> In eth_set_config() ci_ep_alloc_request() is invoked again for tx_req
> and rx_req:
>          ci_ep_alloc_request usb_ep:0x43fd0050 ci_req:0x43b7b568
>          ci_ep_alloc_request usb_ep:0x43fd0078 ci_req:0x43b7b5b0
>
> Everything is fine:
>          high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
>          USB network up!
>
> Now the NetLoop is available for netconsole:
>          --- NetState set to 0 (CONTINUE)
>          --- NetLoop Init (NETCONSOLE)
>          --- NetLoop ARP handler set (43fa3b14)
>          [..]
>          Got ARP REPLY, set eth addr (00:19:b8:00:00:01)
>          --- NetState set to 2 (SUCCESS)
>          --- NetLoop UDP handler set (00000000)
>          --- NetLoop ARP handler set (00000000)
>          --- NetLoop timeout handler cancelled
>          --- NetLoop Success! (NETCONSOLE)
>
> Now it's pings turn in the NetLoop. The if (eth_is_on_demand_init())
> branch is executed. eth_halt() and later eth_disconnect() and
> eth_reset_config() are invoked. There the in and out EPs are freed:
>          ci_ep_free_request ci_req:0x43b7b568
>          ci_ep_free_request ci_req:0x43b7b5b0

That sounds like a problem.

I would assume that once netconsole is enabled, the Ethernet connection 
is no longer on-demand, but should be initialized once, and then never 
taken down.

Take a look at the definition of eth_is_on_demand_init() in 
include/net.h. It has a special case for CONFIG_NETCONSOLE, based on 
variable net_loop_last_protocol. Perhaps that isn't getting set 
correctly, or perhaps the variable is getting set to something other 
than netconsole, thus causing the connection to get taken down?

> However, netconsole tries to send every printf() as an UDP packet to the
> host.
>          sending UDP to 10.0.0.1/00:19:b8:00:00:01
>
> But the usb_request [1] pointer in usb_eth_send is NULL after the free
> request:
>          usb_eth_send usb_request: 00000000 length: 82
>
> usb_eth_send() calls ci_ep_queue() which calls ci_bounce() and because
> of the corrupted ci_req pointer I get a misaligned cache.
>
> So, as far as I understand, the main problem is that netconsole does not
> knows that the connection is disconnected by the NetLoop?
>
> What are your suggestions to this issue? Should we add an addition check
> for an initialized usb_request pointer in usb_eth_send()? Or somewhere
> else?

It might be a useful debugging/validation aid to add a check somewhere 
for valid non-free struct usb_requests. However, it sounds like that's 
just a symptom; the underlying problem is that the Ethernet connection 
is taken down at all, while netconsole is active.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot
  2015-02-09 17:38               ` Stephen Warren
@ 2015-02-11 22:08                 ` Jörg Krause
  0 siblings, 0 replies; 15+ messages in thread
From: Jörg Krause @ 2015-02-11 22:08 UTC (permalink / raw)
  To: u-boot

On Mo, 2015-02-09 at 10:38 -0700, Stephen Warren wrote:
> On 02/08/2015 02:25 PM, J?rg Krause wrote:
> > On Fr, 2015-02-06 at 11:06 -0700, Stephen Warren wrote:
> >> On 02/05/2015 06:06 PM, J?rg Krause wrote:
> >>> On Do, 2015-02-05 at 15:23 -0700, Stephen Warren wrote:
> >>>>
> >>>> b) In ci_bounce(), the bounce buffer is only allocated if the
> >>>> user-buffer is already aligned, and if a large-enough bounce buffer
> >>>> wasn't previously allocated. If ci_req->b_buf was uninitialized it could
> >>>> be non-zero (thus preventing the expected aligned allocation) yet not
> >>>> actually aligned enough.
> >>>
> >>> I can reproduce this issue now. After some "timeout sending packets to
> >>> usb ethernet" messages, the bounce buffer somehow gets corrupted.
> >>> ci_bounce() is called with an unaligned input buffer length
> >>> 'req->length=66', but the bounce buffer length
> >>> 'ci_req->b_len=1140305940' or in hex 'ci_req->b_len=0x43f7b014'. This
> >>> bounce buffer length is obviously an address, as the following
> >>> misaligned error message shows: "CACHE: Misaligned operation at range
> >>> [43f7b010, 43f7b070]".
> >>
> >> Ah, I hadn't realized that was [start, length] rather than [start, end].
> >>
> >> The question is: How is ci_req->b_len getting corrupted? Is it simply
> >> never initialized, or does something trash that value later?
> >>
> >> ci_ep_alloc_request() appears to calloc() the whole struct ci_req, so I
> >> imagine an initialization/allocating error isn't happening.
> >>
> >> The only issue there might be some code somehow creating its own struct
> >> usb_request instead of calling into the controller's ->alloc_request()
> >> function. I vaguely recall fixing some of those, but might have missed
> >> some in protocols that I didn't test (i.e. anything other than USB Mass
> >> Storage or DFU, although I might have very briefly tested netconsole once?).
> >>
> >> I would suggest adding a whole ton of printfs() to catch where ci_reqs
> >> are being allocated, and where ci_req->b_len is getting written in which
> >> ci_req objects, and then mapping that back to the ci_req that the cache
> >> alignment error message complains about. Sorry, this will be a bit painful.
> >>
> >> If the ci_req is always at the same address on different boots of the
> >> code, that will make it easier, especially if you have a debugger with a
> >> data watchpoint, or can write some code to use any data watchpoint
> >> self-hosted debug capability in your CPU.
> >
> > I think I found the answer.
> >
> > I used a lot of debug messages and tried to understand the involved
> > drivers. I will try to sum up my investigations.
> >
> > NetLoop is entered first for the ping protocol and then for the
> > netconsole protocol:
> >          --- NetLoop Entry (PING)
> >          --- NetLoop Entry (NETCONSOLE)
> >
> > ci_udc is probed and ci_ep_alloc_request() is called for EP0:
> >          ci_ep_alloc_request usb_ep:0x43fd0028 ci_req:0x43b83220
> >
> > In eth_bind() ci_ep_alloc_request() is called for status EP:
> >          ci_ep_alloc_request usb_ep:0x43fd00a0 ci_req:0x43b7b4a8
> >
> > Everything is fine:
> >          using ci_udc, OUT ep- IN ep- STATUS ep-
> >          MAC 00:19:b8:00:00:02
> >          HOST MAC 00:19:b8:00:00:01
> >
> > In eth_set_config() ci_ep_alloc_request() is invoked again for tx_req
> > and rx_req:
> >          ci_ep_alloc_request usb_ep:0x43fd0050 ci_req:0x43b7b568
> >          ci_ep_alloc_request usb_ep:0x43fd0078 ci_req:0x43b7b5b0
> >
> > Everything is fine:
> >          high speed config #1: 2 mA, Ethernet Gadget, using CDC Ethernet
> >          USB network up!
> >
> > Now the NetLoop is available for netconsole:
> >          --- NetState set to 0 (CONTINUE)
> >          --- NetLoop Init (NETCONSOLE)
> >          --- NetLoop ARP handler set (43fa3b14)
> >          [..]
> >          Got ARP REPLY, set eth addr (00:19:b8:00:00:01)
> >          --- NetState set to 2 (SUCCESS)
> >          --- NetLoop UDP handler set (00000000)
> >          --- NetLoop ARP handler set (00000000)
> >          --- NetLoop timeout handler cancelled
> >          --- NetLoop Success! (NETCONSOLE)
> >
> > Now it's pings turn in the NetLoop. The if (eth_is_on_demand_init())
> > branch is executed. eth_halt() and later eth_disconnect() and
> > eth_reset_config() are invoked. There the in and out EPs are freed:
> >          ci_ep_free_request ci_req:0x43b7b568
> >          ci_ep_free_request ci_req:0x43b7b5b0
> 
> That sounds like a problem.
> 
> I would assume that once netconsole is enabled, the Ethernet connection 
> is no longer on-demand, but should be initialized once, and then never 
> taken down.

I took me a while to fully understand the concept behind the net driver.
The initial suppose for the net driver was to halt the underlying
ethernet driver after each protocol communication. This was changed for
speed-up purpose of netconsole. However, I think this broke the
communication for other protocols used together with netconsole.

> Take a look at the definition of eth_is_on_demand_init() in 
> include/net.h. It has a special case for CONFIG_NETCONSOLE, based on 
> variable net_loop_last_protocol. Perhaps that isn't getting set 
> correctly, or perhaps the variable is getting set to something other 
> than netconsole, thus causing the connection to get taken down?

The cause it, that after running a ping or tftp command, the ethernet
connection is halt. I managed to get it working for those two protocols
by including them to eth_is_on_demand_init() and some other parts in the
net, but I'm not quite satisfied with the solution. I will discuss the
with Joe in another mail.

> 
> > However, netconsole tries to send every printf() as an UDP packet to the
> > host.
> >          sending UDP to 10.0.0.1/00:19:b8:00:00:01
> >
> > But the usb_request [1] pointer in usb_eth_send is NULL after the free
> > request:
> >          usb_eth_send usb_request: 00000000 length: 82
> >
> > usb_eth_send() calls ci_ep_queue() which calls ci_bounce() and because
> > of the corrupted ci_req pointer I get a misaligned cache.
> >
> > So, as far as I understand, the main problem is that netconsole does not
> > knows that the connection is disconnected by the NetLoop?
> >
> > What are your suggestions to this issue? Should we add an addition check
> > for an initialized usb_request pointer in usb_eth_send()? Or somewhere
> > else?
> 
> It might be a useful debugging/validation aid to add a check somewhere 
> for valid non-free struct usb_requests. However, it sounds like that's 
> just a symptom; the underlying problem is that the Ethernet connection 
> is taken down at all, while netconsole is active.

You're right, it's just a symptom, but it would help in early detection
of possible bugs in any of the other drivers. Like the cache misaligned
debug message.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-02-11 22:08 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-03 21:44 [U-Boot] netconsole: USB Ethernet connection dropping with ping or tftpboot Jörg Krause
2015-02-05 11:21 ` Jörg Krause
2015-02-05 15:33   ` Stephen Warren
2015-02-05 22:10     ` Jörg Krause
2015-02-05 22:23       ` Stephen Warren
2015-02-05 23:19         ` Jörg Krause
2015-02-06  1:06         ` Jörg Krause
2015-02-06 18:06           ` Stephen Warren
2015-02-08 21:25             ` Jörg Krause
2015-02-09 17:38               ` Stephen Warren
2015-02-11 22:08                 ` Jörg Krause
2015-02-05 19:20 ` Joe Hershberger
2015-02-05 20:39   ` Jörg Krause
2015-02-05 20:48     ` Joe Hershberger
2015-02-05 22:28       ` Jörg Krause

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.