* dma_alloc_coherent and cache?
@ 2014-04-15  5:43 Lee Essen
  2014-04-15  8:10 ` Andrew Lunn
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Lee Essen @ 2014-04-15  5:43 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I'm working on a driver for a Marvell switch device (98dx4122) where the basic interface closely resembles the mv643xx_eth device.

(I should say this is a bit of a personal project to experiment with a trendnet switch, not affiliated with any commercial activities.)

GPL code from Marvell is available for an old kernel, so I have been working to use the mv643xx_eth concepts and at least get basic functionality up and running on the current kernel version.

At a high level I have it working, however I get regular (reproducible) hangs and I suspect it's to do with the writes to the descriptors (from dma_alloc_coherent) being buffered or cached and not making it to the device when dma is triggered.

My theory is based on the fact that the hang always seems to happen at the point of enabling dma for transmit, and occasionally I get a packet out which is corrupt ... and if I add lots of debug printk's or delay loops then it happens less frequently.

The original GPL code includes some functions to invalidate/clear the L2 cache, but no other driver seems to do this, so it doesn't feel like a good solution.

It's a Feroceon CPU, and I've tried disabling the L2 controller and also the d-cache, neither of which made any difference.

So I'm now completely out of ideas and way out of my depth ;-) 

Any suggestions would be greatly appreciated.

Regards,

Lee.


* dma_alloc_coherent and cache?
  2014-04-15  5:43 dma_alloc_coherent and cache? Lee Essen
@ 2014-04-15  8:10 ` Andrew Lunn
  2014-04-17 14:35   ` Valentin Longchamp
  2014-04-15  8:43 ` Arnd Bergmann
  2014-04-15 21:54 ` Troy Kisky
  2 siblings, 1 reply; 10+ messages in thread
From: Andrew Lunn @ 2014-04-15  8:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 15, 2014 at 09:43:38AM +0400, Lee Essen wrote:
> Hi,
> 
> I'm working on a driver for a Marvell switch device (98dx4122) where the basic interface closely resembles the mv643xx_eth device.

Hi Lee

There is basic support for this SoC in the kernel. See
arch/arm/boot/dts/kirkwood-98dx4122.dtsi and
arch/arm/boot/dts/kirkwood-km_kirkwood.dts which is keymile's
reference design.

Keymile are the experts for this device within the kernel community,
so maybe they can comment?

   Andrew


* dma_alloc_coherent and cache?
  2014-04-15  5:43 dma_alloc_coherent and cache? Lee Essen
  2014-04-15  8:10 ` Andrew Lunn
@ 2014-04-15  8:43 ` Arnd Bergmann
  2014-04-15 10:01   ` Lee Essen
  2014-04-15 21:54 ` Troy Kisky
  2 siblings, 1 reply; 10+ messages in thread
From: Arnd Bergmann @ 2014-04-15  8:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 15 April 2014 09:43:38 Lee Essen wrote:
> 
> 
> At a high level I have it working, however I get regular (reproducible) 
> hangs and I suspect it's to do with the writes to the descriptors 
> (from dma_alloc_coherent) being buffered or cached and not making it
> to the device when dma is triggered.

dma_alloc_coherent() is a wrapper around a device-specific allocator,
based on the dma_map_ops implementation. The default allocator
from arm_dma_ops gives you uncached, buffered memory. It is expected
that the driver uses a barrier (which is implied by readl/writel
but not __raw_readl/__raw_writel or readl_relaxed/writel_relaxed)
to ensure the write buffers are flushed.

If the machine sets arm_coherent_dma_ops rather than arm_dma_ops,
the memory will be cacheable, as it's assumed that the hardware
is set up for cache-coherent DMAs.
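
A minimal userspace sketch of the ordering described above, not code from
the driver under discussion. Everything in it is an assumption made for
illustration: the descriptor layout, the names model_wmb(), model_writel()
and model_writel_relaxed(), and the use of a C11 release fence as a
stand-in for the kernel's wmb(). It only shows where the barrier sits
relative to the descriptor stores and the register write that starts DMA.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only. */
struct tx_desc {
	uint32_t cmd_sts;
	uint32_t byte_cnt;
	uint32_t buf_ptr;
};

/* Stands in for memory returned by dma_alloc_coherent(). */
static struct tx_desc desc;
/* Stands in for the memory-mapped "start DMA" register. */
static volatile uint32_t txq_cmd_reg;

/* Userspace stand-in for wmb(): earlier stores are ordered before later ones. */
static inline void model_wmb(void)
{
	atomic_thread_fence(memory_order_release);
}

/* Models writel(): a barrier is issued before the register store. */
static inline void model_writel(uint32_t val, volatile uint32_t *reg)
{
	model_wmb();
	*reg = val;
}

/* Models writel_relaxed(): the register store alone, with no barrier. */
static inline void model_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Descriptor stores followed by a doorbell that carries its own barrier. */
static void start_tx(uint32_t buf, uint32_t len)
{
	desc.buf_ptr = buf;
	desc.byte_cnt = len;
	desc.cmd_sts = 1;
	model_writel(1, &txq_cmd_reg);
}

/* With the relaxed accessor the barrier has to be written out by hand. */
static void start_tx_relaxed(uint32_t buf, uint32_t len)
{
	desc.buf_ptr = buf;
	desc.byte_cnt = len;
	desc.cmd_sts = 1;
	model_wmb();
	model_writel_relaxed(1, &txq_cmd_reg);
}
```

In this model both variants leave the same final state; the difference is
only whether the barrier comes with the accessor or has to be added by the
caller before the relaxed write.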
 
> My theory is based on the fact that the hang always seems to happen
> at the point of enabling dma for transmit, and occasionally I get a
> packet out which is corrupt ... and if I add lots of debug printk's
> or delay loops then it happens less frequently.
> 
> The original GPL code has some functions in to invalidate/clear the
> L2 cache, but no other driver seems to do this, so it doesn't feel
> like it's a good solution.

Correct, drivers should never do cache management by hand, they
should rely on dma_alloc_coherent, dma_map_* and dma_unmap_* to
do the right thing.

> It's a feroceon cpu, and I've tried disabling the L2 controller and
> also the d-cache - neither of which made any difference.
> 
> So I'm now completely out of ideas and way out of my depth  
> 
> Any suggestions would be greatly appreciated.

Can you post a link to the source code?

	Arnd


* dma_alloc_coherent and cache?
  2014-04-15  8:43 ` Arnd Bergmann
@ 2014-04-15 10:01   ` Lee Essen
  2014-04-15 10:49     ` Arnd Bergmann
  0 siblings, 1 reply; 10+ messages in thread
From: Lee Essen @ 2014-04-15 10:01 UTC (permalink / raw)
  To: linux-arm-kernel

> On 15 Apr 2014, at 12:43, Arnd Bergmann <arnd@arndb.de> wrote:
> 
> dma_alloc_coherent() is a wrapper around a device-specific allocator,
> based on the dma_map_ops implementation. The default allocator
> from arm_dma_ops gives you uncached, buffered memory. It is expected
> that the driver uses a barrier (which is implied by readl/writel
> but not __raw_readl/__raw_writel or readl_relaxed/writel_relaxed)
> to ensure the write buffers are flushed.
> 
> If the machine sets arm_coherent_dma_ops rather than arm_dma_ops,
> the memory will be cacheable, as it's assumed that the hardware
> is set up for cache-coherent DMAs.

Hi,

The driver writes to the descriptor and then uses wmb() before enabling DMA. The descriptor is in dma_alloc_coherent() space, but the enable is a writel().

> 
> Can you post a link to the source code?
> 
>   Arnd

The code is available here:

http://www.nowonline.co.uk/scratch/le_netdev.c

It hangs consistently when it executes the txq_enable() on line 1280. Occasionally I see a corrupt packet on the wire, but mostly it's just a hang. If I uncomment all the printk's then it generally gets 20 or 30 packets out before it freezes.

Regards,

Lee.


* dma_alloc_coherent and cache?
  2014-04-15 10:01   ` Lee Essen
@ 2014-04-15 10:49     ` Arnd Bergmann
  2014-04-15 16:22       ` Lee Essen
  0 siblings, 1 reply; 10+ messages in thread
From: Arnd Bergmann @ 2014-04-15 10:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 15 April 2014 14:01:39 Lee Essen wrote:
> > On 15 Apr 2014, at 12:43, Arnd Bergmann <arnd@arndb.de> wrote:
> > 
> > dma_alloc_coherent() is a wrapper around a device-specific allocator,
> > based on the dma_map_ops implementation. The default allocator
> > from arm_dma_ops gives you uncached, buffered memory. It is expected
> > that the driver uses a barrier (which is implied by readl/writel
> > but not __raw_readl/__raw_writel or readl_relaxed/writel_relaxed)
> > to ensure the write buffers are flushed.
> > 
> > If the machine sets arm_coherent_dma_ops rather than arm_dma_ops,
> > the memory will be cacheable, as it's assumed that the hardware
> > is set up for cache-coherent DMAs.
> 
> Hi,
> 
> The driver writes to the descriptor and then uses wmb() before enabling DMA. The descriptor is in dma_alloc_coherent() space, but the enable is a writel().

Ok

> > 
> > Can you post a link to the source code?
> > 
> >   Arnd
> 
> The code is available here:
> 
> http://www.nowonline.co.uk/scratch/le_netdev.c
> 
> It hangs consistently when it executes the txq_enable() on line 1280. Occasionally I see a corrupt packet on the wire, but mostly it's just a hang. If I uncomment all the printk's then it generally gets 20 or 30 packets out before it freezes.
> 


Unfortunately I don't see an obvious mistake with the DMA handling there, so
I would try looking somewhere other than the DMA code first. What
kind of freeze do you see? Does the entire machine hang, or is it
just the network interface that stops sending packets?

	Arnd


* dma_alloc_coherent and cache?
  2014-04-15 10:49     ` Arnd Bergmann
@ 2014-04-15 16:22       ` Lee Essen
  2014-04-15 16:49         ` Andrew Lunn
  0 siblings, 1 reply; 10+ messages in thread
From: Lee Essen @ 2014-04-15 16:22 UTC (permalink / raw)
  To: linux-arm-kernel


On 15 Apr 2014, at 14:49, Arnd Bergmann <arnd@arndb.de> wrote:

> On Tuesday 15 April 2014 14:01:39 Lee Essen wrote:
>>> On 15 Apr 2014, at 12:43, Arnd Bergmann <arnd@arndb.de> wrote:
>>> 
>>> dma_alloc_coherent() is a wrapper around a device-specific allocator,
>>> based on the dma_map_ops implementation. The default allocator
>>> from arm_dma_ops gives you uncached, buffered memory. It is expected
>>> that the driver uses a barrier (which is implied by readl/writel
>>> but not __raw_readl/__raw_writel or readl_relaxed/writel_relaxed)
>>> to ensure the write buffers are flushed.
>>> 
>>> If the machine sets arm_coherent_dma_ops rather than arm_dma_ops,
>>> the memory will be cacheable, as it's assumed that the hardware
>>> is set up for cache-coherent DMAs.
>> 
>> Hi,
>> 
>> The driver writes to the descriptor and then uses wmb() before enabling DMA. The descriptor is in dma_alloc_coherent() space, but the enable is a writel().
> 
> Ok
> 
>>> 
>>> Can you post a link to the source code?
>>> 
>>>  Arnd
>> 
>> The code is available here:
>> 
>> http://www.nowonline.co.uk/scratch/le_netdev.c
>> 
>> It hangs consistently when it executes the txq_enable() on line 1280. Occasionally I see a corrupt packet on the wire, but mostly it's just a hang. If I uncomment all the printk's then it generally gets 20 or 30 packets out before it freezes.
>> 
> 
> 
> Unfortunately I don't see an obvious mistake with the DMA handling there,
> I would try looking somewhere other than the dma code first. What
> kind of freeze do you see? Does the entire machine hang, or is it
> just the network interface that stops sending packets?
> 

Hi Arnd,

Thanks for having a look. On the hangs: it's a complete machine hang. I'm connected via serial and it just stops dead.

I'm starting to look at other differences compared to the GPL code, but unfortunately there are many things. One very big difference is that the GPL code (used with 2.6.22) uses a patched version of proc-arm926.S that caters for a couple of Feroceon specifics.

The proc-feroceon.S version, which is used in the newer kernel, doesn't seem to have some of these. One section of note is:

ENTRY(cpu_arm926_do_idle)
#ifdef CONFIG_ARCH_FEROCEON
        /* Implement workaround for FEr# CPU-C16: Wait for interrupt command */
        /* is not processed properly, the workaround is not to use this command */
        /* the erratum is relevant for 5281 devices with revision less than C0 */

        ldr     r0, support_wait_for_interrupt_address /* this variable set in core.c*/
        ldr     r0, [r0]
        cmp     r0, #1    /* check if the device doesn't support wait for interrupt*/
        bne     1f        /* if yes, then go out*/
        /* workaround ends here*/
#endif
    mov r0, #0
    mrc p15, 0, r1, c1, c0, 0       @ Read control register
    mcr p15, 0, r0, c7, c10, 4      @ Drain write buffer
    bic r2, r1, #1 << 12
    mcr p15, 0, r2, c1, c0, 0       @ Disable I cache
    mcr p15, 0, r0, c7, c0, 4       @ Wait for interrupt
    mcr p15, 0, r1, c1, c0, 0       @ Restore ICache enable
#ifdef CONFIG_ARCH_FEROCEON
1:
#endif
    mov pc, lr

... but there are many others. I'll continue to look at these and see if I can experiment, but I am definitely way out of my depth.

Regards,

Lee.


* dma_alloc_coherent and cache?
  2014-04-15 16:22       ` Lee Essen
@ 2014-04-15 16:49         ` Andrew Lunn
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Lunn @ 2014-04-15 16:49 UTC (permalink / raw)
  To: linux-arm-kernel

> ENTRY(cpu_arm926_do_idle)
> #ifdef CONFIG_ARCH_FEROCEON
>         /* Implement workaround for FEr# CPU-C16: Wait for interrupt command */
>         /* is not processed properly, the workaround is not to use this command */
>         /* the erratum is relevant for 5281 devices with revision less than C0 */

5281 refers to the old Orion5x devices, not Kirkwood, so it is not relevant. You
should be keeping an eye out for 6xxx issues, which would be Kirkwood.

       Andrew


* dma_alloc_coherent and cache?
  2014-04-15  5:43 dma_alloc_coherent and cache? Lee Essen
  2014-04-15  8:10 ` Andrew Lunn
  2014-04-15  8:43 ` Arnd Bergmann
@ 2014-04-15 21:54 ` Troy Kisky
  2014-04-16 15:55   ` Lee Essen
  2 siblings, 1 reply; 10+ messages in thread
From: Troy Kisky @ 2014-04-15 21:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 4/14/2014 10:43 PM, Lee Essen wrote:
> Hi,
> 
> I'm working on a driver for a Marvell switch device (98dx4122) where the basic interface closely resembles the mv643xx_eth device.
> 
> (I should say this is a bit of a personal project to experiment with a trendnet switch, not affiliated with any commercial activities.)
> 
> GPL code from Marvell is available for an old kernel, so I have been working to use the mv643xx_eth concepts and at least get basic functionality up and running on the current kernel version.
> 
> At a high level I have it working, however I get regular (reproducible) hangs and I suspect it's to do with the writes to the descriptors (from dma_alloc_coherent) being buffered or cached and not making it to the device when dma is triggered.
> 

Have you verified that a wmb() precedes transferring ownership of the descriptor to the controller
and the cpu does not touch the descriptor afterwards?
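
To make the question concrete, here is a small userspace model of the
ownership handoff, offered as a sketch under assumptions rather than as
code from any real driver. The ownership bit position, the ring size, the
names tx_submit(), tx_reclaim() and model_dma_complete(), and the C11
fences standing in for wmb()/rmb() are all hypothetical. The point is the
order: fill the descriptor, barrier, then set the ownership bit last, and
never write to a descriptor the controller still owns.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 4
#define DESC_OWNED_BY_DMA (1u << 31)	/* assumed position of the ownership bit */

/* Hypothetical descriptor layout, for illustration only. */
struct tx_desc {
	uint32_t cmd_sts;
	uint32_t byte_cnt;
	uint32_t buf_ptr;
};

/* Stands in for a descriptor ring in dma_alloc_coherent() memory. */
static struct tx_desc ring[RING_SIZE];

/* Userspace stand-ins for wmb() and rmb(). */
static inline void model_wmb(void) { atomic_thread_fence(memory_order_release); }
static inline void model_rmb(void) { atomic_thread_fence(memory_order_acquire); }

/* Returns 0 on success, -1 if the slot still belongs to the controller. */
static int tx_submit(unsigned int idx, uint32_t buf, uint32_t len)
{
	struct tx_desc *d = &ring[idx % RING_SIZE];

	/* The CPU must not modify a descriptor it has handed over. */
	if (d->cmd_sts & DESC_OWNED_BY_DMA)
		return -1;

	d->buf_ptr = buf;
	d->byte_cnt = len;
	/* Payload fields must be visible before ownership changes. */
	model_wmb();
	d->cmd_sts = DESC_OWNED_BY_DMA;
	return 0;
}

/* Returns 1 if the slot was reclaimed, 0 if the controller still owns it. */
static int tx_reclaim(unsigned int idx)
{
	struct tx_desc *d = &ring[idx % RING_SIZE];

	if (d->cmd_sts & DESC_OWNED_BY_DMA)
		return 0;

	/* Read the ownership bit before touching the other fields. */
	model_rmb();
	d->buf_ptr = 0;
	d->byte_cnt = 0;
	return 1;
}

/* Model only: pretend the controller finished and gave the slot back. */
static void model_dma_complete(unsigned int idx)
{
	ring[idx % RING_SIZE].cmd_sts &= ~DESC_OWNED_BY_DMA;
}
```

A second tx_submit() on a slot that has not been handed back is refused and
leaves the descriptor unchanged, which is the "cpu does not touch the
descriptor afterwards" half of the check.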


Regards
Troy


* dma_alloc_coherent and cache?
  2014-04-15 21:54 ` Troy Kisky
@ 2014-04-16 15:55   ` Lee Essen
  0 siblings, 0 replies; 10+ messages in thread
From: Lee Essen @ 2014-04-16 15:55 UTC (permalink / raw)
  To: linux-arm-kernel


On 16 Apr 2014, at 01:54, Troy Kisky <troy.kisky@boundarydevices.com> wrote:

> On 4/14/2014 10:43 PM, Lee Essen wrote:
>> Hi,
>> 
>> I'm working on a driver for a Marvell switch device (98dx4122) where the basic interface closely resembles the mv643xx_eth device.
>> 
>> (I should say this is a bit of a personal project to experiment with a trendnet switch, not affiliated with any commercial activities.)
>> 
>> GPL code from Marvell is available for an old kernel, so I have been working to use the mv643xx_eth concepts and at least get basic functionality up and running on the current kernel version.
>> 
>> At a high level I have it working, however I get regular (reproducible) hangs and I suspect it's to do with the writes to the descriptors (from dma_alloc_coherent) being buffered or cached and not making it to the device when dma is triggered.
>> 
> 
> Have you verified that a wmb() precedes transferring ownership of the descriptor to the controller
> and the cpu does not touch the descriptor afterwards?

Hi Troy,

Yes, there's a wmb() before the write of the register that starts the DMA, and nothing else that touches it.  The hang happens almost instantly (probably it is instantly) after the dma start.

It still feels like a cache/buffering issue, but nobody can see any obvious problems. The other possibility is that it's interrupt related, since an interrupt will occur pretty quickly after the dma is enabled.

I am going to try to rework the driver slightly to be entirely polling, just to eliminate the interrupt side of things, although it will increase the delay between the descriptor write and the dma start, which may also mask the problem.

Lee.


* dma_alloc_coherent and cache?
  2014-04-15  8:10 ` Andrew Lunn
@ 2014-04-17 14:35   ` Valentin Longchamp
  0 siblings, 0 replies; 10+ messages in thread
From: Valentin Longchamp @ 2014-04-17 14:35 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/15/2014 10:10 AM, Andrew Lunn wrote:
> On Tue, Apr 15, 2014 at 09:43:38AM +0400, Lee Essen wrote:
>> Hi,
>>
>> I'm working on a driver for a Marvell switch device (98dx4122) where the basic interface closely resembles the mv643xx_eth device.
> 
> There is basic support for this SoC in the kernel. See
> arch/arm/boot/dts/kirkwood-98dx4122.dtsi and
> arch/arm/boot/dts/kirkwood-km_kirkwood.dts which is keymile's
> reference design.
> 
> Keymile are the experts for this device within the kernel community,
> so maybe they can comment?
> 

Hi Lee and Andrew,

Sorry for the quite late answer. At Keymile we rely on the driver (CPSS) Marvell
provides to manage the switch but personally I am not very happy about this
situation and I think your effort is great.

As you know, there are several ways to access the Switch from the CPU, one of
them is through the "internal" RGMII that you use, and another one is through
the CPU internal bus that we use at Keymile (thus we have to reserve a Mbus
window for the switch that then can be ioremapped). Since I have never used the
RGMII device, I cannot really give you some interesting hints here.

If we are talking about feroceon and caches, we have experienced a lot of
problems with the L2 cache that was sometimes not coherent with the CPU, but
with the latest versions of u-boot and Linux we do not see that anymore, so I
don't think your problem is this one.

That's unfortunately all the information I can provide to this thread.

Valentin
