linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
       [not found]               ` <50310F10.2080701@yahoo.es>
@ 2012-08-21  4:42                 ` viresh kumar
  2012-08-21  6:12                   ` Hein Tibosch
  0 siblings, 1 reply; 11+ messages in thread
From: viresh kumar @ 2012-08-21  4:42 UTC (permalink / raw)
  To: Hein Tibosch
  Cc: Hans-Christian Egtvedt, Nicolas Ferre, Havard Skinnemoen,
	ludovic.desroches, linux-kernel, spear-devel

Hi Hein,

I have added linux-kernel list in cc as there might be other users of
this patch.
Also, please try to keep spear-devel list in cc for dw_dmac as we are using this
driver for SPEAr.

On Sun, Aug 19, 2012 at 9:36 PM, Hein Tibosch <hein_tibosch@yahoo.es> wrote:
> Hi Hans-Christian,
>
> Do you know to which list I could send the patch(es) below?
>
> Viresh Kumar: did you have an urgent reason to replace __raw_readl
> and __raw_writel with readl/writel in the dw_dmac module?
> (see http://lists.infradead.org/pipermail/linux-arm-kernel/2011-March/044799.html)
>
> Here's a draft:
>
> Because of the changes since 2.6.38 to dw_dmac.c and atmel-mci.c,
> the MCI driver on the AP700x platforms didn't work anymore using DMA.
> The PIO method still worked well.
>
> I found several problems. The patch below makes it work again.
> It was tested on an atngw100 board.
>
> atmel-mci:
> - The AP700x has an MCI controller with version 0x210
>   Therefor it was supposed to work the PDC (peripheral DMA
>   controller). However, in this CPU the PDC is not connected
>   to the MCI.
> - When using DMA, the driver was accessing a register ATMCI_DMA
>   at offset 0x50, which is undefined for the AP700x. But it
>   probably doesn't hurt either...
>
> dw_dmac:
> - After 2.6.39, the registers were accessed using readl/writel
>   in stead of the __raw_readl and __raw_writel causing a 16-bit
>   swap of all values (little endian access)

Ahhhh!! Firstly we can't use __raw* for architectures >= ARMv6. It is not
only for endianess but for memory barriers. Why are they getting swapped
for your case? Does your processor and dw_dmac have different endianess?

And if i am not wrong, we should always try not to use __raw* variants just
due to endianess things... instead use either readl/writel OR
readl_/writel_ relaxed.
I am not sure if relaxed versions are available for architectures
other than ARM.

> - Access to memory was sometimes done in chunks of 64-bits,
>   which gives an undefined value of 0x03 for SRC/DST_TR_WIDTH
>   field in the CTLxL register

Looks fine. But there should be a separate patch for this.

> - The SMS field in the CTLxL register received the wrong value:
>   0 in stead of 1

I believe it is not for dw_dmac?

--
viresh

> Signed-off-by: Hein Tibosch <hein_tibosch@yahoo.es>
> ---
>  arch/avr32/mach-at32ap/at32ap700x.c |    6 ++++++
>  drivers/dma/dw_dmac.c               |    8 ++++++++
>  drivers/dma/dw_dmac_regs.h          |    8 ++++----
>  drivers/mmc/host/atmel-mci.c        |   19 ++++++++++++++++---
>  4 files changed, 34 insertions(+), 7 deletions(-)
>
> diff --git a/arch/avr32/mach-at32ap/at32ap700x.c b/arch/avr32/mach-at32ap/at32ap700x.c
> index 0445c4f..06b6eff 100644
> --- a/arch/avr32/mach-at32ap/at32ap700x.c
> +++ b/arch/avr32/mach-at32ap/at32ap700x.c
> @@ -1355,6 +1355,12 @@ at32_add_device_mci(unsigned int id, struct mci_platform_data *data)
>                                 | DWC_CFGH_DST_PER(1));
>         slave->sdata.cfg_lo &= ~(DWC_CFGL_HS_DST_POL
>                                 | DWC_CFGL_HS_SRC_POL);
> +       /* value for SMS: Source Master Select */
> +       slave->sdata.src_master = 1;
> +       /* value for DMS: Destination Master Select */
> +       slave->sdata.dst_master = 0;
> +       /* SRC/DST_TR_WIDTH register only accepts 0,1,2 */
> +       slave->sdata.max_mem_width = 2;
>         data->dma_slave = slave;
>  diff --git a/drivers/dma/dw_dmac.c b/drivers/dma/dw_dmac.c
> index 7212961..a4bdf1d 100644
> --- a/drivers/dma/dw_dmac.c
> +++ b/drivers/dma/dw_dmac.c
> @@ -754,6 +754,10 @@ dwc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
>                                 mem_width = 1;
>                         else
>                                 mem_width = 0;
> +                       /* Some controllers don't support 64-bits mem access */
> +                       if (dws->max_mem_width &&
> +                               mem_width > dws->max_mem_width)
> +                               mem_width = dws->max_mem_width;
>   slave_sg_todev_fill_desc:
>                         desc = dwc_desc_get(dwc);
> @@ -821,6 +825,10 @@ slave_sg_todev_fill_desc:
>                                 mem_width = 1;
>                         else
>                                 mem_width = 0;
> +                       /* Some controllers don't support 64-bits mem access */
> +                       if (dws->max_mem_width &&
> +                               mem_width > dws->max_mem_width)
> +                               mem_width = dws->max_mem_width;
>   slave_sg_fromdev_fill_desc:
>                         desc = dwc_desc_get(dwc);
> diff --git a/drivers/dma/dw_dmac_regs.h b/drivers/dma/dw_dmac_regs.h
> index f298f69..b380d5f 100644
> --- a/drivers/dma/dw_dmac_regs.h
> +++ b/drivers/dma/dw_dmac_regs.h
> @@ -176,9 +176,9 @@ __dwc_regs(struct dw_dma_chan *dwc)
>  }
>   #define channel_readl(dwc, name) \
> -       readl(&(__dwc_regs(dwc)->name))
> +       __raw_readl(&(__dwc_regs(dwc)->name))
>  #define channel_writel(dwc, name, val) \
> -       writel((val), &(__dwc_regs(dwc)->name))
> +       __raw_writel((val), &(__dwc_regs(dwc)->name))
>   static inline struct dw_dma_chan *to_dw_dma_chan(struct dma_chan *chan)
>  {
> @@ -202,9 +202,9 @@ static inline struct dw_dma_regs __iomem *__dw_regs(struct dw_dma *dw)
>  }
>   #define dma_readl(dw, name) \
> -       readl(&(__dw_regs(dw)->name))
> +       __raw_readl(&(__dw_regs(dw)->name))
>  #define dma_writel(dw, name, val) \
> -       writel((val), &(__dw_regs(dw)->name))
> +       __raw_writel((val), &(__dw_regs(dw)->name))
>   #define channel_set_bit(dw, reg, mask) \
>         dma_writel(dw, reg, ((mask) << 8) | (mask))
> diff --git a/drivers/mmc/host/atmel-mci.c b/drivers/mmc/host/atmel-mci.c
> index f2c115e..b78a674 100644
> --- a/drivers/mmc/host/atmel-mci.c
> +++ b/drivers/mmc/host/atmel-mci.c
> @@ -75,6 +75,7 @@ struct atmel_mci_caps {
>         bool    has_pdc;
>         bool    has_cfg_reg;
>         bool    has_cstor_reg;
> +       bool    has_dma_reg;
>         bool    has_highspeed;
>         bool    has_rwproof;
>         bool    has_odd_clk_div;
> @@ -411,7 +412,7 @@ static int atmci_regs_show(struct seq_file *s, void *v)
>         atmci_show_status_reg(s, "SR", buf[ATMCI_SR / 4]);
>         atmci_show_status_reg(s, "IMR", buf[ATMCI_IMR / 4]);
>  -      if (host->caps.has_dma) {
> +       if (host->caps.has_dma_reg) {
>                 u32 val;
>                 val = buf[ATMCI_DMA / 4];
> @@ -767,7 +768,7 @@ static void atmci_dma_complete(void *arg)
>         dev_vdbg(&host->pdev->dev, "DMA complete\n");
>  -      if (host->caps.has_dma)
> +       if (host->caps.has_dma_reg)
>                 /* Disable DMA hardware handshaking on MCI */
>                 atmci_writel(host, ATMCI_DMA, atmci_readl(host, ATMCI_DMA) & ~ATMCI_DMAEN);
>  @@ -954,7 +955,9 @@ atmci_prepare_data_dma(struct atmel_mci *host, struct mmc_data *data)
>                 maxburst = atmci_convert_chksize(host->dma_conf.dst_maxburst);
>         }
>  -      atmci_writel(host, ATMCI_DMA, ATMCI_DMA_CHKSIZE(maxburst) | ATMCI_DMAEN);
> +       if (host->caps.has_dma_reg)
> +               atmci_writel(host, ATMCI_DMA,
> +                       ATMCI_DMA_CHKSIZE(maxburst) | ATMCI_DMAEN);
>         sglen = dma_map_sg(chan->device->dev, data->sg,
>                         data->sg_len, direction);
> @@ -2203,7 +2206,11 @@ static void __init atmci_get_cap(struct atmel_mci *host)
>                         "version: 0x%x\n", version);
>         host->caps.has_dma = 0;
> +#if defined(CONFIG_CPU_AT32AP700X)
> +       host->caps.has_pdc = 0;
> +#else
>         host->caps.has_pdc = 1;
> +#endif
>         host->caps.has_cfg_reg = 0;
>         host->caps.has_cstor_reg = 0;
>         host->caps.has_highspeed = 0;
> @@ -2221,6 +2228,7 @@ static void __init atmci_get_cap(struct atmel_mci *host)
>         case 0x300:
>  #ifdef CONFIG_AT_HDMAC
>                 host->caps.has_dma = 1;
> +               host->caps.has_dma_reg = 1;
>  #else
>                 dev_info(&host->pdev->dev,
>                         "has dma capability but dma engine is not selected, then use pio\n");
> @@ -2230,6 +2238,11 @@ static void __init atmci_get_cap(struct atmel_mci *host)
>                 host->caps.has_cstor_reg = 1;
>                 host->caps.has_highspeed = 1;
>         case 0x200:
> +#if defined(CONFIG_DW_DMAC) && defined(CONFIG_MMC_ATMELMCI_DMA)
> +               host->caps.has_dma = 1;
> +               dev_info(&host->pdev->dev,
> +                       "using dma (AP7000)\n");
> +#endif
>                 host->caps.has_rwproof = 1;
>                 host->caps.need_blksz_mul_4 = 0;
>         case 0x100:
> --
> 1.7.8.0
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
  2012-08-21  4:42                 ` [PATCH] Fixes for dw_dmac and atmel-mci for AP700x viresh kumar
@ 2012-08-21  6:12                   ` Hein Tibosch
       [not found]                     ` <CAKohponN16krs-WWw6Abh1fLPO3+iYndTaxsPDfeXCoS9OHufQ@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Hein Tibosch @ 2012-08-21  6:12 UTC (permalink / raw)
  To: viresh kumar
  Cc: Hans-Christian Egtvedt, Nicolas Ferre, Havard Skinnemoen,
	ludovic.desroches, linux-kernel, spear-devel

Hi Viresh,


On 8/21/2012 12:42 PM, viresh kumar wrote:
> I have added linux-kernel list in cc as there might be other users of
> this patch.
> Also, please try to keep spear-devel list in cc for dw_dmac as we are using this
> driver for SPEAr.

Yes sure, I didn't want to bother the lists with the first preparations of this patch.

> On Sun, Aug 19, 2012 at 9:36 PM, Hein Tibosch <hein_tibosch@yahoo.es> wrote:
>> dw_dmac:
>> - After 2.6.39, the registers were accessed using readl/writel
>>   in stead of the __raw_readl and __raw_writel causing a 16-bit
>>   swap of all values (little endian access)
> Ahhhh!! Firstly we can't use __raw* for architectures >= ARMv6. It is not
> only for endianess but for memory barriers. Why are they getting swapped
> for your case? Does your processor and dw_dmac have different endianess?

If I'm not wrong: the __raw_* functions will access the i/o memory in native endianess.
As far as I know, all AVR32 drivers are currently using the __raw* functions. I never
encountered  a problem with that.

> And if i am not wrong, we should always try not to use __raw* variants just
> due to endianess things... instead use either readl/writel OR
> readl_/writel_ relaxed.
> I am not sure if relaxed versions are available for architectures
> other than ARM.

Would you agree to have this depend on CONFIG_AVR ?

>> - Access to memory was sometimes done in chunks of 64-bits,
>>   which gives an undefined value of 0x03 for SRC/DST_TR_WIDTH
>>   field in the CTLxL register
> Looks fine. But there should be a separate patch for this.
>
>> - The SMS field in the CTLxL register received the wrong value:
>>   0 in stead of 1
> I believe it is not for dw_dmac?

That's correct, I'll put it in a separate Atmel patch (at32ap700x.c)

So for the AP700x fixes for the MCI/DMA driver I'm thinking of this
series of patches:

To you, cc spear-devel & linux-kernel
- include/linux/dw_dmac.h:
  adding field max_mem_width to limit SRC/DST_TR_WIDTH
- dw_dmac.c:
  check max_mem_width before setting SRC/DST_TR_WIDTH
- dw_dmac_regs.h:
  use __raw* functions for AVR32, readl/writel for all others

To Ludovic Desroches, linux-mmc:
- atmel-mci.c:
  avoid using peripheral DMA controller (PDC) in case of AVR32
- atmel-mci.c:
  only use the ATMCI_DMA register if supported by arch

To Hans-Christian, linux-kernel cc Andrew:
- at32ap700x.c:
  set src_master=1 to get SMS (Source Master Select) correct
  set max_mem_width=2 to get SRC/DST_TR_WIDTH correct

Ok?

Thanks, Hein

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
       [not found]                     ` <CAKohponN16krs-WWw6Abh1fLPO3+iYndTaxsPDfeXCoS9OHufQ@mail.gmail.com>
@ 2012-08-21  7:32                       ` Hein Tibosch
       [not found]                         ` <CAKohpomMPeewDBVxVR=g-op7E53rqt-AZgOdyoib6W+5QsLOOA@mail.gmail.com>
  2012-08-21  7:44                       ` Arnd Bergmann
  1 sibling, 1 reply; 11+ messages in thread
From: Hein Tibosch @ 2012-08-21  7:32 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Hans-Christian Egtvedt, Nicolas Ferre, Havard Skinnemoen,
	ludovic.desroches, linux-kernel, spear-devel, Arnd Bergmann

On 8/21/2012 2:35 PM, Viresh Kumar wrote:
> On 21 August 2012 11:42, Hein Tibosch <hein_tibosch@yahoo.es <mailto:hein_tibosch@yahoo.es>> wrote:
>
>     On 8/21/2012 12:42 PM, viresh kumar wrote:
>     > Ahhhh!! Firstly we can't use __raw* for architectures >= ARMv6. It is not
>     > only for endianess but for memory barriers. Why are they getting swapped
>     > for your case? Does your processor and dw_dmac have different endianess?
>
>     If I'm not wrong: the __raw_* functions will access the i/o memory in native endianess.
>     As far as I know, all AVR32 drivers are currently using the __raw* functions. I never
>     encountered  a problem with that.
>
>
> I don't have the best knowledge in this area :( and so cc-ing Arnd who can help us here.
> So my perception of these routines is:
>
> readl is defined generically as:
>
> #define readl(addr) __le32_to_cpu(__raw_readl(addr))
>
> Which converts to a simple __raw_readl() for little endian systems and
> swapped bytes for a big endian system.
>
> You wrote in the beginning
> >> - After 2.6.39, the registers were accessed using readl/writel
> >>   in stead of the __raw_readl and __raw_writel causing a 16-bit
> >>   swap of all values (little endian access)

It got swapped as 0xAABB.CCDD => 0xCCDD.AABB

> What's this 16-bit swap here? It should be a 8 bit swap by __swab32() routine.
>  
> Is AVR32 a big-endian system? Probably big-endian, that's why values are getting
> swapped. And dw_dmac is the standard one, can call it little endian for the time being.
>
AVR32 (AP700x) is a big-endian system.

It took me many hours to figure out why the DMA controller didn't do
anything at all.

Then I decided to access the i/o memory in different way:

    #include "dmaca_206a.h" // from the AVR32-tools library

    const struct __iomem avr32_dmaca_t *regs =
        (const struct __iomem avr32_dmaca_t *)0xFF200000;

and printk all register values. Each and every u32 field was swapped as
0xAABBCCDD => 0xCCDDAABB

Håvard's comment was: "it's always wrong to use little endian accessors
(i.e. readl/writel) in drivers for chip-internal devices"

Memory barriers: within the AVR32 code, one often sees explicit ways to
introduce memory barriers, e.g.:

    hsmc_readl(hsmc, MODE0); /* I/O barrier */

which means: reading back (just any) register (from the same peripheral)
to assure that the write was successful, especially from within
interrupt handlers.

But as I said, within avr32-linux I never encountered problems
with the __raw* approach, including dw_dmac.c

> @Arnd: What should we do here?

I'm curious too

Hein

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
       [not found]                     ` <CAKohponN16krs-WWw6Abh1fLPO3+iYndTaxsPDfeXCoS9OHufQ@mail.gmail.com>
  2012-08-21  7:32                       ` Hein Tibosch
@ 2012-08-21  7:44                       ` Arnd Bergmann
       [not found]                         ` <CAKohpo=DyQajiE-DmpMiv=gVOVOep1OEECVuw83D2u4VJisEsw@mail.gmail.com>
  1 sibling, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2012-08-21  7:44 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Hein Tibosch, Hans-Christian Egtvedt, Nicolas Ferre,
	Havard Skinnemoen, ludovic.desroches, linux-kernel, spear-devel

On Tuesday 21 August 2012, Viresh Kumar wrote:
> On 21 August 2012 11:42, Hein Tibosch <hein_tibosch@yahoo.es> wrote:
> 
> > On 8/21/2012 12:42 PM, viresh kumar wrote:
> > > Ahhhh!! Firstly we can't use __raw* for architectures >= ARMv6. It is not
> > > only for endianess but for memory barriers. Why are they getting swapped
> > > for your case? Does your processor and dw_dmac have different endianess?
> >
> > If I'm not wrong: the __raw_* functions will access the i/o memory in
> > native endianess.
> > As far as I know, all AVR32 drivers are currently using the __raw*
> > functions. I never
> > encountered  a problem with that.
> >
> 
> I don't have the best knowledge in this area :( and so cc-ing Arnd who can
> help us here.
> So my perception of these routines is:
> 
> readl is defined generically as:
> 
> #define readl(addr) __le32_to_cpu(__raw_readl(addr))
> 
> Which converts to a simple __raw_readl() for little endian systems and
> swapped bytes for a big endian system.

readl does much more than that:

* It guarantees that read accesses arrive at the device in the order they
  are written in the source code
* It maintains ordering between DMA and MMIO accesses
* It computes the correct address to dereference from an __iomem token.
* It guarantees that the access is done in a single atomic load, rather
  than a series of byte accesses.

All this is necessary to make device drivers work in general, although
on many simple (strictly ordered) architectures a pointer dereference
will do the same thing. Device drivers should never use the __raw_*
accessors themselves. Some architectures define their own bus-specific
accessors, but not everyone thinks that is a good idea.

> You wrote in the beginning
> >> - After 2.6.39, the registers were accessed using readl/writel
> >>   in stead of the __raw_readl and __raw_writel causing a 16-bit
> >>   swap of all values (little endian access)
> 
> What's this 16-bit swap here? It should be a 8 bit swap by __swab32()
> routine.
> 
> Is AVR32 a big-endian system? Probably big-endian, that's why values are
> getting
> swapped. And dw_dmac is the standard one, can call it little endian for the
> time being.
> 
> @Arnd: What should we do here?

Yes, AVR32 is big-endian. I assume that dw_dmac can be either configured
as little-endian or big-endian and that it is configured as big-endian
on AVR32.

To solve this, we can either make the endianess of dw_dmac run-time
detected, or we add a configuration symbol for it and use either
ioread32_be() or readl(), depending how this is set. This symbol
can be hardwired by architectures on which it is known in advance.
I've also seen some devices that can be configured into either
endianess, so if this is true for the dw_dmac, we could just force
it to be little-endian all the time.

	Arnd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
       [not found]                         ` <CAKohpo=DyQajiE-DmpMiv=gVOVOep1OEECVuw83D2u4VJisEsw@mail.gmail.com>
@ 2012-08-21  8:31                           ` Arnd Bergmann
  2012-08-21 14:15                             ` Havard Skinnemoen
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2012-08-21  8:31 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Hein Tibosch, Hans-Christian Egtvedt, Nicolas Ferre,
	Havard Skinnemoen, ludovic.desroches, linux-kernel, spear-devel

On Tuesday 21 August 2012, Viresh Kumar wrote:
> > Is AVR32 a big-endian system? Probably big-endian, that's why values are
> > > getting
> > > swapped. And dw_dmac is the standard one, can call it little endian for
> > the
> > > time being.
> > >
> > > @Arnd: What should we do here?
> >
> > Yes, AVR32 is big-endian. I assume that dw_dmac can be either configured
> > as little-endian or big-endian and that it is configured as big-endian
> > on AVR32.
> >
> 
> Just to understand a bit more on this always confusing endianess concept:
> - For AVR32, readl is calling swab everytime. So whatever we write will get
> swapped.
> - What are the implications of dw_dmac configured in little/big endian?
> 
> When we write something to register of a peripheral, whose endianess
> property decides how it will get written. Processor or Peripheral?

The device decides which accessor we need to use (readl, ioread, ioread_be,
in_be, in_le, ...). The architecture code must ensure that this is
implemented properly based on the CPU endianess. We don't have a proper
accessor function that implements "device has same endianess as CPU".
Using __raw_* is not a replacement for that.

I don't mind adding such an accessor at all, and a number people have
complained about the lack of this for some time, but you should be
aware that a lot of peripherals that are intended to be used in
"CPU-endian" mode eventually end up getting used in "wrong-endian"
mode, e.g. when someone decides to put that peripheral on a PCI
card and someone else sticks it into a machine that has a CPU
with the opposed (or configurable) endianess. It would be nice if
the likes of designware could at least pick one endianess per
device they do, but the reality is that we have to deal with both
variants and only the device driver can find out what it is.

	Arnd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
       [not found]                         ` <CAKohpomMPeewDBVxVR=g-op7E53rqt-AZgOdyoib6W+5QsLOOA@mail.gmail.com>
@ 2012-08-21  8:34                           ` Arnd Bergmann
       [not found]                             ` <CAKohpokhM5WkUK6ww8Neu+fx554+-4uBCsNzBxB9q5rcqSf6cw@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2012-08-21  8:34 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Hein Tibosch, Hans-Christian Egtvedt, Nicolas Ferre,
	Havard Skinnemoen, ludovic.desroches, linux-kernel, spear-devel

On Tuesday 21 August 2012, Viresh Kumar wrote:
> On 21 August 2012 13:02, Hein Tibosch <hein_tibosch@yahoo.es> wrote:
> 
> > On 8/21/2012 2:35 PM, Viresh Kumar wrote:
> >
> > It got swapped as 0xAABB.CCDD => 0xCCDD.AABB
> >
> 
> @Arnd: How do we explain this? shouldn't it be DD CC BB AA??

Yes, this is very strange. Maybe the compiler already splits the
access into two 16-byte loads and that confuses the device?

> > Memory barriers: within the AVR32 code, one often sees explicit ways to
> > introduce memory barriers, e.g.:
> >
> >     hsmc_readl(hsmc, MODE0); /* I/O barrier */
> >
> 
> For ARM it has become a bit complex about using barriers. That's why they
> are added in readl/writel to remove any confusion.

To be more exact: the reason why readl/writel have the barriers is that
device drivers written for x86 expect the barrier semantics to be implied.
ARM also has readl_relaxed/writel_relaxed, which don't have a barrier
against DMA but still enforce ordering between the readl calls (implied
by the CPU architecture).

	Arnd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
       [not found]                             ` <CAKohpokhM5WkUK6ww8Neu+fx554+-4uBCsNzBxB9q5rcqSf6cw@mail.gmail.com>
@ 2012-08-21  8:47                               ` Arnd Bergmann
       [not found]                                 ` <CAKohpokoeSLBtLdh8hr4GKv8VOxzK8Vq5SoqLE3yC-TDyjYA9w@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2012-08-21  8:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Hein Tibosch, Hans-Christian Egtvedt, Nicolas Ferre,
	Havard Skinnemoen, ludovic.desroches, linux-kernel, spear-devel

On Tuesday 21 August 2012, Viresh Kumar wrote:
> On 21 August 2012 14:04, Arnd Bergmann <arnd.bergmann@linaro.org> wrote:
>
> > Yes, this is very strange. Maybe the compiler already splits the
> > access into two 16-byte loads and that confuses the device?
>
> @Arnd: Is compiler allowed to do that even when we have volatile specified
> for the access? It shouldn't optimize the access at all i believe.

Yes. The "volatile" keyword implies that the compiler has to do the access
exactly once and that it cannot reorder the access with others on the same
address, or read more data than is specified, but it can (and does) split
the access if there is a reason for that, e.g. when the address might
be misaligned and the architecture cannot do misaligned loads.
	
	Arnd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
       [not found]                                 ` <CAKohpokoeSLBtLdh8hr4GKv8VOxzK8Vq5SoqLE3yC-TDyjYA9w@mail.gmail.com>
@ 2012-08-21  9:05                                   ` Arnd Bergmann
  2012-08-21 14:24                                     ` Hein Tibosch
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2012-08-21  9:05 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Hein Tibosch, Hans-Christian Egtvedt, Nicolas Ferre,
	Havard Skinnemoen, ludovic.desroches, linux-kernel, spear-devel

On Tuesday 21 August 2012, Viresh Kumar wrote:
> On 21 August 2012 14:17, Arnd Bergmann <arnd.bergmann@linaro.org> wrote:
> 
> > On Tuesday 21 August 2012, Viresh Kumar wrote:
> > > On 21 August 2012 14:04, Arnd Bergmann <arnd.bergmann@linaro.org> wrote:
> > >
> > > > Yes, this is very strange. Maybe the compiler already splits the
> > > > access into two 16-byte loads and that confuses the device?
> > >
> > > @Arnd: Is compiler allowed to do that even when we have volatile
> > specified
> > > for the access? It shouldn't optimize the access at all i believe.
> >
> > Yes. The "volatile" keyword implies that the compiler has to do the access
> > exactly once and that it cannot reorder the access with others on the same
> > address, or read more data than is specified, but it can (and does) split
> > the access if there is a reason for that, e.g. when the address might
> > be misaligned and the architecture cannot do misaligned loads.
> 
> 
> But that can't be the case here. Isn't it? So we shouldn't have got such
> results.

It should be easy to tell from the object code whether this happened
or not. If it did, then we can investigate why gcc did that, otherwise
something else caused the strange byte swap.

The safe way to define the readl() function in asm/io.h is to
use an inline assembly that prevents the access from getting split,
but avr32 just uses a pointer dereference here.

I think I just found the answer elsewhere in
arch/avr32/mach-at32ap/include/mach/io.h, which defines

# define __mem_ioswabl(a, x)    swahb32(x)

and that apparently does the halfword swap when CONFIG_AP700X_16_BIT_SMC
is set. This explains why Havard said it's wrong to use readl on
internal deviceson avr32, but unfortunately that rule conflicts with how
we define the accessors on ARM.

	Arnd

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
  2012-08-21  8:31                           ` Arnd Bergmann
@ 2012-08-21 14:15                             ` Havard Skinnemoen
  2012-08-23  3:47                               ` Hein Tibosch
  0 siblings, 1 reply; 11+ messages in thread
From: Havard Skinnemoen @ 2012-08-21 14:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Viresh Kumar, Hein Tibosch, Hans-Christian Egtvedt,
	Nicolas Ferre, ludovic.desroches, linux-kernel, spear-devel

On Tue, Aug 21, 2012 at 1:31 AM, Arnd Bergmann <arnd.bergmann@linaro.org> wrote:
> On Tuesday 21 August 2012, Viresh Kumar wrote:
>> > Is AVR32 a big-endian system? Probably big-endian, that's why values are
>> > > getting
>> > > swapped. And dw_dmac is the standard one, can call it little endian for
>> > the
>> > > time being.
>> > >
>> > > @Arnd: What should we do here?
>> >
>> > Yes, AVR32 is big-endian. I assume that dw_dmac can be either configured
>> > as little-endian or big-endian and that it is configured as big-endian
>> > on AVR32.
>> >
>>
>> Just to understand a bit more on this always confusing endianess concept:
>> - For AVR32, readl is calling swab everytime. So whatever we write will get
>> swapped.
>> - What are the implications of dw_dmac configured in little/big endian?
>>
>> When we write something to register of a peripheral, whose endianess
>> property decides how it will get written. Processor or Peripheral?
>
> The device decides which accessor we need to use (readl, ioread, ioread_be,
> in_be, in_le, ...). The architecture code must ensure that this is
> implemented properly based on the CPU endianess. We don't have a proper
> accessor function that implements "device has same endianess as CPU".
> Using __raw_* is not a replacement for that.
>
> I don't mind adding such an accessor at all, and a number people have
> complained about the lack of this for some time, but you should be
> aware that a lot of peripherals that are intended to be used in
> "CPU-endian" mode eventually end up getting used in "wrong-endian"
> mode, e.g. when someone decides to put that peripheral on a PCI
> card and someone else sticks it into a machine that has a CPU
> with the opposed (or configurable) endianess. It would be nice if
> the likes of designware could at least pick one endianess per
> device they do, but the reality is that we have to deal with both
> variants and only the device driver can find out what it is.

A native-endian accessor would really help in this case. Back in 2006
when I did the AVR32 port, that's what I thought __raw_ was, but I
suspect I was both wrong, and the semantics of the I/O accessors have
changed slightly over time. But since __raw_ did exactly what was
needed on both AVR32 (big endian) and AT91 ARM (little endian)
devices, I ended up using it for all on-chip devices.

read[bwl] on the other hand was implemented for external device
access. Since the SMC controller on AP7000 is really bizarre
(basically big endian data with little endian addressing), these
accessors became pretty weird too.

Anyway, if we want to purge all those inappropriate __raw accesses
from the Atmel drivers, we're going to need a replacement for internal
native-endian access. Right now, we don't have any.

Havard

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
  2012-08-21  9:05                                   ` Arnd Bergmann
@ 2012-08-21 14:24                                     ` Hein Tibosch
  0 siblings, 0 replies; 11+ messages in thread
From: Hein Tibosch @ 2012-08-21 14:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Viresh Kumar, Hans-Christian Egtvedt, Nicolas Ferre,
	Havard Skinnemoen, ludovic.desroches, linux-kernel, spear-devel

On 8/21/2012 5:05 PM, Arnd Bergmann wrote:
> It should be easy to tell from the object code whether this happened
> or not. If it did, then we can investigate why gcc did that, otherwise
> something else caused the strange byte swap.
>
> The safe way to define the readl() function in asm/io.h is to
> use an inline assembly that prevents the access from getting split,
> but avr32 just uses a pointer dereference here.
>
> I think I just found the answer elsewhere in
> arch/avr32/mach-at32ap/include/mach/io.h, which defines
>
> # define __mem_ioswabl(a, x)    swahb32(x)
>
> and that apparently does the halfword swap when CONFIG_AP700X_16_BIT_SMC
> is set. This explains why Havard said it's wrong to use readl on
> internal deviceson avr32, but unfortunately that rule conflicts with how
> we define the accessors on ARM.

I already thought the 16-bit swap might be related to the 16-bit SMC
configuration. SMC will fetch each u32 word in 2 different 16-bit banks
of SDRAM.

In a meanwhile I tried dw_dmac using iowrite32be/ioread32be, and it worked
equally well! Which isn't surprising because for AVR32 they were defined as:

#define iowrite32be(v,p)    __raw_writel(v, p)
#define ioread32be(p)        ((unsigned int)__raw_readl(p))

The 'relaxed' versions won't work because:

#define readl_relaxed            readl

The object code confirms what was expected:

    u32 val = ioread32be (((unsigned*)0x100)); /* Same as __raw_readl */
    ld.w  r8,pc[256]

    iowrite32be (val, ((unsigned*)0x104)); /* same as __raw_writel */
    st.w  pc[260],r8

    val = readl (((unsigned*)0x108)); /* __raw_readl + swahb32 */
    ld.w  r8,pc[264]
    lsl   r9,r8,0x10  /* 16-bit swap */
    or    r9,r9,r8>>0x10

    writel (val, ((unsigned*)0x10C)); /* swahb32 + __raw_writel */
    lsl   r8,r9,0x10  /* 16-bit swap */
    or    r8,r8,r9>>0x10
    st.w  pc[268],r8


Hein

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixes for dw_dmac and atmel-mci for AP700x
  2012-08-21 14:15                             ` Havard Skinnemoen
@ 2012-08-23  3:47                               ` Hein Tibosch
  0 siblings, 0 replies; 11+ messages in thread
From: Hein Tibosch @ 2012-08-23  3:47 UTC (permalink / raw)
  To: Havard Skinnemoen
  Cc: Arnd Bergmann, Viresh Kumar, egtvedt, Nicolas Ferre,
	ludovic.desroches, linux-kernel, spear-devel

On 8/21/2012 10:15 PM, Havard Skinnemoen wrote:
> On Tue, Aug 21, 2012 at 1:31 AM, Arnd Bergmann <arnd.bergmann@linaro.org> wrote:
>> On Tuesday 21 August 2012, Viresh Kumar wrote:
>>>> Is AVR32 a big-endian system? Probably big-endian, that's why values are
>>>>> getting
>>>>> swapped. And dw_dmac is the standard one, can call it little endian for
>>>> the
>>>>> time being.
>>>>>
>>>>> @Arnd: What should we do here?
>>>> Yes, AVR32 is big-endian. I assume that dw_dmac can be either configured
>>>> as little-endian or big-endian and that it is configured as big-endian
>>>> on AVR32.
>>>>
>>> Just to understand a bit more on this always confusing endianess concept:
>>> - For AVR32, readl is calling swab everytime. So whatever we write will get
>>> swapped.
>>> - What are the implications of dw_dmac configured in little/big endian?
>>>
>>> When we write something to register of a peripheral, whose endianess
>>> property decides how it will get written. Processor or Peripheral?
>> The device decides which accessor we need to use (readl, ioread, ioread_be,
>> in_be, in_le, ...). The architecture code must ensure that this is
>> implemented properly based on the CPU endianess. We don't have a proper
>> accessor function that implements "device has same endianess as CPU".
>> Using __raw_* is not a replacement for that.
>>
>> I don't mind adding such an accessor at all, and a number people have
>> complained about the lack of this for some time, but you should be
>> aware that a lot of peripherals that are intended to be used in
>> "CPU-endian" mode eventually end up getting used in "wrong-endian"
>> mode, e.g. when someone decides to put that peripheral on a PCI
>> card and someone else sticks it into a machine that has a CPU
>> with the opposed (or configurable) endianess. It would be nice if
>> the likes of designware could at least pick one endianess per
>> device they do, but the reality is that we have to deal with both
>> variants and only the device driver can find out what it is.
> A native-endian accessor would really help in this case. Back in 2006
> when I did the AVR32 port, that's what I thought __raw_ was, but I
> suspect I was both wrong, and the semantics of the I/O accessors have
> changed slightly over time. But since __raw_ did exactly what was
> needed on both AVR32 (big endian) and AT91 ARM (little endian)
> devices, I ended up using it for all on-chip devices.
>
> read[bwl] on the other hand was implemented for external device
> access. Since the SMC controller on AP7000 is really bizarre
> (basically big endian data with little endian addressing), these
> accessors became pretty weird too.
>
> Anyway, if we want to purge all those inappropriate __raw accesses
> from the Atmel drivers, we're going to need a replacement for internal
> native-endian access. Right now, we don't have any.

If I may summarize: Viresh replaced the __raw* accessor functions with
readl/writel, because they will enforce a correct ordering of instructions
exchanged with the peripheral.

On the ARM configurations this worked ok, because both the device and the
CPU are little endian.

However, for AVR32 the driver got broken: readl/writel are little endian
accessors.

I looked up an earlier thread "MMIO and gcc re-ordering issue"*, in which
Havard wrote:

> <cut> I don't think adding barriers is the
> right thing to do because they won't do anything useful in practice, so
> it's hard to tell whether they are used "correctly". And it will hurt
> performance at least on AVR32 since wmb() evaluates to a "sync"
> instruction, which flushes the write buffer to RAM. Since MMIO writes
> are unbuffered, that's pure overhead.

Four years later, we still don't have a generic native-endian MMIO-access
API which also introduce proper memory barriers.

So for the moment, do we have a better solution than this:

#ifdef CONFIG_AVR32
+#define dma_readl(dw, name) \
+	__raw_readl(&(__dw_regs(dw)->name))
+#define dma_writel(dw, name, val) \
+	__raw_writel((val), &(__dw_regs(dw)->name))
+#else
 #define dma_readl(dw, name) \
 	readl(&(__dw_regs(dw)->name))
 #define dma_writel(dw, name, val) \
 	writel((val), &(__dw_regs(dw)->name))
+#endif

and do the same for channel_readl/writel ?

Hein

*http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-05/msg13776.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-08-23  4:29 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <502BC31E.4070200@yahoo.es>
     [not found] ` <502BDD0E.4030106@yahoo.es>
     [not found]   ` <502CA8FC.7090705@atmel.com>
     [not found]     ` <502CB89B.4070302@yahoo.es>
     [not found]       ` <502CC326.8020605@atmel.com>
     [not found]         ` <502CC6A0.3050603@atmel.com>
     [not found]           ` <502F52E7.9050804@yahoo.es>
     [not found]             ` <CACiLriS-GZg24TJqJGQ=P04n-Zk3FdHtGgh5VeqGcCMHG6TnMw@mail.gmail.com>
     [not found]               ` <50310F10.2080701@yahoo.es>
2012-08-21  4:42                 ` [PATCH] Fixes for dw_dmac and atmel-mci for AP700x viresh kumar
2012-08-21  6:12                   ` Hein Tibosch
     [not found]                     ` <CAKohponN16krs-WWw6Abh1fLPO3+iYndTaxsPDfeXCoS9OHufQ@mail.gmail.com>
2012-08-21  7:32                       ` Hein Tibosch
     [not found]                         ` <CAKohpomMPeewDBVxVR=g-op7E53rqt-AZgOdyoib6W+5QsLOOA@mail.gmail.com>
2012-08-21  8:34                           ` Arnd Bergmann
     [not found]                             ` <CAKohpokhM5WkUK6ww8Neu+fx554+-4uBCsNzBxB9q5rcqSf6cw@mail.gmail.com>
2012-08-21  8:47                               ` Arnd Bergmann
     [not found]                                 ` <CAKohpokoeSLBtLdh8hr4GKv8VOxzK8Vq5SoqLE3yC-TDyjYA9w@mail.gmail.com>
2012-08-21  9:05                                   ` Arnd Bergmann
2012-08-21 14:24                                     ` Hein Tibosch
2012-08-21  7:44                       ` Arnd Bergmann
     [not found]                         ` <CAKohpo=DyQajiE-DmpMiv=gVOVOep1OEECVuw83D2u4VJisEsw@mail.gmail.com>
2012-08-21  8:31                           ` Arnd Bergmann
2012-08-21 14:15                             ` Havard Skinnemoen
2012-08-23  3:47                               ` Hein Tibosch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).