From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1165716AbeE1O1i (ORCPT ); Mon, 28 May 2018 10:27:38 -0400 Received: from mail.bootlin.com ([62.4.15.54]:54043 "EHLO mail.bootlin.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161626AbeE1O1b (ORCPT ); Mon, 28 May 2018 10:27:31 -0400 Date: Mon, 28 May 2018 16:27:18 +0200 From: Boris Brezillon To: Peter Rosin Cc: Tudor Ambarus , Nicolas Ferre , Ludovic Desroches , Alexandre Belloni , Marek Vasut , Josh Wu , Cyrille Pitchen , linux-kernel@vger.kernel.org, linux-mtd@lists.infradead.org, Richard Weinberger , Brian Norris , David Woodhouse , linux-arm-kernel@lists.infradead.org, Eugen Hristev Subject: Re: [PATCH] mtd: nand: raw: atmel: add module param to avoid using dma Message-ID: <20180528162718.6be7191f@bbrezillon> In-Reply-To: References: <20180329131054.22506-1-peda@axentia.se> <20180329153322.5e2fc1e7@bbrezillon> <20180329154416.5c1a0013@bbrezillon> <20180402142249.7e076a64@bbrezillon> <20180402212843.164d5d21@bbrezillon> <20180402222020.1d344c14@bbrezillon> <20180403091813.5fb5c18c@bbrezillon> <959d826d-1a98-ca22-acee-a4548427fcd3@microchip.com> <024079cb-77ad-9c48-e370-e6e8f2de171b@axentia.se> <9c496531-f7b6-4b9d-dd51-0bfb68ead303@axentia.se> X-Mailer: Claws Mail 3.15.0-dirty (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Peter, On Mon, 28 May 2018 12:10:02 +0200 Peter Rosin wrote: > On 2018-05-28 00:11, Peter Rosin wrote: > > On 2018-05-27 11:18, Peter Rosin wrote: > >> On 2018-05-25 16:51, Tudor Ambarus wrote: > >>> We think the best way is to keep LCD on DDR Ports 2 and 3 (8th and 9th > >>> slaves), to have maximum bandwidth and to use DMA on DDR port 1 for NAND > >>> (7th slave). > >> > >> Exactly how do I accomplish that? > >> > >> I can see how I can move the LCD between slave DDR port 2 and 3 by > >> selecting LCDC DMA master 8 or 9 (but according to the above it should > >> not matter). The big question is how I control what slave the NAND flash > >> is going to use? I find nothing in the datasheet, and the code is also > >> non-transparent enough for me to figure it out by myself without > >> throwing out this question first... > > > > I added this: > > > > diff --git a/drivers/mtd/nand/raw/atmel/nand-controller.c b/drivers/mtd/nand/raw/atmel/nand-controller.c > > index e686fe73159e..3b33c63d2ed4 100644 > > --- a/drivers/mtd/nand/raw/atmel/nand-controller.c > > +++ b/drivers/mtd/nand/raw/atmel/nand-controller.c > > @@ -1991,6 +1991,9 @@ static int atmel_nand_controller_init(struct atmel_nand_controller *nc, > > nc->dmac = dma_request_channel(mask, NULL, NULL); > > if (!nc->dmac) > > dev_err(nc->dev, "Failed to request DMA channel\n"); > > + > > + dev_info(nc->dev, "using %s for DMA transfers\n", > > + dma_chan_name(nc->dmac)); > > } > > > > /* We do not retrieve the SMC syscon when parsing old DTs. */ > > > > > > > > and the output is > > > > atmel-nand-controller 10000000.ebi:nand-controller: using dma0chan5 for DMA transfers > > > > So, DMA controller 0 is in use. I still don't know if IF0, IF1 or IF2 is used > > or how to find out. I guess IF2 is not in use since that does not allow any > > DDR2 port as slave... > > > > From the datasheet, DMAC0/IF0 uses DDR2 Port 2, and DMAC0/IF1 uses DDR2 Port 1. > > But, by the looks of the register content in my other mail, it seems as if > > DMA0/IF1 can also use DDR2 Port 3. > > > > So, I think I want either > > > > A) the NAND controller to use DMAC0/IF0 (i.e. DDR2 port 1, and possibly 3) and > > the LCDC to use master 9 (i.e. DDR2 Port 2) > > > > or > > > > B) the NAND controller to use DMAC1/IF1 (i.e. DDR2 port 2) and the LCDC to use > > master 8 (i.e. DDR2 Port 3) > > Crap, that was not what I meant to express. Sorry for the confusion. This is > better. > > So, I think I want either > > A) the NAND controller to use master 1 DMAC0/IF0 (i.e. slave 8 DDR2 port 2) and > the LCDC to use master 9 (i.e. slave 9 DDR2 Port 3) > > or > > B) the NAND controller to use master 2 DMAC0/IF1 (i.e. slave 7 DDR2 port 1, and > possibly slave 9 DDR2 port 3 (if my previous findings are relevant) and the > LCDC to use master 8 (i.e. slave 8 DDR2 Port 2) > > > But, again, how do I limit DMAC0 to either of IF0 or IF1 for NAND accesses? > > So, I added a horrid patch (attached), which mainly adds printk lines, but > additionally does one more thing in atc_prep_dma_memcpy. It changes the DSCR_IF > field (from 0) to 1 for DMA-memcpy for dma0chan5 (i.e. the NAND). At least I > think it does that? > > Running with that patch gets me this: > > # dmesg | grep -i dma > at_hdmac ffffe600.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels > at_hdmac ffffe800.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels > dma dma0chan0: xlate 0 2 > dma dma0chan1: xlate 0 2 > at91_i2c f0014000.i2c: using dma0chan0 (tx) and dma0chan1 (rx) for DMA transfers > dma dma1chan0: xlate 0 2 > dma dma1chan1: xlate 0 2 > at91_i2c f801c000.i2c: using dma1chan0 (tx) and dma1chan1 (rx) for DMA transfers > dma dma0chan2: xlate 0 2 > dma dma0chan3: xlate 0 2 > dma dma0chan4: xlate 0 2 > atmel_mci f0000000.mmc: using dma0chan4 for DMA transfers > dma dma1chan2: xlate 0 2 > dma dma1chan3: xlate 0 2 > atmel_aes f8038000.aes: Atmel AES - Using dma1chan2, dma1chan3 for DMA transfers > dma dma1chan4: xlate 0 2 > atmel_sha f8034000.sha: using dma1chan4 for DMA transfers > dma dma1chan5: xlate 0 2 > dma dma1chan6: xlate 0 2 > atmel_tdes f803c000.tdes: using dma1chan5, dma1chan6 for DMA transfers > atmel-nand-controller 10000000.ebi:nand-controller: using dma0chan5 for DMA transfers > dma dma0chan5: memcpy: 0 > dma dma0chan5: DSCR_IF: 1 > dma dma0chan5: memcpy: 1 > > So, output is as expected and I believe that the patch makes the NAND DMA > accesses use master 2 DMAC0/IF1 and are thus forced to use slave 7 DDR2 Port 1 > (and possibly 9). The LCDC is using slave 8 DDR2 Port 2. So there should be no > slave conflict? > > But the on-screen crap remains during NAND accesses. > > But pressing on. > > I then changed the priorities for all accesses to 0 in the PRxSy registers, except > the ones for masters 8/9 LCDC (slaves 8/9) which I left at priority 3. > > But the on-screen crap remains during NAND accesses. > > My guess is that the NAND DMA is doing too long bursts and that the LCDC therefore > has to wait too long and simply fails to keep the pipeline from running short? > > So I tried to reduce the maximum SLOT_CYCLE for slaves 7 and 9 in the SCFGx > registers. No noticeable effect either. > > I then tried to split bursts from master 2 (DMAC0/IF1) with small values in the > MCFG2 register. No effect. > > I'm getting nowhere. Could it just be that you're reaching the DDR bus limit. As I said previously, when you go through the CPU, and assuming you're consuming the data directly, you have: 1/ NFC SRAM -> CPU 2/ CPU -> L1 data cache --write-back--> DRAM 3/ L1-cache -> CPU While, if you use DMA you get: 1/ NFC SRAM -> DRAM 2/ SDRAM -> L1 data cache -> CPU So, if you're approaching the limit of (LP)DDR bandwidth, using the CPU might make things a bit better. Still, if the limitation really comes from the DDR bus, my opinion is that you should maybe use a smaller resolution or use a more compact pixel format (RGB565?). Did you calculate how much of the bandwidth is taken by the HLCDC block and compared it to the max (LP)DDR bandwidth? Regards, Boris From mboxrd@z Thu Jan 1 00:00:00 1970 From: boris.brezillon@bootlin.com (Boris Brezillon) Date: Mon, 28 May 2018 16:27:18 +0200 Subject: [PATCH] mtd: nand: raw: atmel: add module param to avoid using dma In-Reply-To: References: <20180329131054.22506-1-peda@axentia.se> <20180329153322.5e2fc1e7@bbrezillon> <20180329154416.5c1a0013@bbrezillon> <20180402142249.7e076a64@bbrezillon> <20180402212843.164d5d21@bbrezillon> <20180402222020.1d344c14@bbrezillon> <20180403091813.5fb5c18c@bbrezillon> <959d826d-1a98-ca22-acee-a4548427fcd3@microchip.com> <024079cb-77ad-9c48-e370-e6e8f2de171b@axentia.se> <9c496531-f7b6-4b9d-dd51-0bfb68ead303@axentia.se> Message-ID: <20180528162718.6be7191f@bbrezillon> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org Hi Peter, On Mon, 28 May 2018 12:10:02 +0200 Peter Rosin wrote: > On 2018-05-28 00:11, Peter Rosin wrote: > > On 2018-05-27 11:18, Peter Rosin wrote: > >> On 2018-05-25 16:51, Tudor Ambarus wrote: > >>> We think the best way is to keep LCD on DDR Ports 2 and 3 (8th and 9th > >>> slaves), to have maximum bandwidth and to use DMA on DDR port 1 for NAND > >>> (7th slave). > >> > >> Exactly how do I accomplish that? > >> > >> I can see how I can move the LCD between slave DDR port 2 and 3 by > >> selecting LCDC DMA master 8 or 9 (but according to the above it should > >> not matter). The big question is how I control what slave the NAND flash > >> is going to use? I find nothing in the datasheet, and the code is also > >> non-transparent enough for me to figure it out by myself without > >> throwing out this question first... > > > > I added this: > > > > diff --git a/drivers/mtd/nand/raw/atmel/nand-controller.c b/drivers/mtd/nand/raw/atmel/nand-controller.c > > index e686fe73159e..3b33c63d2ed4 100644 > > --- a/drivers/mtd/nand/raw/atmel/nand-controller.c > > +++ b/drivers/mtd/nand/raw/atmel/nand-controller.c > > @@ -1991,6 +1991,9 @@ static int atmel_nand_controller_init(struct atmel_nand_controller *nc, > > nc->dmac = dma_request_channel(mask, NULL, NULL); > > if (!nc->dmac) > > dev_err(nc->dev, "Failed to request DMA channel\n"); > > + > > + dev_info(nc->dev, "using %s for DMA transfers\n", > > + dma_chan_name(nc->dmac)); > > } > > > > /* We do not retrieve the SMC syscon when parsing old DTs. */ > > > > > > > > and the output is > > > > atmel-nand-controller 10000000.ebi:nand-controller: using dma0chan5 for DMA transfers > > > > So, DMA controller 0 is in use. I still don't know if IF0, IF1 or IF2 is used > > or how to find out. I guess IF2 is not in use since that does not allow any > > DDR2 port as slave... > > > > From the datasheet, DMAC0/IF0 uses DDR2 Port 2, and DMAC0/IF1 uses DDR2 Port 1. > > But, by the looks of the register content in my other mail, it seems as if > > DMA0/IF1 can also use DDR2 Port 3. > > > > So, I think I want either > > > > A) the NAND controller to use DMAC0/IF0 (i.e. DDR2 port 1, and possibly 3) and > > the LCDC to use master 9 (i.e. DDR2 Port 2) > > > > or > > > > B) the NAND controller to use DMAC1/IF1 (i.e. DDR2 port 2) and the LCDC to use > > master 8 (i.e. DDR2 Port 3) > > Crap, that was not what I meant to express. Sorry for the confusion. This is > better. > > So, I think I want either > > A) the NAND controller to use master 1 DMAC0/IF0 (i.e. slave 8 DDR2 port 2) and > the LCDC to use master 9 (i.e. slave 9 DDR2 Port 3) > > or > > B) the NAND controller to use master 2 DMAC0/IF1 (i.e. slave 7 DDR2 port 1, and > possibly slave 9 DDR2 port 3 (if my previous findings are relevant) and the > LCDC to use master 8 (i.e. slave 8 DDR2 Port 2) > > > But, again, how do I limit DMAC0 to either of IF0 or IF1 for NAND accesses? > > So, I added a horrid patch (attached), which mainly adds printk lines, but > additionally does one more thing in atc_prep_dma_memcpy. It changes the DSCR_IF > field (from 0) to 1 for DMA-memcpy for dma0chan5 (i.e. the NAND). At least I > think it does that? > > Running with that patch gets me this: > > # dmesg | grep -i dma > at_hdmac ffffe600.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels > at_hdmac ffffe800.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels > dma dma0chan0: xlate 0 2 > dma dma0chan1: xlate 0 2 > at91_i2c f0014000.i2c: using dma0chan0 (tx) and dma0chan1 (rx) for DMA transfers > dma dma1chan0: xlate 0 2 > dma dma1chan1: xlate 0 2 > at91_i2c f801c000.i2c: using dma1chan0 (tx) and dma1chan1 (rx) for DMA transfers > dma dma0chan2: xlate 0 2 > dma dma0chan3: xlate 0 2 > dma dma0chan4: xlate 0 2 > atmel_mci f0000000.mmc: using dma0chan4 for DMA transfers > dma dma1chan2: xlate 0 2 > dma dma1chan3: xlate 0 2 > atmel_aes f8038000.aes: Atmel AES - Using dma1chan2, dma1chan3 for DMA transfers > dma dma1chan4: xlate 0 2 > atmel_sha f8034000.sha: using dma1chan4 for DMA transfers > dma dma1chan5: xlate 0 2 > dma dma1chan6: xlate 0 2 > atmel_tdes f803c000.tdes: using dma1chan5, dma1chan6 for DMA transfers > atmel-nand-controller 10000000.ebi:nand-controller: using dma0chan5 for DMA transfers > dma dma0chan5: memcpy: 0 > dma dma0chan5: DSCR_IF: 1 > dma dma0chan5: memcpy: 1 > > So, output is as expected and I believe that the patch makes the NAND DMA > accesses use master 2 DMAC0/IF1 and are thus forced to use slave 7 DDR2 Port 1 > (and possibly 9). The LCDC is using slave 8 DDR2 Port 2. So there should be no > slave conflict? > > But the on-screen crap remains during NAND accesses. > > But pressing on. > > I then changed the priorities for all accesses to 0 in the PRxSy registers, except > the ones for masters 8/9 LCDC (slaves 8/9) which I left at priority 3. > > But the on-screen crap remains during NAND accesses. > > My guess is that the NAND DMA is doing too long bursts and that the LCDC therefore > has to wait too long and simply fails to keep the pipeline from running short? > > So I tried to reduce the maximum SLOT_CYCLE for slaves 7 and 9 in the SCFGx > registers. No noticeable effect either. > > I then tried to split bursts from master 2 (DMAC0/IF1) with small values in the > MCFG2 register. No effect. > > I'm getting nowhere. Could it just be that you're reaching the DDR bus limit. As I said previously, when you go through the CPU, and assuming you're consuming the data directly, you have: 1/ NFC SRAM -> CPU 2/ CPU -> L1 data cache --write-back--> DRAM 3/ L1-cache -> CPU While, if you use DMA you get: 1/ NFC SRAM -> DRAM 2/ SDRAM -> L1 data cache -> CPU So, if you're approaching the limit of (LP)DDR bandwidth, using the CPU might make things a bit better. Still, if the limitation really comes from the DDR bus, my opinion is that you should maybe use a smaller resolution or use a more compact pixel format (RGB565?). Did you calculate how much of the bandwidth is taken by the HLCDC block and compared it to the max (LP)DDR bandwidth? Regards, Boris