Re: [PATCH 2/5] dmaengine: Add support for custom data mapping

From: Andy Gross <andy.gross@linaro.org>
To: Abhishek Sahu <absahu@codeaurora.org>
Cc: Vinod Koul <vinod.koul@intel.com>,
	dan.j.williams@intel.com, stanimir.varbanov@linaro.org,
	mcgrof@suse.com, okaya@codeaurora.org, pramod.gurav@linaro.org,
	arnd@arndb.de, linux-kernel@vger.kernel.org,
	dmaengine@vger.kernel.org, linux-arm-msm@vger.kernel.org
Subject: Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
Date: Mon, 2 Jan 2017 10:12:33 -0600	[thread overview]
Message-ID: <20170102161233.GC17770@hector.attlocal.net> (raw)
In-Reply-To: <8055fa20b35139e7d13831583ebf4f4f@codeaurora.org>

On Mon, Jan 02, 2017 at 07:55:37PM +0530, Abhishek Sahu wrote:
> On 2016-12-29 23:24, Andy Gross wrote:
> >On Thu, Dec 22, 2016 at 01:04:37AM +0530, Abhishek Sahu wrote:
> >>On 2016-12-21 01:55, Andy Gross wrote:
> >>>On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote:
> >>>
> >>><snip>
> >>>
> >>>>>>Okay, do you have pointer on that one, will avoid asking the same
> >>>>>>questions
> >>>>>>again.
> >>>>>
> >>>>>I'll see if I can find the correspondance and send to you directly.
> >>>>>
> >>>>>>
> >>>>>>> Ahbishek, correct me where i am wrong on the following:
> >>>>>>> So two main differences between a normal descriptor and a command descriptor:
> >>>>>>> 1) size of the descriptor
> >>>>>>> 2) the flag setting
> >>>>>>> 3) data sent in is a modified scatter gather that includes flags , vs a normal
> >>>>>>> scatter gather
> >>>>
> >>>>Top level descriptor is same for both. Only difference is Command flag.
> >>>>The
> >>>>command descriptor will contain list of register read/write instead of
> >>>>data
> >>>>address
> >>>>The peripheral driver can form the list with helper function provided in
> >>>>patch 5
> >>>>and submit it to BAM. The main issue is for other flag like EOT/NWD.
> >>>>
> >>>>The top level descriptor is again in the form of list where BAM writes
> >>>>the
> >>>>address of the list in register before starting of transfer. In this
> >>>>list,
> >>>>each element will have different flags.
> >>>
> >>>Ah that's right.  The command descriptor information is the format of the
> >>>data
> >>>pointed to by the sgl.  So you'd have some set of register read/writes
> >>>described
> >>>in those entries.
> >>>
> >>>>
> >>>>>>>
> >>>>>>> So the CMD descriptors in a given sgl can all have varying flags set? I'd assume
> >>>>>>> they all have CMD flag set.  Do the current users of the command descriptors
> >>>>>>> coalesce all of their requests into a big list?
> >>>>>>>
> >>>>
> >>>>The main user for command descriptor is currently QPIC NAND/LCD. The
> >>>>NAND
> >>>>uses
> >>>>3 BAM channels- tx, rx and command. NAND controller do the data transfer
> >>>>in
> >>>>chunk of codeword (maximum 544 bytes). NAND chip does the data transfer
> >>>>on
> >>>>page basis so each page read/write can have multiple codewords. The NAND
> >>>>driver prepares command, tx(write) or rx(read) descriptor for complete
> >>>>page
> >>>>, submit it to BAM and wait for its completion. So NAND driver coalesces
> >>>>all of their requests into a big list. In this big list,
> >>>>
> >>>>1. Some of the request for command channel requires NWD flag to be set.
> >>>
> >>>I'd expect this to occur at the end of a chain.  So if you had 5 CMD
> >>>descriptors
> >>>described in the SGL, only the last descriptor would have the NWD set.
> >>>Correct?
> >>>
> >>>>2. TX request depends upon the setting of EOT flag so some of the TX
> >>>>request
> >>>>   in complete page write will contain EOT flag and others will not. So
> >>>>this
> >>>>   custom mapping will be used for data descriptor also in NAND driver.
> >>>
> >>>Can you give a sequence description of the descriptors and flags?  I
> >>>haven't
> >>>seen the NAND documentation that describes the sequence/flow.
> >>
> >>Following is the sample list of command descriptor for page write(2K
> >>page).
> >>The actual list will contain more no of descriptor which involves
> >>spare area transfer also.
> >>
> >>Index	INT	NWD	CMD	24bit Register Address
> >>0	-	-	1	0x0000F0 (EBI2_ECC_BUF_CFG)
> >>
> >>1	-	-	1	0x000020 (NAND_DEVn_CFG0)
> >>				0x000024 (NAND_DEVn_CFG1)
> >>				0x0000F0 (EBI2_ECC_BUF_CFG)
> >>				0x00000C (NAND_FLASH_CHIP_SELECT)
> >>
> >>2	-	-	1	0x000004 (NAND_ADDR0)
> >>				0x000008 (NAND_ADDR1)
> >>
> >>3	-	1	1	0x000000 (NAND_FLASH_CMD)
> >>
> >>4	-	1	1	0x000010 (NANDC_EXEC_CMD)
> >>
> >>5	-	-	1	0x000014 (NAND_FLASH_STATUS)
> >>
> >>6	-	1	1	0x000000 (NAND_FLASH_CMD)
> >>
> >>7	-	1	1	0x000010 (NANDC_EXEC_CMD)
> >>
> >>8	-	-	1	0x000014 (NAND_FLASH_STATUS)
> >>
> >>9	-	1	1	0x000000 (NAND_FLASH_CMD)
> >>
> >>10	-	1	1	0x000010 (NANDC_EXEC_CMD)
> >>
> >>11	-	-	1	0x000014 (NAND_FLASH_STATUS)
> >>
> >>12	-	1	1	0x000000 (NAND_FLASH_CMD)
> >>
> >>13	-	1	1	0x000010 (NANDC_EXEC_CMD)
> >>
> >>14	1	-	1	0x000014 (NAND_FLASH_STATUS)
> >>
> >>15	-	-	1	0x000044 (NAND_FLASH_READ_STATUS)
> >>				0x000014 (NAND_FLASH_STATUS)
> >
> >Yeah I was expecting something like:
> >- Setup NAND controller using some command writes (indices 0-4)
> >  Loop doing the following until all the data is done:
> >  - Send/Receive the Data
> >  - Check status.
> >
> >The only one that sticks out to me is index 14.  Is the INT flag there to
> >mark
> >the actual end of the data transfer from the device?  Then you do one more
> >Status read.
> >
> This is sample list given in NAND document. INT will be set only for the
> last
> command. I checked the NAND driver in which status will be read only once
> for
> each codeword.
> >>>
> >>>>>>> So a couple of thoughts on how to deal with this:
> >>>>>>>
> >>>>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA
> >>>>>>> transaction.  This lets you use the same hardware channel, but lets you discern
> >>>>>>> which descriptor format you need to use.  The only issue I see with this is the
> >>>>>>> required change in device tree binding to target the right type of channel (cmd
> >>>>>>> vs normal).
> >>>>>>
> >>>>>>Or mark the descriptor is cmd and write accordingly...
> >>>>>
> >>>>>The only issue i see with that is knowing how much to pre-allocate during
> >>>>>the
> >>>>>prep call.  The flag set API would be called on the allocated tx
> >>>>>descriptor.  So
> >>>>>you'd have to know up front and be able to specify it.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> 2) Provide an API to set flags for the descriptor on a whole descriptor basis.
> >>>>>>>
> >>>>>>> 3) If you have a set of transactions described by an sgl that has disparate use
> >>>>>>> of flags, you split the list and use a separate transaction.  In other words, we
> >>>>>>> need to enforce that the flag set API will be applied to all descriptors
> >>>>>>> described by an sgl.  This means that the whole transaction may be comprised of
> >>>>>>> multiple async TX descriptors.
> >>>>
> >>>>Each async TX descriptor will generate the BAM interrupt so we are
> >>>>deviating
> >>>>from main purpose of DMA where ideally we should get the interrupt at
> >>>>the
> >>>>end
> >>>>of transfer. This is the main reason for going for this patch.
> >>>
> >>>If the client queues 1 descriptor or 5 descriptors, it doesn't matter that
> >>>much.
> >>>The client knows when it is done by waiting for the descriptors to
> >>>complete.  It
> >>>is less efficient than grouping them all, but it should still work.
> >>>
> >>Yes. client will wait for final descriptor completion. But these
> >>interrupts
> >>will be overhead for CPU. For 1-2 page it won't matter much I guess it
> >>will
> >>be
> >>significant for complete chip read/write(during boot and FS i.e JFFS
> >>operations).
> >>>>
> >>>>With the submitted patch, only 1 interrupt per channel is required for
> >>>>complete NAND page and it solves the setting of BAM specific flags also.
> >>>>Only issue with this patch is adding new API in DMA framework itself.
> >>>>But
> >>>>this API can be used by other DMA engines in future for which mapping
> >>>>cannot
> >>>>be done with available APIs and if this mapping is vendor specific.
> >>>
> >>>I guess the key point in all of this is that the DMA operation being done
> >>>is not
> >>>a normal data flow to/from the device.  It's direct remote register access
> >>>to
> >>>the device using address information contained in the sgl.  And you are
> >>>collating the standard data access along with the special command access
> >>>in the
> >>>same API call.
> >>Yes. Normally DMA engine (QCA ADM DMA engine also) allows to read/write
> >>memory mapped
> >>registers just like data. But BAM is different (Since it is not a global
> >>DMA
> >>Engine
> >>and coupled with peripheral). Also, this different flag requirement is
> >>not
> >>just
> >>for command descriptors but for data descriptors also.
> >>
> >>BAM data access and command access differs only with flag and register
> >>read/write
> >>list. The register read and write list will be simply array of
> >>struct bam_cmd_element added in patch
> >>struct bam_cmd_element {
> >>        __le32 addr:24;
> >>        __le32 command:8;
> >>        __le32 data;
> >>        __le32 mask;
> >>        __le32 reserved;
> >>};
> >>
> >>The address and size of the array will be passed in data and size field
> >>of
> >>SGL.
> >>If we want to form the SGL for mentioned list then we will have SGL of
> >>size
> >>15
> >>with just one descriptor.
> >>
> >>Now we require different flag for each SG entry. currently SG does not
> >>have
> >>flag parameter and we can't add flag parameter just for our requirement
> >>in
> >>generic SG. So we have added the custom mapping function and passed
> >>modified
> >>SG
> >>as parameter which is generic SG and flag.
> >
> >I really think that we need some additional API that allows for the flag
> >munging
> >for the descriptors instead of overriding the prep_slave_sg.  We already
> >needed
> >to change the way the flags are passed anyway.  And instead of building up
> >a
> >special sg list, the API should take a structure that has a 1:1 mapping of
> >the
> >flags to the descriptors.  And you would call this API on your descriptor
> >before
> >issuing it.
> >
> >So build up the sglist.  Call the prep_slave_sg.  You get back a tx
> >descriptor
> >that underneath is a bam descriptor.  Then call the API giving the
> >descriptor
> >and the structure that defines the flags for the descriptors.  Then submit
> >the
> >descriptor.
> >
> >Something like:
> >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> >				    u16 *flags)
> >{
> >	struct bam_async_desc async_desc = container_of(tx,
> >							struct bam_async_desc,
> >							vd.tx);
> >	int i;
> >
> >	for (i = 0; i < async_desc->num_desc; i++)
> >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> >}
> >
> >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags)
> >
> 
> We want to tightly couple the SG and its flag. But if this 1:1 mapping is
> acceptable
> in linux kernel then we can go ahead with this. It will solve our
> requirement and
> does not require any change in Linux DMA API. I will do the same and will
> submit the
> new patches.

Thanks.  Hopefully Vinod will be ok with the SoC specific EXPORT.

> 
> >This applies the flags directly to the underlying hardware descriptors.
> >The
> >prep_slave_sg call would need to remove all the flag munging.  The
> >bam_start_dma
> >would need to account for this as well by only setting the INT flag if the
> >transfer cannot get all of the descriptors in the FIFO.
> 
> It seems no major change is required in prep_slave_sg or bam_start_dma since
> it is just setting INT flag for last entry which is required for QPIC
> drivers
> also. We need to change the assignment of flag with bitwise OR assignment
> for
> last BAM desc in function bam_start_dma
> 
>         /* set any special flags on the last descriptor */
>         if (async_desc->num_desc == async_desc->xfer_len)
>                 desc[async_desc->xfer_len - 1].flags |=
>                                         cpu_to_le16(async_desc->flags);

Right.  That can be done in the start_dma directly.  That's the only function
that can really determine when the INT flag will be set (other than the last
descriptor).

Regards,

Andy