Date: Mon, 2 Jan 2017 10:12:33 -0600
From: Andy Gross
To: Abhishek Sahu
Cc: Vinod Koul, dan.j.williams@intel.com, stanimir.varbanov@linaro.org,
	mcgrof@suse.com, okaya@codeaurora.org, pramod.gurav@linaro.org,
	arnd@arndb.de, linux-kernel@vger.kernel.org, dmaengine@vger.kernel.org,
	linux-arm-msm@vger.kernel.org
Subject: Re: [PATCH 2/5] dmaengine: Add support for custom data mapping
Message-ID: <20170102161233.GC17770@hector.attlocal.net>
References: <1481795755-15302-3-git-send-email-absahu@codeaurora.org>
	<20161218162602.GR25795@localhost>
	<20161219050642.GA3047@hector.attlocal.net>
	<20161219154923.GT25795@localhost>
	<20161219175210.GB3047@hector.attlocal.net>
	<20161220202511.GD3047@hector.attlocal.net>
	<61cb961a6f448cfc48b983329e329b34@codeaurora.org>
	<20161229175450.GB17770@hector.attlocal.net>
	<8055fa20b35139e7d13831583ebf4f4f@codeaurora.org>
In-Reply-To: <8055fa20b35139e7d13831583ebf4f4f@codeaurora.org>

On Mon, Jan 02, 2017 at 07:55:37PM +0530, Abhishek Sahu wrote:
> On 2016-12-29 23:24, Andy Gross wrote:
> >On Thu, Dec 22, 2016 at 01:04:37AM +0530, Abhishek Sahu wrote:
> >>On 2016-12-21 01:55, Andy Gross wrote:
> >>>On Wed, Dec 21, 2016 at 12:58:50AM +0530, Abhishek Sahu wrote:
> >>>
> >>>>>>Okay, do you have a pointer to that one? It will avoid asking the
> >>>>>>same questions again.
> >>>>>
> >>>>>I'll see if I can find the correspondence and send it to you directly.
> >>>>>
> >>>>>>> Abhishek, correct me where I am wrong on the following:
> >>>>>>> So three main differences between a normal descriptor and a command
> >>>>>>> descriptor:
> >>>>>>> 1) size of the descriptor
> >>>>>>> 2) the flag setting
> >>>>>>> 3) data sent in is a modified scatter gather that includes flags,
> >>>>>>>    vs a normal scatter gather
> >>>>
> >>>>The top level descriptor is the same for both; the only difference is
> >>>>the Command flag. The command descriptor will contain a list of register
> >>>>reads/writes instead of a data address. The peripheral driver can form
> >>>>the list with the helper function provided in patch 5 and submit it to
> >>>>BAM. The main issue is the other flags like EOT/NWD.
> >>>>
> >>>>The top level descriptor is again in the form of a list, where BAM
> >>>>writes the address of the list into a register before starting the
> >>>>transfer. In this list, each element will have different flags.
> >>>
> >>>Ah that's right. The command descriptor information is the format of the
> >>>data pointed to by the sgl. So you'd have some set of register
> >>>read/writes described in those entries.
> >>>
> >>>>>>> So the CMD descriptors in a given sgl can all have varying flags
> >>>>>>> set? I'd assume they all have the CMD flag set. Do the current users
> >>>>>>> of the command descriptors coalesce all of their requests into a big
> >>>>>>> list?
> >>>>
> >>>>The main user of command descriptors is currently QPIC NAND/LCD.
The > >>>>NAND > >>>>uses > >>>>3 BAM channels- tx, rx and command. NAND controller do the data transfer > >>>>in > >>>>chunk of codeword (maximum 544 bytes). NAND chip does the data transfer > >>>>on > >>>>page basis so each page read/write can have multiple codewords. The NAND > >>>>driver prepares command, tx(write) or rx(read) descriptor for complete > >>>>page > >>>>, submit it to BAM and wait for its completion. So NAND driver coalesces > >>>>all of their requests into a big list. In this big list, > >>>> > >>>>1. Some of the request for command channel requires NWD flag to be set. > >>> > >>>I'd expect this to occur at the end of a chain. So if you had 5 CMD > >>>descriptors > >>>described in the SGL, only the last descriptor would have the NWD set. > >>>Correct? > >>> > >>>>2. TX request depends upon the setting of EOT flag so some of the TX > >>>>request > >>>> in complete page write will contain EOT flag and others will not. So > >>>>this > >>>> custom mapping will be used for data descriptor also in NAND driver. > >>> > >>>Can you give a sequence description of the descriptors and flags? I > >>>haven't > >>>seen the NAND documentation that describes the sequence/flow. > >> > >>Following is the sample list of command descriptor for page write(2K > >>page). > >>The actual list will contain more no of descriptor which involves > >>spare area transfer also. > >> > >>Index INT NWD CMD 24bit Register Address > >>0 - - 1 0x0000F0 (EBI2_ECC_BUF_CFG) > >> > >>1 - - 1 0x000020 (NAND_DEVn_CFG0) > >> 0x000024 (NAND_DEVn_CFG1) > >> 0x0000F0 (EBI2_ECC_BUF_CFG) > >> 0x00000C (NAND_FLASH_CHIP_SELECT) > >> > >>2 - - 1 0x000004 (NAND_ADDR0) > >> 0x000008 (NAND_ADDR1) > >> > >>3 - 1 1 0x000000 (NAND_FLASH_CMD) > >> > >>4 - 1 1 0x000010 (NANDC_EXEC_CMD) > >> > >>5 - - 1 0x000014 (NAND_FLASH_STATUS) > >> > >>6 - 1 1 0x000000 (NAND_FLASH_CMD) > >> > >>7 - 1 1 0x000010 (NANDC_EXEC_CMD) > >> > >>8 - - 1 0x000014 (NAND_FLASH_STATUS) > >> > >>9 - 1 1 0x000000 (NAND_FLASH_CMD) > >> > >>10 - 1 1 0x000010 (NANDC_EXEC_CMD) > >> > >>11 - - 1 0x000014 (NAND_FLASH_STATUS) > >> > >>12 - 1 1 0x000000 (NAND_FLASH_CMD) > >> > >>13 - 1 1 0x000010 (NANDC_EXEC_CMD) > >> > >>14 1 - 1 0x000014 (NAND_FLASH_STATUS) > >> > >>15 - - 1 0x000044 (NAND_FLASH_READ_STATUS) > >> 0x000014 (NAND_FLASH_STATUS) > > > >Yeah I was expecting something like: > >- Setup NAND controller using some command writes (indices 0-4) > > Loop doing the following until all the data is done: > > - Send/Receive the Data > > - Check status. > > > >The only one that sticks out to me is index 14. Is the INT flag there to > >mark > >the actual end of the data transfer from the device? Then you do one more > >Status read. > > > This is sample list given in NAND document. INT will be set only for the > last > command. I checked the NAND driver in which status will be read only once > for > each codeword. > >>> > >>>>>>> So a couple of thoughts on how to deal with this: > >>>>>>> > >>>>>>> 1) Define a virtual channel for the command descriptors vs a normal DMA > >>>>>>> transaction. This lets you use the same hardware channel, but lets you discern > >>>>>>> which descriptor format you need to use. The only issue I see with this is the > >>>>>>> required change in device tree binding to target the right type of channel (cmd > >>>>>>> vs normal). > >>>>>> > >>>>>>Or mark the descriptor is cmd and write accordingly... > >>>>> > >>>>>The only issue i see with that is knowing how much to pre-allocate during > >>>>>the > >>>>>prep call. 
> >>>>>
> >>>>>>>
> >>>>>>> 2) Provide an API to set flags for the descriptor on a whole
> >>>>>>> descriptor basis.
> >>>>>>>
> >>>>>>> 3) If you have a set of transactions described by an sgl that has
> >>>>>>> disparate use of flags, you split the list and use a separate
> >>>>>>> transaction. In other words, we need to enforce that the flag set
> >>>>>>> API will be applied to all descriptors described by an sgl. This
> >>>>>>> means that the whole transaction may be comprised of multiple async
> >>>>>>> TX descriptors.
> >>>>
> >>>>Each async TX descriptor will generate a BAM interrupt, so we are
> >>>>deviating from the main purpose of DMA, where ideally we should get the
> >>>>interrupt at the end of the transfer. This is the main reason for going
> >>>>for this patch.
> >>>
> >>>If the client queues 1 descriptor or 5 descriptors, it doesn't matter
> >>>that much. The client knows when it is done by waiting for the
> >>>descriptors to complete. It is less efficient than grouping them all,
> >>>but it should still work.
> >>>
> >>Yes, the client will wait for the final descriptor completion. But these
> >>interrupts will be overhead for the CPU. For 1-2 pages it won't matter
> >>much, but I guess it will be significant for a complete chip read/write
> >>(during boot and FS, i.e. JFFS, operations).
> >>
> >>>>With the submitted patch, only 1 interrupt per channel is required for
> >>>>a complete NAND page, and it solves the setting of the BAM specific
> >>>>flags also. The only issue with this patch is adding a new API in the
> >>>>DMA framework itself. But this API can be used in the future by other
> >>>>DMA engines for which the mapping cannot be done with the available
> >>>>APIs and the mapping is vendor specific.
> >>>
> >>>I guess the key point in all of this is that the DMA operation being
> >>>done is not a normal data flow to/from the device. It's direct remote
> >>>register access to the device using address information contained in
> >>>the sgl. And you are collating the standard data access along with the
> >>>special command access in the same API call.
> >>
> >>Yes. Normally a DMA engine (the QCA ADM DMA engine also) allows
> >>reading/writing memory mapped registers just like data. But BAM is
> >>different (since it is not a global DMA engine and is coupled with the
> >>peripheral). Also, this different flag requirement is not just for
> >>command descriptors but for data descriptors also.
> >>
> >>BAM data access and command access differ only in the flag and the
> >>register read/write list. The register read and write list will simply
> >>be an array of struct bam_cmd_element, added in the patch:
> >>
> >>struct bam_cmd_element {
> >>	__le32 addr:24;
> >>	__le32 command:8;
> >>	__le32 data;
> >>	__le32 mask;
> >>	__le32 reserved;
> >>};
> >>
> >>The address and size of the array will be passed in the data and size
> >>fields of the SGL. If we want to form the SGL for the mentioned list then
> >>we will have an SGL of size 15 with just one descriptor.
> >>
> >>Now we require a different flag for each SG entry. Currently SG does not
> >>have a flag parameter, and we can't add a flag parameter just for our
> >>requirement in the generic SG. So we have added the custom mapping
> >>function and passed a modified SG as the parameter, which is the generic
> >>SG plus a flag.
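
As an illustration of the above, a register write list could be built along
the following lines and then described by a single SG entry. This is only a
sketch: the 'command' encoding (0 assumed to mean a write), the endianness
handling and the helpers themselves are assumptions made here, not the
helper function from patch 5.

    #include <linux/scatterlist.h>
    #include <linux/types.h>

    /* Layout as quoted above from the patch under discussion. */
    struct bam_cmd_element {
    	__le32 addr:24;
    	__le32 command:8;
    	__le32 data;
    	__le32 mask;
    	__le32 reserved;
    };

    /* Fill one command element describing a register write (assumed opcode 0). */
    static void sketch_fill_reg_write(struct bam_cmd_element *ce,
    				      u32 reg_addr, u32 value)
    {
    	ce->addr = reg_addr;		/* 24-bit register address */
    	ce->command = 0;		/* assumed: 0 == write */
    	ce->data = value;
    	ce->mask = 0xffffffff;		/* write all bits */
    	ce->reserved = 0;
    }

    /*
     * Describe a two-write command list with a single SG entry: the SG
     * address points at the element array and the SG length is its size in
     * bytes; the CMD flag on the resulting BAM descriptor is what marks the
     * buffer as a list of command elements rather than data.
     */
    static void sketch_build_cmd_sg(struct scatterlist *sg,
    				    struct bam_cmd_element *ce_buf)
    {
    	sketch_fill_reg_write(&ce_buf[0], 0x000004 /* NAND_ADDR0 */, 0);
    	sketch_fill_reg_write(&ce_buf[1], 0x000008 /* NAND_ADDR1 */, 0);

    	sg_init_one(sg, ce_buf, 2 * sizeof(*ce_buf));
    }
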
> >
> >I really think that we need some additional API that allows for the flag
> >munging for the descriptors instead of overriding prep_slave_sg. We
> >already needed to change the way the flags are passed anyway. And instead
> >of building up a special sg list, the API should take a structure that
> >has a 1:1 mapping of the flags to the descriptors. And you would call
> >this API on your descriptor before issuing it.
> >
> >So build up the sglist. Call prep_slave_sg. You get back a tx descriptor
> >that underneath is a bam descriptor. Then call the API, giving it the
> >descriptor and the structure that defines the flags for the descriptors.
> >Then submit the descriptor.
> >
> >Something like:
> >
> >int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
> >				    u16 *flags)
> >{
> >	struct bam_async_desc *async_desc = container_of(tx,
> >							 struct bam_async_desc,
> >							 vd.tx);
> >	int i;
> >
> >	for (i = 0; i < async_desc->num_desc; i++)
> >		async_desc->desc[i].flags = cpu_to_le16(flags[i]);
> >
> >	return 0;
> >}
> >EXPORT_SYMBOL(qcom_bam_apply_descriptor_flags);
> >
> 
> We want to tightly couple the SG and its flag. But if this 1:1 mapping is
> acceptable in the Linux kernel then we can go ahead with this. It will
> solve our requirement and does not require any change in the Linux DMA
> API. I will do the same and will submit the new patches.

Thanks.  Hopefully Vinod will be ok with the SoC specific EXPORT.

> >This applies the flags directly to the underlying hardware descriptors.
> >The prep_slave_sg call would need to remove all the flag munging. The
> >bam_start_dma would need to account for this as well by only setting the
> >INT flag if the transfer cannot get all of the descriptors in the FIFO.
> 
> It seems no major change is required in prep_slave_sg or bam_start_dma,
> since it is just setting the INT flag for the last entry, which is
> required for the QPIC drivers also. We need to change the assignment of
> the flags to a bitwise OR assignment for the last BAM desc in
> bam_start_dma:
> 
> 	/* set any special flags on the last descriptor */
> 	if (async_desc->num_desc == async_desc->xfer_len)
> 		desc[async_desc->xfer_len - 1].flags |=
> 			cpu_to_le16(async_desc->flags);

Right.  That can be done in the start_dma directly.  That's the only function
that can really determine when the INT flag will be set (other than the last
descriptor).

Regards,

Andy
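
For reference, the client-side flow described in this thread would look
roughly like the sketch below. The dmaengine calls are the standard ones;
qcom_bam_apply_descriptor_flags() is only the API proposed above (not an
existing upstream export), and error handling and DMA mapping are elided.

    #include <linux/dmaengine.h>
    #include <linux/errno.h>
    #include <linux/scatterlist.h>

    /* Proposed in this thread; shown here only as a prototype. */
    int qcom_bam_apply_descriptor_flags(struct dma_async_tx_descriptor *tx,
    				        u16 *flags);

    static int sketch_issue_cmd_list(struct dma_chan *chan,
    				     struct scatterlist *sgl,
    				     unsigned int nents,
    				     u16 *per_desc_flags)
    {
    	struct dma_async_tx_descriptor *tx;
    	dma_cookie_t cookie;

    	/* 1) Normal slave prep call; no flag munging through the sgl itself. */
    	tx = dmaengine_prep_slave_sg(chan, sgl, nents, DMA_MEM_TO_DEV,
    				     DMA_PREP_INTERRUPT);
    	if (!tx)
    		return -EIO;

    	/* 2) Apply the 1:1 flag array to the underlying BAM descriptors. */
    	qcom_bam_apply_descriptor_flags(tx, per_desc_flags);

    	/* 3) Submit and start the channel as usual. */
    	cookie = dmaengine_submit(tx);
    	if (dma_submit_error(cookie))
    		return -EIO;

    	dma_async_issue_pending(chan);
    	return 0;
    }
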