Re: [PATCH v1 3/7] spi: qspi-tegra: Add support for Tegra210 QSPI controller

From: Sowjanya Komatineni <skomatineni@nvidia.com>
To: Mark Brown <broonie@kernel.org>
Cc: <thierry.reding@gmail.com>, <jonathanh@nvidia.com>,
	<robh+dt@kernel.org>, <linux-spi@vger.kernel.org>,
	<linux-tegra@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<devicetree@vger.kernel.org>
Subject: Re: [PATCH v1 3/7] spi: qspi-tegra: Add support for Tegra210 QSPI controller
Date: Fri, 4 Dec 2020 13:04:46 -0800	[thread overview]
Message-ID: <df8d6f25-c8cc-3b41-e4df-8e26c9b93475@nvidia.com> (raw)
In-Reply-To: <20201204185223.GF4558@sirena.org.uk>

On 12/4/20 10:52 AM, Mark Brown wrote:
> On Thu, Dec 03, 2020 at 04:22:54PM -0800, Sowjanya Komatineni wrote:
>> On 12/2/20 11:17 AM, Sowjanya Komatineni wrote:
>>>> It seems weird that this device needs us to do a memcpy() to do DMA,
>>>> most devices are able to DMA directly from the buffers provided by the
>>>> SPI API (and let the SPI core sync things).  What is going on here?
>>> For transfers of size more than max DMA transfer limit, data transfer
>>> happens in multiple iterations with each iteration transferring up to
>>> max DMA transfer limit.
>>> So using separate dma buffers and on every iteration copying them to SPI
>>> core provided tx/rx buffers.
> I don't understand this - there's no restriction on where DMA transfers
> can be done from within a DMA mapped region, the driver can do multiple
> transfers from different chunks of the source buffer without having to
> copy anything.  That's a very common scenario.
>
>> Also unpack mode needs to manually put the bytes together from read data to
>> SPI core rx buffer.
> Could you be more explicit here, I don't know what "unpack mode" is?

Tegra SPI/QSPI controller support packed mode and unpacked mode based on 
bits per word in a transfer.

Packed Mode: When enabled, all 32-bits of data in FIFO contains valid 
data packets of 8-bit/16-bit/32-bit length.

Non packed mode: For transfers like 24-bit data for example we disable 
packed mode and only 24-bits of FIFO data are valid and other bits are 0's.
So during TX for FIFO filling and during receive when FIFO data is read, 
SW need to skip invalid bits and should align order from/to SPI core 
tx/rx buffers.

>
>>>> This is worrying, the client device might be confused if /CS is doing
>>>> things outside of the standard handling.
>>> Do you mean to honor spi_transfer cs_change flag?
> At least, yes - more generally just if there's any feature to with chip
> select then the driver will need to open code it.  The driver should at
> least be double checking that what it's doing matches what it was told
> to do, though just letting this be handled by the generic code if
> there's no limitation on the hardware tends to be easier all round.
OK Will honor cs_change flag at end of transfer and will program CS 
state based on that.
>
>>> Tegra QSPI is master and is used only with QSPI flash devices. Looking
>>> at SPI NOR driver, I see QSPI Flash commands are executed with one flash
>>> command per spi_message and I dont see cs_change flag usage w.r.t QSPI
>>> flash. So, using SW based CS control for QSPI.
>>> Please correct me if I miss something to understand here.
> Someone might build a system that does something different, they may see
> a spare SPI controller and decide they can use it for something else or
> there may be some future change with the flash code which does something
> different.
>
>>>>> +    tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
>>>> The setup for one device shouldn't be able to affect the operation of
>>>> another, already running, device so either these need to be configured
>>>> as part of the controller probe or these configurations need to be
>>>> deferred until we're actually doing a transfer.
>>> We will only have 1 device on QSPI as we only support single chip select.
> It's quite common for people to do things like add additional devices
> with GPIO chip selects.
Will move tap delay programming to happen during spi transfer..
>
>>>>> +    /*
>>>>> +     * Tegra QSPI hardware support dummy bytes transfer based on the
>>>>> +     * programmed dummy clock cyles in QSPI register.
>>>>> +     * So, get the total dummy bytes from the dummy bytes transfer in
>>>>> +     * spi_messages and convert to dummy clock cyles.
>>>>> +     */
>>>>> +    list_for_each_entry(xfer, &msg->transfers, transfer_list) {
>>>>> +        if (ntransfers == DUMMY_BYTES_XFER &&
>>>>> +            !(list_is_last(&xfer->transfer_list, &msg->transfers)))
>>>>> +            dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
>>>>> +        ntransfers++;
>>>>> +    }
>>>> This seems weird, there's some hard coded assumption about particular
>>>> patterns that the client device is going to send.  What's going on here?
>>>> I don't really understand what this is trying to do.
>>> QSPI flash needs dummy cycles for data read operation which is actually
>>> the initial read latency and no. of dummy cycles required are vendor
>>> specific.
>>> SPI NOR driver gets required dummy cycles based on mode clock cycles and
>>> wait state clock cycles.
>>> During read operations, spi_nor_spimem_read_data() converts dummy cycles
>>> to number of dummy bytes.
>>> Tegra QSPI controller supports dummy clock cycles register and when
>>> programmed QSPI controller sends dummy bytes rather than SW handling
>>> extra cycles for transferring dummy bytes.
>>> Above equation converts this dummy bytes back to dummy clock cycles to
>>> program into QSPI register and avoid manual SW transfer of dummy bytes.
> This is not a good idea, attempting to reverse engineer the message and
> guess at the contents isn't going to be robust and if it's useful it
> will if nothing else lead to a bunch of duplicated code in drivers as
> every device that has this feature will need to reimplment it.  Instead
> we should extend the framework so there's explicit support for
> specifying transfers that are padding bytes, then there's no guesswork
> that can go wrong and no duplicated code between drivers.  A flag in the
> transfer struct might work?

As per QSPI spec, Dummy bytes for initial read latency are always FF's. 
So its not like guessing the contents.

Tegra QSPI controller HW supports transferring dummy bytes (sending FF's 
after address) based on dummy clock cycles programmed.

To allow Tegra QSPI HW transfer dummy bytes directly, controller driver 
need number of dummy bytes / actual dummy clock cycles which core driver 
gets from flash sfdp read data.

So, we can add flag to transfer and based on this flag if controller HW 
supports then we can ignore filling dummy bytes in spi_mem_exec_op but 
controller driver still needs dummy_cycles value. So probably along with 
flag, do you agree to have dummy_cycle as part of transfer struct which 
can be set to nor->read_dummy value?

Thanks

Sowjanya