From: Eran Liberty <liberty@extricom.com>
To: Eran Liberty <liberty@extricom.com>
Cc: linuxppc-dev@ozlabs.org, linux-pci@vger.kernel.org
Subject: Re: Freescale P2020 CPU Freeze over PCIe abort signal
Date: Tue, 19 Oct 2010 18:53:58 +0200
Message-ID: <4CBDCD26.2080004@extricom.com>
In-Reply-To: <4CBC8B40.4060706@extricom.com>

Eran Liberty wrote:
> Eran Liberty wrote:
>> This should probably go to Freescale support, as it feels like a
>> hardware issue, yet the end result is a very frozen Linux kernel, so
>> I am posting here first...
>>
>> I have a programmable FPGA PCIe device connected to a Freescale
>> P2020 PCIe port. As part of the bring-up tests, we are testing two
>> faulty scenarios:
>> 1. The FPGA totally ignores the PCIe transaction.
>> 2. The FPGA returns a transaction abort.
>>
>> Both are plausible PCIe behaviors, and their expected outcome is
>> documented in the PCIe spec. The first should be terminated by the
>> transaction requester's timeout mechanism and raise an error; the
>> second should abort the transaction and raise an error.
>>
>> On the P2020, if I trigger either of those, the CPU is left hung on
>> the transaction.
>>
>> something like:
>> in_le32(addr)
>>
>> is turned into:
>> 7c 00 04 ac     sync
>> 7c 00 4c 2c     lwbrx   r0,0,r9
>> 0c 00 00 00     twi     0,r0,0
>> 4c 00 01 2c     isync
>>
>> assembly code, where r9 (in this example) holds an address that is
>> physically mapped into the PCIe resource space.
>>
>> The CPU hangs on the load instruction.
>>
>> Just for the fun of it, I wrote my own assembly function, omitting
>> everything but the load instruction; it still freezes.
>> Replacing "lwbrx" with a simple "lwz" still freezes.
>>
>> It looks like the CPU snoozes until the PCIe transaction is done,
>> with no timeout, ignoring any abort signal.
>>
>> I am going to:
>> A. Try to reach Freescale support.
>> B. Ask the FPGA designer to give me a new behavior that stalls the
>> PCIe transaction reply for 10 sec and then returns OK.
>> C. Report back here with the results of A or B.
>>
>> If you have any ideas, I would love to hear them.
>>
>> -- Liberty
>>
> Some more info:
>
> As said, the FPGA designer provided me with a PCIe device that stalls
> its response for a variable amount of time. The CPU un-froze after
> that amount of time. Moreover, we found that during the period until
> it un-froze, the PCIe core retried the transaction over and over,
> every 40 ms. This gave me the bright idea to look for the word
> "retry" in the Freescale documentation, which rewarded me with these
> registers:
>
> ------------------------------ snip ------------------------------
> 16.3.2.3  PCI Express Outbound Completion Timeout Register
>           (PEX_OTB_CPL_TOR)
>
> The PCI Express outbound completion timeout register, shown in
> Figure 16-4, contains the maximum wait time for a response to come
> back as a result of an outbound non-posted request before a timeout
> condition occurs.
>
> Offset: 0x00C                                   Access: Read/Write
>
>   Bits   0    1–7   8–31
>   Field  TD   —     TC
>   Reset  0000 0000 0001 0000 1111 1111 1111 1111
>
> Figure 16-4. PCI Express Outbound Completion Timeout Register
>              (PEX_OTB_CPL_TOR)
>
> Table 16-6 describes the PCI Express outbound completion timeout
> register fields.
>
> Table 16-6. PEX_OTB_CPL_TOR Field Descriptions
>
>  Bits  Name  Description
>  0     TD    Timeout disable. This bit controls the
>              enabling/disabling of the timeout function.
>                0  Enable completion timeout
>                1  Disable completion timeout
>  1–7   —     Reserved
>  8–31  TC    Timeout counter. This is the value that is used to load
>              the response counter of the completion timeout.
>              One TC unit is 8× the PCI Express controller clock
>              period; that is, one TC unit is 20 ns at 400 MHz, and
>              30 ns at 266.66 MHz.
>              The following are examples of timeout periods based on
>              different TC settings:
>                0x00_0000  Reserved
>                0x10_FFFF  22.28 ms at 400 MHz controller clock;
>                           33.34 ms at 266.66 MHz controller clock
>                0xFF_FFFF  335.54 ms at 400 MHz controller clock;
>                           503.31 ms at 266.66 MHz controller clock
>
>
> 16.3.2.4  PCI Express Configuration Retry Timeout Register
>           (PEX_CONF_RTY_TOR)
>
> The PCI Express configuration retry timeout register, shown in
> Figure 16-5, contains the maximum time period during which retries
> of configuration transactions which resulted in a CRS response
> occur.
>
> Offset: 0x010                                   Access: Read/Write
>
>   Bits   0    1–3   4–31
>   Field  RD   —     TC
>   Reset  0000 0100 0000 0000 1111 1111 1111 1111
>
> Figure 16-5. PCI Express Configuration Retry Timeout Register
>              (PEX_CONF_RTY_TOR)
>
> [QorIQ P2020 Integrated Processor Reference Manual, Rev. 0,
>  p. 16-12, PCI Express Interface Controller, Freescale Semiconductor]
>
> Table 16-7 describes the PCI Express configuration retry timeout
> register fields.
>
> Table 16-7. PEX_CONF_RTY_TOR Field Descriptions
>
>  Bits  Name  Description
>  0     RD    Retry disable. This bit disables the retry of a
>              configuration transaction that receives a CRS status
>              response packet.
>                0  Enable retry of a configuration transaction in
>                   response to receiving a CRS status response until
>                   the timeout counter (defined by the
>                   PEX_CONF_RTY_TOR[TC] field) has expired.
>                1  Disable retry of a configuration transaction
>                   regardless of receiving a CRS status response.
>  1–3   —     Reserved
>  4–31  TC    Timeout counter. This is the value that is used to load
>              the CRS response counter.
>              One TC unit is 8× the PCI Express controller clock
>              period; that is, one TC unit is 20 ns at 400 MHz and
>              30 ns at 266.66 MHz.
>              Timeout period based on different TC settings:
>                0x000_0000  Reserved
>                0x400_FFFF  1.34 s at 400 MHz controller clock,
>                            2.02 s at 266.66 MHz controller clock
>                0xFFF_FFFF  5.37 s at 400 MHz controller clock,
>                            8.05 s at 266.66 MHz controller clock
> ------------------------------ snap ------------------------------
>
> Now this is all nice on paper, but what the P2020 seems to be doing
> in reality is:
> 1. never expiring
> 2. retrying even on non-configuration accesses
>
> I am going to try disabling the completion timeout and see if I get
> better behavior.
>
> -- Liberty
>
>
Disabling the timeout in PEX_OTB_CPL_TOR, the retry in PEX_CONF_RTY_TOR,
or both yields the same behavior. The kernel freezes on the load
instruction while the underlying hardware retries the PCIe transaction
to infinity and beyond.
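
For reference, a minimal sketch of the register arithmetic from the
manual excerpt quoted above. It assumes the manual's bit numbering, in
which bit 0 is the most significant bit of the 32-bit register. The
macro and helper names are hypothetical (they come from neither the
manual nor the kernel), and the sketch only computes values; it does
not touch the hardware.

```c
#include <stdint.h>

/* Field masks derived from the quoted Tables 16-6 and 16-7.
 * Assumption: bit 0 in the manual is the MSB of the 32-bit register. */
#define PEX_OTB_CPL_TOR_TD   0x80000000u  /* bit 0: timeout disable     */
#define PEX_OTB_CPL_TOR_TC   0x00FFFFFFu  /* bits 8-31: timeout counter */
#define PEX_CONF_RTY_TOR_RD  0x80000000u  /* bit 0: retry disable       */
#define PEX_CONF_RTY_TOR_TC  0x0FFFFFFFu  /* bits 4-31: timeout counter */

/* One TC unit is 8 controller clock periods (per the excerpt).
 * Returns the timeout in nanoseconds for a TC value and a clock in Hz. */
static inline uint64_t pex_tc_to_ns(uint32_t tc, uint32_t clk_hz)
{
	return ((uint64_t)tc * 8u * 1000000000ull) / clk_hz;
}

/* Hypothetical helper: compose a PEX_OTB_CPL_TOR register value. */
static inline uint32_t pex_otb_cpl_tor(int disable, uint32_t tc)
{
	return (disable ? PEX_OTB_CPL_TOR_TD : 0u) | (tc & PEX_OTB_CPL_TOR_TC);
}

/* Hypothetical helper: compose a PEX_CONF_RTY_TOR register value. */
static inline uint32_t pex_conf_rty_tor(int disable, uint32_t tc)
{
	return (disable ? PEX_CONF_RTY_TOR_RD : 0u) | (tc & PEX_CONF_RTY_TOR_TC);
}
```

With these, pex_otb_cpl_tor(0, 0x10FFFF) gives the documented reset
value 0x0010FFFF, and pex_otb_cpl_tor(1, 0x10FFFF) gives 0x8010FFFF,
which is the same TC with the completion timeout disabled.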

-- Liberty
