All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shawn Anastasio <shawn@anastas.io>
To: Alexey Kardashevskiy <aik@ozlabs.ru>, linuxppc-dev@lists.ozlabs.org
Cc: Sam Bobroff <sbobroff@linux.ibm.com>,
	Alistair Popple <alistair@popple.id.au>,
	Oliver O'Halloran <oohall@gmail.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59
Date: Tue, 18 Jun 2019 02:00:06 -0500	[thread overview]
Message-ID: <553b21c2-c57f-2b27-3210-a957158d22dc@anastas.io> (raw)
In-Reply-To: <d4a8d06e-aa5b-dab7-4b20-d1aa77b5304a@ozlabs.ru>

On 6/18/19 1:39 AM, Alexey Kardashevskiy wrote:
> 
> 
> On 18/06/2019 14:26, Shawn Anastasio wrote:
>> On 6/12/19 2:15 PM, Shawn Anastasio wrote:
>>> On 6/12/19 2:07 AM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 12/06/2019 15:05, Shawn Anastasio wrote:
>>>>> On 6/5/19 11:11 PM, Shawn Anastasio wrote:
>>>>>> On 5/30/19 2:03 AM, Alexey Kardashevskiy wrote:
>>>>>>> This is an attempt to allow DMA masks between 32..59 which are not
>>>>>>> large
>>>>>>> enough to use either a PHB3 bypass mode or a sketchy bypass.
>>>>>>> Depending
>>>>>>> on the max order, up to 40 is usually available.
>>>>>>>
>>>>>>>
>>>>>>> This is based on v5.2-rc2.
>>>>>>>
>>>>>>> Please comment. Thanks.
>>>>>>
>>>>>> I have tested this patch set with an AMD GPU that's limited to <64bit
>>>>>> DMA (I believe it's 40 or 42 bit). It successfully allows the card to
>>>>>> operate without falling back to 32-bit DMA mode as it does without
>>>>>> the patches.
>>>>>>
>>>>>> Relevant kernel log message:
>>>>>> ```
>>>>>> [    0.311211] pci 0033:01     : [PE# 00] Enabling 64-bit DMA bypass
>>>>>> ```
>>>>>>
>>>>>> Tested-by: Shawn Anastasio <shawn@anastas.io>
>>>>>
>>>>> After a few days of further testing, I've started to run into stability
>>>>> issues with the patch applied and used with an AMD GPU. Specifically,
>>>>> the system sometimes spontaneously crashes. Not just EEH errors either,
>>>>> the whole system shuts down in what looks like a checkstop.
>>>>>
>>>>> Perhaps some subtle corruption is occurring?
>>>>
>>>> Have you tried this?
>>>>
>>>> https://patchwork.ozlabs.org/patch/1113506/
>>>
>>> I have not. I'll give it a shot and try it out for a few days to see
>>> if I'm able to reproduce the crashes.
>>
>> A few days later and I was able to reproduce the checkstop while
>> watching a video in mpv. At this point the system had ~4 day
>> uptime and this wasn't the first video I watched during that time.
>>
>> This is with https://patchwork.ozlabs.org/patch/1113506/ applied, too.
> 
> 
> Any logs left? What was the reason for the checkstop and what is the
> hardware? "lscpu" and "lspci -vv" for the starter would help. Thanks,

The machine is a Talos II with 2x 8 core DD2.2 Sforza modules.
I've added the output of lscpu and lspci below. As for logs,
it doesn't seem there are any kernel logs of the event.
The opal-gard utility shows some error records which I have
also included below.

opal-gard:
```
$ sudo ./opal-gard show 1
Record ID:    0x00000001
========================
Error ID:     0x9000000b
Error Type:   Fatal (0xe3)
Path Type: physical
 >Sys, Instance #0
  >Node, Instance #0
   >Proc, Instance #1
    >EQ, Instance #0
     >EX, Instance #0

$ sudo ./opal-gard show 2
Record ID:    0x00000002
========================
Error ID:     0x90000021
Error Type:   Fatal (0xe3)
Path Type: physical
 >Sys, Instance #0
  >Node, Instance #0
   >Proc, Instance #1
    >EQ, Instance #2
     >EX, Instance #1

```

lscpu:
```
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              52
On-line CPU(s) list: 0-3,8-31,36-47,52-63
Thread(s) per core:  4
Core(s) per socket:  6
Socket(s):           2
NUMA node(s):        2
Model:               2.2 (pvr 004e 1202)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2154.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-3,8-31
NUMA node8 CPU(s):   36-47,52-63

```

lspci -vv:
Output at: https://upaste.anastas.io/IwVQzt

      reply	other threads:[~2019-06-18  7:04 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-30  7:03 [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59 Alexey Kardashevskiy
2019-05-30  7:03 ` [PATCH kernel v3 1/3] powerpc/iommu: Allow bypass-only for DMA Alexey Kardashevskiy
2019-06-03  2:03   ` David Gibson
2019-05-30  7:03 ` [PATCH kernel v3 2/3] powerpc/powernv/ioda2: Allocate TCE table levels on demand for default DMA window Alexey Kardashevskiy
2019-07-08  7:01   ` alistair
2019-05-30  7:03 ` [PATCH kernel v3 3/3] powerpc/powernv/ioda2: Create bigger default window with 64k IOMMU pages Alexey Kardashevskiy
2019-07-08  7:01   ` alistair
2019-07-09  0:57     ` Alexey Kardashevskiy
2019-06-06  4:11 ` [PATCH kernel v3 0/3] powerpc/ioda2: Yet another attempt to allow DMA masks between 32 and 59 Shawn Anastasio
2019-06-06  7:17   ` Alistair Popple
2019-06-06 12:07     ` Oliver
2019-06-07  1:41       ` Alistair Popple
2019-06-10  5:19         ` Alexey Kardashevskiy
2019-06-12  5:05   ` Shawn Anastasio
2019-06-12  6:16     ` Oliver O'Halloran
2019-06-12 19:14       ` Shawn Anastasio
2019-06-12  7:07     ` Alexey Kardashevskiy
2019-06-12 19:15       ` Shawn Anastasio
2019-06-18  4:26         ` Shawn Anastasio
2019-06-18  6:39           ` Alexey Kardashevskiy
2019-06-18  7:00             ` Shawn Anastasio [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=553b21c2-c57f-2b27-3210-a957158d22dc@anastas.io \
    --to=shawn@anastas.io \
    --cc=aik@ozlabs.ru \
    --cc=alistair@popple.id.au \
    --cc=david@gibson.dropbear.id.au \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=oohall@gmail.com \
    --cc=sbobroff@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.