linux-arm-kernel.lists.infradead.org archive mirror
From: Will Deacon <will.deacon@arm.com>
To: "Koenig, Christian" <Christian.Koenig@amd.com>
Cc: "Carsten Haitzler" <Carsten.Haitzler@arm.com>,
	"Ard Biesheuvel" <ard.biesheuvel@linaro.org>,
	"David Airlie" <airlied@linux.ie>,
	mpe@ellerman.id.au, "Michel Dänzer" <michel@daenzer.net>,
	"Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Huang, Ray" <Ray.Huang@amd.com>,
	benh@kernel.crashing.org, "Zhang, Jerry" <Jerry.Zhang@amd.com>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	"Bernhard Rosenkränzer" <Bernhard.Rosenkranzer@linaro.org>
Subject: Re: [RFC PATCH] drm/ttm: force cached mappings for system RAM on ARM
Date: Mon, 14 Jan 2019 19:35:48 +0000
Message-ID: <20190114193548.GB29600@fuggles.cambridge.arm.com>
In-Reply-To: <a6d9d0ad-d1c0-9e60-17c6-a42f7bde258c@amd.com>

[+ BenH and MPE]

On Mon, Jan 14, 2019 at 07:21:08PM +0000, Koenig, Christian wrote:
> On 14.01.19 at 20:13, Will Deacon wrote:
> > On Mon, Jan 14, 2019 at 07:07:54PM +0000, Koenig, Christian wrote:
> >> On 14.01.19 at 18:32, Ard Biesheuvel wrote:
> >>              - The reason remapping the CPU side as cacheable does work (which I
> >>                did test) is because the GPU's uncacheable accesses (which I assume
> >>                are made using the NoSnoop PCIe transaction attribute) are actually
> >>                emitted as cacheable in some cases.
> >>                . On my AMD Seattle, with or without SMMU (which is stage 2 only), I
> >>                  must use cacheable accesses from the CPU side or things are broken.
> >>                  This might be a h/w flaw, though.
> >>                . On systems with stage 1+2 SMMUs, the driver uses stage 1
> >>                  translations which always override the memory attributes to cacheable
> >>                  for DMA coherent devices. This is what is affecting the Cavium
> >>                  ThunderX2 (although it appears the attributes emitted by the RC may be
> >>                  incorrect as well.)
> >>
> >>              The latter issue is a shortcoming in the SMMU driver that we have to
> >>              fix, i.e., it should take care not to modify the incoming attributes
> >>              of DMA coherent PCIe devices for NoSnoop to be able to work.
> >>
> >>              So in summary, the mismatch appears to be between the CPU accessing
> >>              the vmap region with non-cacheable attributes and the GPU accessing
> >>              the same memory with cacheable attributes, resulting in a loss of
> >>              coherency and lots of visible corruption.
> >>
> >>          Actually it is the other way around. The CPU thinks some data is in the
> >>          cache and the GPU only updates the system memory version because the
> >>          snoop flag is not set.
> >>
> >>
> >>      That doesn't seem to be what is happening. As far as we can tell from
> >>      our experiments, all inbound transactions are always cacheable, and so
> >>      the only way to make things work is to ensure that the CPU uses the
> >>      same attributes.
> >>
> >>
> >> OK, that doesn't make any sense. Whether inbound transactions are cacheable or
> >> not is irrelevant when the CPU always uses uncached accesses.
> >>
> >> See on the PCIe side you have the snoop bit in the read/write transactions
> >> which tells the root hub if the device wants to snoop caches or not.
> >>
> >> When the CPU accesses some memory as cached then devices need to snoop the
> >> cache for coherent accesses.
> >>
> >> When the CPU accesses some memory as uncached then devices can disable snooping
> >> to improve performance, but when they don't do this it is mandated by the spec
> >> that this still works.
> > Which spec?
> 
> The PCIe spec. The snoop bit (or rather the NoSnoop) in the transaction 
> is perfectly optional IIRC.

Thanks for the clarification. I suspect the devil is in the details, so I'll
try to dig up the spec.
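
One place to start in the spec is the Device Control register of the PCI
Express capability, which has an "Enable No Snoop" bit: when it is set, a
function is permitted, but not required, to set No Snoop on the requests it
initiates. The following is a minimal sketch of testing and clearing that bit
on a raw 16-bit register value. The macro and function names are hypothetical
(the bit values match the ones in Linux's pci_regs.h), and reading or writing
the register on real hardware is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of the PCI Express Device Control register. The names are
 * illustrative; they are redefined here so the sketch stands alone. */
#define DEVCTL_RELAX_EN   0x0010u /* Enable Relaxed Ordering (bit 4) */
#define DEVCTL_NOSNOOP_EN 0x0800u /* Enable No Snoop (bit 11) */

/* True if the function is allowed to set No Snoop on its requests.
 * "Allowed" is all this bit says: the device may still choose to snoop. */
static bool devctl_nosnoop_permitted(uint16_t devctl)
{
    return (devctl & DEVCTL_NOSNOOP_EN) != 0;
}

/* Returns the register value with Enable No Snoop cleared and every
 * other bit left untouched. */
static uint16_t devctl_forbid_nosnoop(uint16_t devctl)
{
    return (uint16_t)(devctl & ~DEVCTL_NOSNOOP_EN);
}
```

For example, devctl_forbid_nosnoop(0x0810) yields 0x0010: Relaxed Ordering
stays enabled while No Snoop is no longer permitted.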

> > The Arm architecture (and others including Power afaiu) doesn't
> > guarantee coherency when memory is accessed using mismatched cacheability
> > attributes.
> 
> Well what exactly goes wrong on ARM?

Coherency (and any ordering guarantees) can be lost, so the device may see a
stale copy of the memory it is accessing. The architecture requires cache
maintenance to restore coherency between the mismatched aliases.
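
To make the failure mode concrete, here is a toy model, not kernel code and
not a description of any real interconnect: one word of memory, one cache
line, a cacheable alias that reads through the line and a non-cacheable alias
that writes straight to memory. All names are made up for illustration, and
the assumption that a non-cacheable write leaves an existing line untouched is
a simplification of what mismatched attributes can do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: one word of system memory plus one cache line that
 * may hold a copy of it. */
struct toy_system {
    uint32_t dram;       /* value in system memory */
    uint32_t line_data;  /* value in the cache line */
    bool     line_valid; /* true if the line holds a copy */
};

/* Read through the cacheable alias: a hit returns the cached copy,
 * a miss fills the line from memory first. */
static uint32_t read_cacheable(struct toy_system *s)
{
    if (!s->line_valid) {
        s->line_data = s->dram;
        s->line_valid = true;
    }
    return s->line_data;
}

/* Write through the non-cacheable alias: goes straight to memory and
 * neither looks at nor updates the cache line. */
static void write_noncacheable(struct toy_system *s, uint32_t val)
{
    s->dram = val;
}

/* Cache maintenance: discard the line so the next cacheable read
 * refetches from memory. */
static void invalidate_line(struct toy_system *s)
{
    s->line_valid = false;
}

/* Returns what the cacheable alias observes after the non-cacheable
 * alias overwrote an initial 1 with 2. */
static uint32_t mismatched_alias_demo(bool with_maintenance)
{
    struct toy_system s = { .dram = 1 };

    (void)read_cacheable(&s);   /* cacheable alias pulls in the old value */
    write_noncacheable(&s, 2);  /* non-cacheable alias updates memory only */
    if (with_maintenance)
        invalidate_line(&s);
    return read_cacheable(&s);
}
```

In this model mismatched_alias_demo(false) returns the stale 1, while
mismatched_alias_demo(true) returns 2 because the invalidate forces a refetch.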

> As far as I know Power doesn't really support un-cached memory at all, 
> except for a very very old and odd configuration with AGP.

Hopefully Michael/Ben can elaborate here, but I was under the (possibly
mistaken) impression that mismatched attributes could cause a machine-check
on Power.

> I mean in theory I agree that devices should use matching cacheability 
> attributes, but in practice I know of quite a bunch of devices/engines 
> which fail to do this correctly.

Given that the experience of Ard and me so far has been that the system
ends up making everything cacheable after the RC, perhaps that's an attempt
by system designers to correct for these devices. Unfortunately, it doesn't
help if the CPU carefully goes ahead and establishes a non-cacheable mapping
for itself!

Will

