* SMMU driver and stall vs terminate mode
@ 2016-06-20 15:28 ` Stuart Yoder
  0 siblings, 0 replies; 12+ messages in thread
From: Stuart Yoder @ 2016-06-20 15:28 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon
  Cc: iommu, Bharat Bhushan, linux-arm-kernel

Robin/Will,

Right now the SMMU driver is hardcoded to configure 'stall' mode for
context faults:

      /* SCTLR */
      reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;

We are running into an issue with a device where it seems to behave sanely
when SCTLR_CFCFG=0 ...i.e. 'terminate' mode, but in stall mode seems to be
unaware that an access violation occurred.

Is there really some assumption that all devices that send transactions
through the SMMU _must_ be able to handle stall mode?  I am trying to
find out from our hw designers what is going on at the signal level for
the device in question, but it seems to me that 'terminate' mode exists
for a reason and I wonder what your thoughts are about providing a
configuration option to allow configuration of terminate mode if a particular
SoC requires it.
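
To make the idea concrete, here is a minimal sketch of how the SCTLR value
could be built with stall mode made optional. It is not a patch against the
driver: smmu_build_sctlr() and its stall_on_fault parameter are hypothetical,
and the bit positions are assumptions for illustration that should be checked
against the driver's own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are assumed for illustration; verify against the driver. */
#define SCTLR_CFCFG	(1u << 7)	/* 1 = stall on context fault, 0 = terminate */
#define SCTLR_CFIE	(1u << 6)	/* context fault interrupt enable */
#define SCTLR_CFRE	(1u << 5)	/* context fault report enable */
#define SCTLR_AFE	(1u << 2)
#define SCTLR_TRE	(1u << 1)
#define SCTLR_M		(1u << 0)	/* translation enable */
#define SCTLR_EAE_SBOP	(SCTLR_AFE | SCTLR_TRE)

/*
 * Hypothetical helper, not part of the driver: build the context bank
 * SCTLR value, setting CFCFG only when stall mode is requested.
 */
uint32_t smmu_build_sctlr(bool stall_on_fault)
{
	uint32_t reg = SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;

	if (stall_on_fault)
		reg |= SCTLR_CFCFG;

	return reg;
}
```

Where stall_on_fault would come from (command line, DT, or something else)
is left open in this sketch.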

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: SMMU driver and stall vs terminate mode
  2016-06-20 15:28 ` Stuart Yoder
@ 2016-06-20 16:08     ` Robin Murphy
  -1 siblings, 0 replies; 12+ messages in thread
From: Robin Murphy @ 2016-06-20 16:08 UTC (permalink / raw)
  To: Stuart Yoder, Will Deacon
  Cc: iommu, Bharat Bhushan, Brian Starkey, linux-arm-kernel

Hi Stuart,

On 20/06/16 16:28, Stuart Yoder wrote:
> Robin/Will,
>
> Right now the SMMU driver is hardcoded to configure 'stall' mode for
> context faults:
>
>        /* SCTLR */
>        reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
>
> We are running into an issue with a device where it seems to behave sanely
> when SCTLR_CFCFG=0 ...i.e. 'terminate' mode, but in stall mode seems to be
> unaware that an access violation occurred.

Does the device keep issuing transactions after the initial faulting 
one, by any chance? Brian (+cc) has seen similar-sounding issues in the 
past (albeit with backports to some horrible Android kernel), and I 
think we concluded that there's an inherent race window between writing 
RESUME and acking the interrupt in which MMU-500 can process another 
faulting transaction and reassert the IRQ without Linux realising, which 
then gets lost and things go out of whack.

> Is there really some assumption that all devices that send transactions
> through the SMMU _must_ be able to handle stall mode?  I am trying to
> find out from our hw designers what is going on at the signal level for
> the device in question, but it seems to me that 'terminate' mode exists
> for a reason and I wonder what your thoughts are about providing a
> configuration option to allow configuration of terminate mode if a particular
> SoC requires it.

Personally, I'd quite happily leave it turned off (MMU-400/401 don't 
support stalling anyway), but I recall Will having a fairly 
reasonable-sounding argument in favour, which I now can't remember the 
details of. Hopefully he might remind us, unless his conference is too 
enthralling.

Robin.

>
> Thanks,
> Stuart
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: SMMU driver and stall vs terminate mode
  2016-06-20 16:08     ` Robin Murphy
@ 2016-06-21  9:42         ` Will Deacon
  -1 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2016-06-21  9:42 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Stuart Yoder, iommu, Bharat Bhushan, Brian Starkey,
	linux-arm-kernel

On Mon, Jun 20, 2016 at 05:08:45PM +0100, Robin Murphy wrote:
> On 20/06/16 16:28, Stuart Yoder wrote:
> >Right now the SMMU driver is hardcoded to configure 'stall' mode for
> >context faults:
> >
> >       /* SCTLR */
> >       reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
> >
> >We are running into an issue with a device where it seems to behave sanely
> >when SCTLR_CFCFG=0 ...i.e. 'terminate' mode, but in stall mode seems to be
> >unaware that an access violation occurred.
> 
> Does the device keep issuing transactions after the initial faulting one, by
> any chance? Brian (+cc) has seen similar-sounding issues in the past (albeit
> with backports to some horrible Android kernel), and I think we concluded
> that there's an inherent race window between writing RESUME and acking the
> interrupt in which MMU-500 can process another faulting transaction and
> reassert the IRQ without Linux realising, which then gets lost and things go
> out of whack.

Do we not detect this with the MULTI bit in the FSR?
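
For reference, a minimal sketch of what checking for this from the FSR value
could look like. The bit positions are assumptions for illustration and
should be checked against the SMMU architecture spec and the driver;
decode_fsr() and struct fsr_info are hypothetical names, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define FSR_MULTI	(1u << 31)	/* a fault occurred while FSR was already non-zero */
#define FSR_SS		(1u << 30)	/* a transaction is currently stalled */
#define FSR_TF		(1u << 1)	/* translation fault */

/* Hypothetical decoded view of one context bank's FSR. */
struct fsr_info {
	bool faulted;	/* any status bit set */
	bool stalled;	/* SS set: RESUME must be written to release it */
	bool multiple;	/* MULTI set: more than one fault was recorded */
};

/* Decode a raw FSR value into the three conditions a handler cares about. */
struct fsr_info decode_fsr(uint32_t fsr)
{
	struct fsr_info info = {
		.faulted  = fsr != 0,
		.stalled  = (fsr & FSR_SS) != 0,
		.multiple = (fsr & FSR_MULTI) != 0,
	};

	return info;
}
```

A handler that sees .multiple set knows a further fault was recorded before
FSR was cleared; whether that also covers the window after RESUME is written
is exactly the open question.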

> >Is there really some assumption that all devices that send transactions
> >through the SMMU _must_ be able to handle stall mode?  I am trying to
> >find out from our hw designers what is going on at the signal level for
> >the device in question, but it seems to me that 'terminate' mode exists
> >for a reason and I wonder what your thoughts are about providing a
> >configuration option to allow configuration of terminate mode if a particular
> >SoC requires it.
> 
> Personally, I'd quite happily leave it turned off (MMU-400/401 don't support
> stalling anyway), but I recall Will having a fairly reasonable-sounding
> argument in favour, which I now can't remember the details of. Hopefully he
> might remind us, unless his conference is too enthralling.

Given that we don't do anything particularly useful in the context fault
handler, I also wouldn't object to turning this off (and removing the
retry/reporting machinery). However, I'd want a better description of
*why* it's causing problems first, so that we can justify the decision
in case anybody is using this out of tree.

If we did make the thing configurable, would that be another command line
option, or something in DT? What about ACPI?

Will

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: SMMU driver and stall vs terminate mode
  2016-06-20 16:08     ` Robin Murphy
@ 2016-06-21 14:33         ` Stuart Yoder
  -1 siblings, 0 replies; 12+ messages in thread
From: Stuart Yoder @ 2016-06-21 14:33 UTC (permalink / raw)
  To: Robin Murphy, Will Deacon
  Cc: iommu, Bharat Bhushan, Brian Starkey, linux-arm-kernel



> -----Original Message-----
> From: Robin Murphy [mailto:robin.murphy@arm.com]
> Sent: Monday, June 20, 2016 11:09 AM
> To: Stuart Yoder <stuart.yoder@nxp.com>; Will Deacon <Will.Deacon@arm.com>
> Cc: linux-arm-kernel@lists.infradead.org; iommu@lists.linux-foundation.org; Nipun Gupta
> <nipun.gupta@nxp.com>; Bharat Bhushan <bharat.bhushan@nxp.com>; Brian Starkey <brian.starkey@arm.com>
> Subject: Re: SMMU driver and stall vs terminate mode
> 
> Hi Stuart,
> 
> On 20/06/16 16:28, Stuart Yoder wrote:
> > Robin/Will,
> >
> > Right now the SMMU driver is hardcoded to configure 'stall' mode for
> > context faults:
> >
> >        /* SCTLR */
> >        reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
> >
> > We are running into an issue with a device where it seems to behave sanely
> > when SCTLR_CFCFG=0 ...i.e. 'terminate' mode, but in stall mode seems to be
> > unaware that an access violation occurred.
> 
> Does the device keep issuing transactions after the initial faulting
> one, by any chance?

In the case we are seeing, it is a unit test type scenario and I think
there is only one transaction, so software doesn't try to continue issuing
transactions.

Stuart

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: SMMU driver and stall vs terminate mode
  2016-06-21  9:42         ` Will Deacon
@ 2016-06-21 14:36             ` Stuart Yoder
  -1 siblings, 0 replies; 12+ messages in thread
From: Stuart Yoder @ 2016-06-21 14:36 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy
  Cc: iommu, Bharat Bhushan, Brian Starkey, linux-arm-kernel



> -----Original Message-----
> From: Will Deacon [mailto:will.deacon@arm.com]
> Sent: Tuesday, June 21, 2016 4:43 AM
> To: Robin Murphy <robin.murphy@arm.com>
> Cc: Stuart Yoder <stuart.yoder@nxp.com>; linux-arm-kernel@lists.infradead.org;
> iommu@lists.linux-foundation.org; Nipun Gupta <nipun.gupta@nxp.com>;
> Bharat Bhushan <bharat.bhushan@nxp.com>; Brian Starkey <brian.starkey@arm.com>
> Subject: Re: SMMU driver and stall vs terminate mode
> 
> On Mon, Jun 20, 2016 at 05:08:45PM +0100, Robin Murphy wrote:
> > On 20/06/16 16:28, Stuart Yoder wrote:
> > >Right now the SMMU driver is hardcoded to configure 'stall' mode for
> > >context faults:
> > >
> > >       /* SCTLR */
> > >       reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
> > >
> > >We are running into an issue with a device where it seems to behave sanely
> > >when SCTLR_CFCFG=0 ...i.e. 'terminate' mode, but in stall mode seems to be
> > >unaware that an access violation occurred.
> >
> > Does the device keep issuing transactions after the initial faulting one, by
> > any chance? Brian (+cc) has seen similar-sounding issues in the past (albeit
> > with backports to some horrible Android kernel), and I think we concluded
> > that there's an inherent race window between writing RESUME and acking the
> > interrupt in which MMU-500 can process another faulting transaction and
> > reassert the IRQ without Linux realising, which then gets lost and things go
> > out of whack.
> 
> Do we not detect this with the MULTI bit in the FSR?
> 
> > >Is there really some assumption that all devices that send transactions
> > >through the SMMU _must_ be able to handle stall mode?  I am trying to
> > >find out from our hw designers what is going on at the signal level for
> > >the device in question, but it seems to me that 'terminate' mode exists
> > >for a reason and I wonder what your thoughts are about providing a
> > >configuration option to allow configuration of terminate mode if a particular
> > >SoC requires it.
> >
> > Personally, I'd quite happily leave it turned off (MMU-400/401 don't support
> > stalling anyway), but I recall Will having a fairly reasonable-sounding
> > argument in favour, which I now can't remember the details of. Hopefully he
> > might remind us, unless his conference is too enthralling.
> 
> Given that we don't do anything particularly useful in the context fault
> handler, I also wouldn't object to turning this off (and removing the
> retry/reporting machinery). However, I'd want a better description of
> *why* it's causing problems first, so that we can justify the decision
> in case anybody is using this out of tree.

I am trying to get more details from HW owners of this device as to
its behavior in these 2 different SMMU modes.

Stuart

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: SMMU driver and stall vs terminate mode
  2016-06-21 14:36             ` Stuart Yoder
@ 2016-06-21 14:47               ` Brian Starkey
  -1 siblings, 0 replies; 12+ messages in thread
From: Brian Starkey @ 2016-06-21 14:47 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: Bharat Bhushan, Will Deacon, iommu, Nipun Gupta, Robin Murphy,
	linux-arm-kernel

Hi,

On Tue, Jun 21, 2016 at 02:36:17PM +0000, Stuart Yoder wrote:
>
>
>> -----Original Message-----
>> From: Will Deacon [mailto:will.deacon@arm.com]
>> Sent: Tuesday, June 21, 2016 4:43 AM
>> To: Robin Murphy <robin.murphy@arm.com>
>> Cc: Stuart Yoder <stuart.yoder@nxp.com>; linux-arm-kernel@lists.infradead.org; iommu@lists.linux-
>> foundation.org; Nipun Gupta <nipun.gupta@nxp.com>; Bharat Bhushan <bharat.bhushan@nxp.com>; Brian
>> Starkey <brian.starkey@arm.com>
>> Subject: Re: SMMU driver and stall vs terminate mode
>>
>> On Mon, Jun 20, 2016 at 05:08:45PM +0100, Robin Murphy wrote:
>> > On 20/06/16 16:28, Stuart Yoder wrote:
>> > >Right now the SMMU driver is hardcoded to configure 'stall' mode for
>> > >context faults:
>> > >
>> > >       /* SCTLR */
>> > >       reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
>> > >
>> > >We are running into an issue with a device where it seems to behave sanely
>> > >when SCTLR_CFCFG=0 ...i.e. 'terminate' mode, but in stall mode seems to be
>> > >unaware that an access violation occurred.
>> >
>> > Does the device keep issuing transactions after the initial faulting one, by
>> > any chance? Brian (+cc) has seen similar-sounding issues in the past (albeit
>> > with backports to some horrible Android kernel), and I think we concluded
>> > that there's an inherent race window between writing RESUME and acking the
>> > interrupt in which MMU-500 can process another faulting transaction and
>> > reassert the IRQ without Linux realising, which then gets lost and things go
>> > out of whack.

The problem in my case ended up being that one of the IRQ lines for the
MMU wasn't actually wired up - so the MMU driver never knew there was an
IRQ to handle and so never un-stalled the transactions.
I think it was the context bank's line, so global faults worked fine but
not context faults.

Of course, there may also be a race on RESUME.

>>
>> Do we not detect this with the MULTI bit in the FSR?
>>
>> > >Is there really some assumption that all devices that send transactions
>> > >through the SMMU _must_ be able to handle stall mode?  I am trying to
>> > >find out from our hw designers what is going on at the signal level for
>> > >the device in question, but it seems to me that 'terminate' mode exists
>> > >for a reason and I wonder what your thoughts are about providing a
>> > >configuration option to allow configuration of terminate mode if a particular
>> > >SoC requires it.
>> >
>> > Personally, I'd quite happily leave it turned off (MMU-400/401 don't support
>> > stalling anyway), but I recall Will having a fairly reasonable-sounding
>> > argument in favour, which I now can't remember the details of. Hopefully he
>> > might remind us, unless his conference is too enthralling.
>>
>> Given that we don't do anything particularly useful in the context fault
>> handler, I also wouldn't object to turning this off (and removing the
>> retry/reporting machinery). However, I'd want a better description of
>> *why* it's causing problems first, so that we can justify the decision
>> in case anybody is using this out of tree.

Is map-on-fault a valid enough use-case?
Drivers can register their own fault handlers, so even if arm-smmu isn't
doing anything interesting, I think the master's driver might.
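
To make the map-on-fault case concrete, here is a minimal user-space sketch,
not kernel code. It assumes the fault path asks a registered handler first,
retries the stalled transaction only if the handler returns 0, and terminates
it otherwise. All names (toy_domain, toy_context_fault, toy_map_on_fault) are
hypothetical; in the kernel, a master's driver would register its handler
with iommu_set_fault_handler().

```c
#include <stddef.h>
#include <stdint.h>

/* What the fault path writes back once a stalled transaction is handled. */
enum resume_action { RESUME_RETRY, RESUME_TERMINATE };

/* Hypothetical handler type: return 0 if the fault was fixed up. */
typedef int (*fault_handler_t)(uint64_t iova, void *token);

/* Hypothetical stand-in for a domain with an optional fault handler. */
struct toy_domain {
	fault_handler_t handler;
	void *token;
};

/*
 * Model of the context fault path: retry the stalled access only when a
 * handler is registered and reports that it fixed the fault.
 */
enum resume_action toy_context_fault(const struct toy_domain *d, uint64_t iova)
{
	if (d->handler && d->handler(iova, d->token) == 0)
		return RESUME_RETRY;

	return RESUME_TERMINATE;
}

/*
 * Example handler for a master's driver: "maps" the faulting 4K page by
 * recording its base address in the caller-supplied token.
 */
int toy_map_on_fault(uint64_t iova, void *token)
{
	uint64_t *mapped_page = token;

	*mapped_page = iova & ~UINT64_C(0xfff);
	return 0;
}
```

In terminate mode there is nothing left to retry by the time the handler
runs, which is why this use-case depends on stalling.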

>
>I am trying to get more details from HW owners of this device as to
>its behavior in these 2 different SMMU modes.
>

My understanding is that it should be transparent to the hardware. It
just looks like translation is taking a particularly long time (before
ultimately faulting). As long as the MMU IRQ handler is running as it
should, the transactions will eventually fault as normal.

Thanks,
Brian

>Stuart
>

^ permalink raw reply	[flat|nested] 12+ messages in thread
