* [RFC] Describing arbitrary bus mastering relationships in DT
@ 2014-05-01 17:32 ` Dave Martin
  0 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-01 17:32 UTC (permalink / raw)
  To: devicetree-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Hiroshi Doyu,
	Arnd Bergmann, Thierry Reding, Jason Gunthorpe, Will Deacon,
	Mark Rutland, Marc Zyngier

(Note, this is a long mail -- people in a hurry may want to skip to
"Outline binding" to get a feel for what is being proposed, before
returning to the background material.)

As highlighted in some previous discussions[1], it is becoming possible
to build ARM-based SoCs that seem to be impossible to describe using the
DT bindings currently specified by ePAPR.  This is driven by increasing
complexity of interconnects, the appearance of IOMMUs, MSI-capable
interrupt controllers and multiple bus masters.

This issue is not fundamental to ARM and could apply to other SoC
families with a similar bus architecture, but most of the current
discussion in this area has been about how to address these
requirements for ARM SoCs.

This RFC is an outline for some core bindings to solve part of the
problem of describing such systems, particularly how to describe master/
slave relationships not currently representable in DT.  It is premature
to make a concrete proposal yet; rather, I'm presenting this as a
starting point for discussion.

The intent is not to rewrite existing bindings, but to define a common
DT approach for describing otherwise problematic features of future
systems.  Actual Linux support for this could be implemented as needed.


** Preamble **

ePAPR assumes that all bus mastering is from a node to one of its
child nodes, or from a node to its parent node.

The actual bus mastering relationships in SoCs using a unidirectional
bus architecture such as AMBA usually do not follow this model.
However, historically interconnects have been so simple in their
behaviour that the discrepancies are transparent to software: thus
it has been possible to get away with not describing the true hardware
relationships.

There is a risk that every exception to the tree structure will
be solved with a different ad-hoc, binding-specific solution if
no uniform approach is adopted.

This RFC sketches bindings for an additional way of specifying
a master-slave link, for cases where there is no possible arrangement
of nodes that maps all master/slave relationships consistently
onto the DT.

This aims to allow for correct description of the topological
relationships of all bus masters, as well as bridge-like components
such as IOMMUs and other bus adaptors, any of which may be shared
in many-to-many configurations in an ARM SoC.


[1] see for example "[PATCH v12 11/31] documentation: iommu: add binding
document of Exynos System MMU"
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/251231.html


** Outline binding **

generic device node property: "slaves"

	optional

	type : cell array consisting of one or more phandles

	Implies that the device represented by the containing node
	can issue transactions to the referenced node.

	The referenced node is any bus or device node, and is
	interpreted in the usual way, including the treatment
	of ranges, #address-cells and #size-cells.  If the
	referenced node has a non-empty ranges property, the
	referencing node's #address-cells must be the same as
	that of the referenced node's device tree parent.

generic device node property: "slave-names"

	prohibited if there is no "slaves" property; otherwise
	optional.  Recommended if the "slaves" property has
	two or more entries.

	type : string list with the same number of entries as
		the number of cells in the value of the
		"slaves" property.

	Assigns symbolic names to the entries in the "slaves"
	property, permitting slave connections with different
	roles to be disambiguated.  See
	Documentation/devicetree/bindings/resource-names.txt

generic device node: "slave"

	optional

	Implies that the device represented by the containing
	node can issue transactions to the "slave" node.  "slave"
	would always have these semantics; whether other child
	nodes have a similar meaning is binding-specific.

	property : "name"

		optional

		Assigns a symbolic name to this slave with
		respect to the master.

If neither "slaves" nor any "slave" node is present, the topological
relationships are those defined by ePAPR: the device may or may not be
a master, and if it is a master then it masters onto the parent node,
optionally propagating through the parent to the parent's parent
via mappings described by dma-ranges.
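
For illustration, here is a minimal sketch of how these properties and
nodes might be combined.  All node names, unit addresses and phandle
targets are invented for the example and are not part of the proposal:

	iommu: iommu@12e20000 {
		/* bridge-like device; binding-specific contents omitted */
		...
	};

	soc_bus: axi {
		ranges;
		...
	};

	dma: dma-controller@12680000 {
		...

		/* this device can master either through the IOMMU or
		 * directly onto the SoC interconnect */
		slaves = <&iommu>, <&soc_bus>;
		slave-names = "via-iommu", "direct";
	};

	dc: display-controller@14400000 {
		...

		/* the "slave" subnode form; the contents of the subnode
		 * are binding-specific */
		slave {
			name = "scanout";
			...
		};
	};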


Rationale:

The intention is that by substituting a master's node for /, and
adding traversal rules for "slaves" properties and "slave" nodes in
addition to the usual devicetree parent/child relationship rules, the
path from any master to any addressable slave can be determined, along
with any mappings associated with it.

This allows the potentially unique address mappings for a particular
master to be determined, as well as allowing per-device and global DMA
masks to be derived from the device tree.

Slave references via this binding would be strictly unidirectional,
so any dma-ranges property on the slave end is ignored and the
master is not considered to be addressable by the slave (unless
there is another, separate path in the DT from the slave back to
the master).

I consider this reasonable because the bidirectional bus case is
already well described by ePAPR; so, the slaves convention should
only be used for unidirectional links that break the ePAPR mould.


Questions:

1) Should the names "slaves" and "slave" be globally generic?

   Pro: Making them generic permits some processing to be done on the DT
   without knowing the individual bindings for every node, such as
   figuring out the global DMA mask.  It should also encourage adoption
   of the bindings as a common approach.

   Con: Namespace pollution

   Otherwise, there could be a special string in the node's compatible
   list (strictly not "simple-bus") to indicate that these properties
   should be interpreted.

   The alternative is for the way of identifying a node's slaves to be
   binding-specific.  This makes some generic operations on the DT
   impossible without knowing all the bindings, such as analysing
   reachability or determining the effective DMA mask.  This analysis
   can be performed using generic bindings alone today, for systems
   describable by ePAPR.  Breaking this concept feels like a backward
   step.

2) The generic "slave" node(s) are for convenience and readability.
   They could be eliminated by using child nodes with
   binding-specific names and referencing them in "slaves".  This is a
   bit more awkward, but has the same expressive power.

   Should the generic "slave" nodes go away?

3) Should "slave" or "slaves" be traversable for bridge- or bus-like
   nodes?

   Saying "no" to this makes it impossible for the reachability graph of
   the DT to contain cycles.  This is a clear benefit for any software
   attempting to parse the DT in a robust way.  Only the first link,
   from the initiating master to the first bridge, would be permitted
   to be a "slaves" link.

   Ideally, we would want an IOMMU's bridge-like role to be represented
   by some deep node in the DT: it can't usually be on the global path
   from / since CPUs typically don't master through the IOMMU.

   Parsers could be made robust while still permitting this, by
   truncating the search if the initial master node is reached.
   Ill-formed DTs could contain cycles that can't be resolved in
   this way, e.g., A -> B -> B.  For now it might be reasonable to
   check for this in dtc.
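
   To sketch the intended traversal (all names below hypothetical): in a
   well-formed DT, the chain from a master through an IOMMU back to the
   root might use a "slaves" link only for the first hop, with the rest
   following ordinary ePAPR parent links:

	/ {
		axi {
			ranges;

			iommu: iommu@700000 {
				/* downstream side masters onto the parent
				 * bus via the normal ePAPR parent link */
				...
			};

			gpu@800000 {
				/* upstream side: masters through the
				 * IOMMU rather than onto the parent */
				slaves = <&iommu>;
			};
		};
	};

   A search starting at the GPU terminates at /; a DT in which a node
   reached itself again via "slaves" links would instead contain an
   unresolvable cycle of the kind described above.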

4) Is the location of the memory node going to cause problems?

   Even in complex systems, it is usually topologically correct (or
   at least correct enough) to put memory in /.  ePAPR does not actually
   say that the memory must be at / or that there must be only one
   memory node, but some software may be relying on this even if it's
   not correct with respect to the topology.

   Linux's early FDT parsing would probably be affected by moving this
   node, since it appears at least to ignore ranges properties.  It
   is highly likely that firmware and bootloaders that manipulate
   /memory would get confused if it was moved out of /.


* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-01 17:32 ` Dave Martin
@ 2014-05-02 11:05     ` Thierry Reding
  -1 siblings, 0 replies; 58+ messages in thread
From: Thierry Reding @ 2014-05-02 11:05 UTC (permalink / raw)
  To: Dave Martin
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Hiroshi Doyu,
	Arnd Bergmann, Jason Gunthorpe, Will Deacon, Mark Rutland,
	Marc Zyngier

On Thu, May 01, 2014 at 06:32:48PM +0100, Dave Martin wrote:
[...]
> ** Outline binding **
> 
> generic device node property: "slaves"
> 
> 	optional
> 
> 	type : cell array consisting of one or more phandles
> 
> 	Implies that the device represented by the containing node
> 	can issue transactions to the referenced node.
> 
> 	The referenced node is any bus or device node, and is
> 	interpreted in the usual way, including the treatment
> 	of ranges, #address-cells and #size-cells.  If the
> 	referenced node has a non-empty ranges property, the
> 	referencing node's #address-cells must be the same as
> 	that of the referenced node's device tree parent.
> 
> generic device node property: "slave-names"
> 
> 	prohibited if there is no "slaves" property; otherwise
> 	optional.  Recommended if the "slaves" property has
> 	two or more entries.
> 
> 	type : string list with the same number of entries as
> 		the number of cells in the value of the
> 		"slaves" property.
> 
> 	Assigns symbolic names to the entries in the "slaves"
> 	property, permitting slave connections with different
> 	roles to be disambiguated.  See
> 	Documentation/devicetree/bindings/resource-names.txt
> 
> generic device node: "slave"
> 
> 	optional
> 
> 	Implies that the device represented by the containing
> 	node can issue transactions to the "slave" node.  "slave"
> 	would always have these semantics; whether other child
> 	nodes have a similar meaning is binding-specific.
> 
> 	property : "name"
> 
> 		optional
> 
> 		Assigns a symbolic name to this slave with
> 		respect to the master.
> 
> If neither "slaves" nor any "slave" node is present, the topological
> relationships are those defined by ePAPR: the device may or may not be
> a master, and if it is a master then it masters onto the parent node,
> optionally propagating through the parent to the parent's parent
> via mappings described by dma-ranges.

Let me see if I understood the above proposal by trying to translate it
into a simple example for a specific use-case. On Tegra for example we
have various units that can either access system memory directly or use
the IOMMU to translate accesses for them. One such unit would be the
display controller that scans out a framebuffer from memory.

	dc@0,54200000 {
		...

		slave {
			/*
			 * 2 is the memory controller client ID of the
			 * display controller.
			 */
			iommu = <&iommu 2>;

			...
		};
	};

Admittedly this is probably a lot more trivial than what you're looking
for. There's no need for virtualization here; the IOMMU is simply used
to isolate memory accesses by devices. Still, it's a use case that needs
to be supported and one that at least Tegra and Exynos have an immediate
need for.

So the above isn't much different from the proposed bindings, except
that the iommu property is now nested within a slave node. I guess this
gives us a lot more flexibility to extend the description of a slave as
needed to represent more complex scenarios.

One thing that confuses me slightly about your proposal is that these
subnodes describe the master interfaces of the containing nodes. Would
it not be more appropriate to name the nodes "master" instead?

Also, are slaves/slave-names and slave subnodes mutually exclusive? It
sounds like slaves/slave-names would be a specialization of the slave
subnode concept for the trivial cases. Would the following be an
equivalent description of the above example?

	dc@0,54200000 {
		...

		slaves = <&iommu 2>;
	};

I don't see how it could be exactly equivalent since it misses context
regarding the type of slave that's being interacted with. Perhaps that
could be solved by making that knowledge driver-specific (i.e. the
driver for the Tegra display controller will know that it can only be
the master on an IOMMU and therefore derive the slave type). Or the
slave's type could be derived from the slave-names property.
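
For example (purely hypothetical), the type could be encoded in the
name:

	dc@0,54200000 {
		...

		slaves = <&iommu 2>;
		slave-names = "iommu";
	};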

While this proposal lacks specifics for IOMMU devices, I think it could
work well to describe them in a generic way. Especially when slave nodes
are used, arbitrary additional data can be added to describe more complex
master interfaces (DMA windows, ...).

I still see an issue with supporting this generically with the currently
recommended way to use IOMMUs (via the DMA mapping API). There's not
enough granularity in the API to support this. It's probably going to
work fine for Tegra, but I think for more complex cases drivers will
probably need to use the IOMMU API directly.

But that's an implementation detail and can probably be solved later.

Thierry


* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 11:05     ` Thierry Reding
@ 2014-05-02 12:32       ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-02 12:32 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Dave Martin, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Hiroshi Doyu,
	Jason Gunthorpe, Will Deacon, Mark Rutland, Marc Zyngier

On Friday 02 May 2014 13:05:58 Thierry Reding wrote:
> 
> Let me see if I understood the above proposal by trying to translate it
> into a simple example for a specific use-case. On Tegra for example we
> have various units that can either access system memory directly or use
> the IOMMU to translate accesses for them. One such unit would be the
> display controller that scans out a framebuffer from memory.

Can you explain how the decision is made whether the IOMMU gets used
or not? In all cases I've seen so far, I think we can hardwire this
in DT, and only expose one or the other. Are both ways used
concurrently?

>         dc@0,54200000 {
>                 ...
> 
>                 slave {
>                         /*
>                          * 2 is the memory controller client ID of the
>                          * display controller.
>                          */
>                         iommu = <&iommu 2>;
> 
>                         ...
>                 };
>         };
> 
> Admittedly this is probably a lot more trivial than what you're looking
> for. There's no need for virtualization here, the IOMMU is simply used
> to isolate memory accesses by devices. Still it's a use-case that needs
> to be supported and one that at least Tegra and Exynos have an immediate
> need for.
> 
> So the above isn't much different from the proposed bindings, except
> that the iommu property is now nested within a slave node. I guess this
> gives us a lot more flexibility to extend the description of a slave as
> needed to represent more complex scenarios.

This looks rather complicated to parse automatically in the generic
DT code when we try to decide which dma_map_ops to use. We'd have
to look for 'slave' nodes in each device we instantiate and then see
if they use an iommu or not.
 
> Also, are slaves/slave-names and slave subnodes mutually exclusive? It
> sounds like slaves/slave-names would be a specialization of the slave
> subnode concept for the trivial cases. Would the following be an
> equivalent description of the above example?
> 
>         dc@0,54200000 {
>                 ...
> 
>                 slaves = <&iommu 2>;
>         };
> 
> I don't see how it could be exactly equivalent since it misses context
> regarding the type of slave that's being interacted with. Perhaps that
> could be solved by making that knowledge driver-specific (i.e. the
> driver for the Tegra display controller will know that it can only be
> the master on an IOMMU and therefore derive the slave type). Or the
> slave's type could be derived from the slave-names property.

I'd rather have a device-specific property that tells the driver
about things the iommu driver doesn't need to know but the master
does. In most cases, we should be fine without a name attached to the
slave.
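
For illustration (the property name here is just an example), that
would reduce the earlier fragment to something like:

	dc@0,54200000 {
		...

		/* device-specific property; the display controller's
		 * binding defines what the cells mean */
		iommu = <&iommu 2>;
	};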

	Arnd

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 12:32       ` Arnd Bergmann
@ 2014-05-02 13:23         ` Thierry Reding
  -1 siblings, 0 replies; 58+ messages in thread
From: Thierry Reding @ 2014-05-02 13:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dave Martin, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Hiroshi Doyu,
	Jason Gunthorpe, Will Deacon, Mark Rutland, Marc Zyngier

On Fri, May 02, 2014 at 02:32:08PM +0200, Arnd Bergmann wrote:
> On Friday 02 May 2014 13:05:58 Thierry Reding wrote:
> > 
> > Let me see if I understood the above proposal by trying to translate it
> > into a simple example for a specific use-case. On Tegra for example we
> > have various units that can either access system memory directly or use
> > the IOMMU to translate accesses for them. One such unit would be the
> > display controller that scans out a framebuffer from memory.
> 
> Can you explain how the decision is made whether the IOMMU gets used
> or not? In all cases I've seen so far, I think we can hardwire this
> in DT, and only expose one or the other. Are both ways used
> concurrently?

It should be possible to hardcode this in DT for Tegra. As I understand
it, both interfaces can't be used at the same time. Once translation has
been enabled for one client, all accesses generated by that client will
be translated.

Hiroshi, please correct me if I'm wrong.

> >         dc@0,54200000 {
> >                 ...
> > 
> >                 slave {
> >                         /*
> >                          * 2 is the memory controller client ID of the
> >                          * display controller.
> >                          */
> >                         iommu = <&iommu 2>;
> > 
> >                         ...
> >                 };
> >         };
> > 
> > Admittedly this is probably a lot more trivial than what you're looking
> > for. There's no need for virtualization here, the IOMMU is simply used
> > to isolate memory accesses by devices. Still it's a use-case that needs
> > to be supported and one that at least Tegra and Exynos have an immediate
> > need for.
> > 
> > So the above isn't much different from the proposed bindings, except
> > that the iommu property is now nested within a slave node. I guess this
> > gives us a lot more flexibility to extend the description of a slave as
> > needed to represent more complex scenarios.
> 
> This looks rather complicated to parse automatically in the generic
> DT code when we try to decide which dma_map_ops to use. We'd have
> to look for 'slave' nodes in each device we instantiate and then see
> if they use an iommu or not.

But we need to do that now anyway in order to find an iommu property,
don't we? Adding one extra level here shouldn't be all that bad if it
gives us more flexibility or uniformity with more complicated setups.

To some degree this also depends on how we want to handle IOMMUs. If
they should remain transparently handled via dma_map_ops, then it makes
sense to set this up at device instantiation time. But how can we handle
this in situations where one device needs to master on two IOMMUs at the
same time? Or if the device needs physically contiguous memory for
purposes other than device I/O? Using dma_map_ops we can't control which
allocations get mapped via the IOMMU and which don't.

> > Also, are slaves/slave-names and slave subnodes mutually exclusive? It
> > sounds like slaves/slave-names would be a specialization of the slave
> > subnode concept for the trivial cases. Would the following be an
> > equivalent description of the above example?
> > 
> >         dc@0,54200000 {
> >                 ...
> > 
> >                 slaves = <&iommu 2>;
> >         };
> > 
> > I don't see how it could be exactly equivalent since it misses context
> > regarding the type of slave that's being interacted with. Perhaps that
> > could be solved by making that knowledge driver-specific (i.e. the
> > driver for the Tegra display controller will know that it can only be
> > the master on an IOMMU and therefore derive the slave type). Or the
> > slave's type could be derived from the slave-names property.
> 
> I'd rather have a device-specific property that tells the driver
> about things the iommu driver doesn't need to know but the master
> does. In most cases, we should be fine without a name attached to the
> slave.

For the easy cases where we either have no IOMMU or a single IOMMU per
device, that should work fine. This only becomes problematic when there
is more than one, since you need to distinguish between possibly more
than one type of slave.

As I understand it, Dave's proposal is for generic bus masters, which
may be an IOMMU but could also be something completely different. So in
those cases we need extra meta information so that we can look up the
proper type of object.

FWIW, I do prefer a device-specific property as well. Though I also
think we should agree on common names (like "iommu") as part of the
binding definition. And then, even for devices that can use multiple
IOMMUs they can be differentiated using iommu-names for example. If
another type of master interface is required, then a different set of
property names can be used.
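
Something along these lines, perhaps (device and names invented):

	multimedia@0,54300000 {
		...

		iommu = <&smmu 5>, <&gpu_mmu 0>;
		iommu-names = "system", "gpu";
	};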

Thierry


* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 13:23         ` Thierry Reding
@ 2014-05-02 15:19           ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-02 15:19 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Dave Martin, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Hiroshi Doyu,
	Jason Gunthorpe, Will Deacon, Mark Rutland, Marc Zyngier

On Friday 02 May 2014 15:23:29 Thierry Reding wrote:
> On Fri, May 02, 2014 at 02:32:08PM +0200, Arnd Bergmann wrote:
> > On Friday 02 May 2014 13:05:58 Thierry Reding wrote:
> > > 
> > > Let me see if I understood the above proposal by trying to translate it
> > > into a simple example for a specific use-case. On Tegra for example we
> > > have various units that can either access system memory directly or use
> > > the IOMMU to translate accesses for them. One such unit would be the
> > > display controller that scans out a framebuffer from memory.
> > 
> > Can you explain how the decision is made whether the IOMMU gets used
> > or not? In all cases I've seen so far, I think we can hardwire this
> > in DT, and only expose one or the other. Are both ways used
> > concurrently?
> 
> It should be possible to hardcode this in DT for Tegra. As I understand
> it, both interfaces can't be used at the same time. Once translation has
> been enabled for one client, all accesses generated by that client will
> be translated.

Ok.

> > >         dc@0,54200000 {
> > >                 ...
> > > 
> > >                 slave {
> > >                         /*
> > >                          * 2 is the memory controller client ID of the
> > >                          * display controller.
> > >                          */
> > >                         iommu = <&iommu 2>;
> > > 
> > >                         ...
> > >                 };
> > >         };
> > > 
> > > Admittedly this is probably a lot more trivial than what you're looking
> > > for. There's no need for virtualization here, the IOMMU is simply used
> > > to isolate memory accesses by devices. Still it's a use-case that needs
> > > to be supported and one that at least Tegra and Exynos have an immediate
> > > need for.
> > > 
> > > So the above isn't much different from the proposed bindings, except
> > > that the iommu property is now nested within a slave node. I guess this
> > > gives us a lot more flexibility to extend the description of a slave as
> > > needed to represent more complex scenarios.
> > 
> > This looks rather complicated to parse automatically in the generic
> > DT code when we try to decide which dma_map_ops to use. We'd have
> > to look for 'slave' nodes in each device we instantiate and then see
> > if they use an iommu or not.
> 
> But we need to do that now anyway in order to find an iommu property,
> don't we? Adding one extra level here shouldn't be all that bad if it
> gives us more flexibility or uniformity with more complicated setups.

The common code just needs to know whether an IOMMU is in use or
not, and what the mask/offset are.

> To some degree this also depends on how we want to handle IOMMUs. If
> they should remain transparently handled via dma_map_ops, then it makes
> sense to set this up at device instantiation time. But how can we handle
> this in situations where one device needs to master on two IOMMUs at the
> same time? Or if the device needs physically contiguous memory for
> purposes other than device I/O. Using dma_map_ops we can't control which
> allocations get mapped via the IOMMU and which don't.

I still hope we can handle this in common code by selecting the right
dma_map_ops when the devices are instantiated, at least for 99% of the
cases. I'm not convinced we really need to handle the 'multiple IOMMUs
on one device' case in a generic way. If there are no common use cases
for that, we can probably get away with having multiple device nodes
and an ugly driver for the exception, instead of making life complicated
for everybody.

> > > Also, are slaves/slave-names and slave subnodes mutually exclusive? It
> > > sounds like slaves/slave-names would be a specialization of the slave
> > > subnode concept for the trivial cases. Would the following be an
> > > equivalent description of the above example?
> > > 
> > >         dc@0,54200000 {
> > >                 ...
> > > 
> > >                 slaves = <&iommu 2>;
> > >         };
> > > 
> > > I don't see how it could be exactly equivalent since it misses context
> > > regarding the type of slave that's being interacted with. Perhaps that
> > > could be solved by making that knowledge driver-specific (i.e. the
> > > driver for the Tegra display controller will know that it can only be
> > > the master on an IOMMU and therefore derive the slave type). Or the
> > > slave's type could be derived from the slave-names property.
> > 
> > I'd rather have a device-specific property that tells the driver
> > about things the iommu driver doesn't need to know but the master
> > does. In most cases, we should be fine without a name attached to the
> > slave.
> 
> For the easy cases where we either have no IOMMU or a single IOMMU per
> device, that should work fine. This only becomes problematic when there
> are more than one, since you need to distinguish between possibly more
> than one type.
> 
> As I understand it, Dave's proposal is for generic bus masters, which
> may be an IOMMU but could also be something completely different. So in
> those cases we need extra meta information so that we can look up the
> proper type of object.

Doing something complicated for the IOMMUs themselves seems fine, also
for other nonstandard devices that are just weird. I just want to
handle the simple case automatically.

	Arnd

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-01 17:32 ` Dave Martin
@ 2014-05-02 16:14     ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-02 16:14 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Dave Martin, devicetree-u79uwXL29TY76Z2rM5mHXA, Mark Rutland,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Will Deacon,
	Jason Gunthorpe, Marc Zyngier, Thierry Reding, Hiroshi Doyu

On Thursday 01 May 2014 18:32:48 Dave Martin wrote:
> (Note, this is a long mail -- people in a hurry may want to skip to
> "Outline binding" to get a feel for what is bring proposed, before
> returning to the background wording.)
> 
> As highlighted in some previous discussions[1], it is becoming possible
> to build ARM-based SoCs that seem to be impossible to describe using the
> DT bindings currently specified by ePAPR.  This is driven by increasing
> complexity of interconnects, the appearance of IOMMUs, MSI-capable
> interrupt controllers and multiple bus masters.
> 
> This issue is not fundamental to ARM and could apply to other SoC
> families with a similar bus architecture, but most of the current
> discussion in this area has been about how to address these
> requirements for ARM SoCs.
> 
> This RFC is an outline for some core bindings to solve part of the
> problem of describing such systems, particularly how to describe master/
> slave relationships not currently representable in DT.  It is premature
> to make a concrete proposal yet: rather I'm presenting this as a starting
> point for discussion initially.
> 
> The intent is not to rewrite existing bindings, but to define a common
> DT approach for describing otherwise problematic features of future
> systems.  Actual Linux support for this could be implemented as needed.

Thanks a lot for getting this rolling!


> ** Outline binding **
> 
> generic device node property: "slaves"
> 
> 	optional
> 
> 	type : cell array consisting of one or more phandles
> 
> 	Implies that the device represented by the containing node
> 	can issue transactions to the referenced node.
> 
> 	The referenced node is any bus or device node, and is
> 	interpreted in the usual way, including the treatment
> 	of ranges, #address-cells and #size-cells.  If the
> 	referenced node has a non-empty ranges property, the
> 	referencing node's #address-cells must be the same as
> 	that of the referenced node's device tree parent.

I guess you mean "dma-ranges" here, not "ranges", right?
I don't see how "ranges" is even relevant for this.

Don't you need arguments to the phandle? It seems that in most
cases, you need at least one of a dma-ranges like translation
or a master ID. What you need would be specific to the slave.

It may be best to make the ranges explicit here and then also
allow additional fields depending on e.g. a #dma-slave-cells
property in the slave.

For instance, a 32-bit master on a 64-bit bus that has master-id
23 would look like

	otherbus: axi@somewhere {
		#address-cells = <2>;
		#size-cells = <2>;
	};

	somemaster@somewhere {
		#address-cells = <1>;
		#size-cells = <1>;
		slaves = <&otherbus  // phandle
				0     // local address
				0 0   // remote address
				0x1 0 // size
				23>;  // master id
	};

> Questions:
> 
> 1) Should the names "slaves" and "slave" be globally generic?
> 
>    Pro: Making them generic permits some processing to be done on the DT
>    without knowing the individual bindings for every node, such as
>    figuring out the global DMA mask.  It should also encourage adoption
>    of the bindings as a common approach.
> 
>    Con: Namespace pollution
> 
>    Otherwise, there could be a special string in the node's compatible
>    list (strictly not "simple-bus") to indicate that these properties
>    should be interpreted.
> 
>    The alternative is for the way of identifying a node's slaves to be
>    binding-specific.  This makes some generic operations on the DT
>    impossible without knowing all the bindings, such as analysing
>    reachability or determining the effective DMA mask.  This analysis
>    can be performed using generic bindings alone today, for systems
>    describable by ePAPR.  Breaking this concept feels like a backward
>    step.

How about being slightly more specific, using "dma-slaves" and
"dma-slave-names" etc?

> 2) The generic "slave" node(s) are for convenience and readability.
>    They could be eliminated by using child nodes with
>    binding-specific names and referencing them in "slaves".  This is a
>    bit more awkward, but has the same expressive power.
> 
>    Should the generic "slave" nodes go away?

I would prefer not having to have subnodes for the simple case
where you just need to reference one slave iommu from a master
device.

It could be a recommendation for devices that have multiple slaves,
but I still haven't seen an example where this is actually needed.

> 3) Should "slave" or "slaves" be traversable for bridge- or bus-like
>    nodes?
> 
>    Saying "no" to this makes it impossible for the reachability graph of
>    the DT to contain cycles.  This is a clear benefit for any software
>    attempting to parse the DT in a robust way.  Only the first link,
>    from the initiating master to the first bridge, would be permitted
>    to be a "slaves" link.
> 
>    Ideally, we would want an IOMMU's bridge-like role to be represented
>    by some deep node in the DT: it can't usually be on the global path
>    from / since CPUs typically don't master through the IOMMU.
> 
>    Parsers could be made robust while still permitting this, by
>    truncating the search if the initial master node is reached.
>    Ill-formed DTs could contain cycles that can't be resolved in
>    this way, e.g., A -> B -> B.  For now it might be reasonable to
>    check for this in dtc.

I wouldn't be worried about cycles. We can just declare them forbidden
in the binding. Anything can break if you supply a broken DT, this
is the least of the problems.

	Arnd

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 11:05     ` Thierry Reding
@ 2014-05-02 16:19       ` Dave Martin
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-02 16:19 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Arnd Bergmann, Stephen Warren, Grant Grundler,
	Will Deacon, Jason Gunthorpe, Marc Zyngier,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

On Fri, May 02, 2014 at 01:05:58PM +0200, Thierry Reding wrote:
> On Thu, May 01, 2014 at 06:32:48PM +0100, Dave Martin wrote:
> [...]
> > ** Outline binding **
> > 
> > generic device node property: "slaves"
> > 
> > 	optional
> > 
> > 	type : cell array consisting of one or more phandles
> > 
> > 	Implies that the device represented by the containing node
> > 	can issue transactions to the referenced node.
> > 
> > 	The referenced node is any bus or device node, and is
> > 	interpreted in the usual way, including the treatment
> > 	of ranges, #address-cells and #size-cells.  If the
> > 	referenced node has a non-empty ranges property, the
> > 	referencing node's #address-cells must be the same as
> > 	that of the referenced node's device tree parent.
> > 
> > generic device node property: "slave-names"
> > 
> > 	prohibited if there is no "slaves" property; otherwise
> > 	optional.  Recommended if the "slaves" property has
> > 	two or more entries.
> > 
> > 	type : string list with the same number of entries as
> > 		the number of cells in the value of the
> > 		"slaves" property.
> > 
> > 	Assigns symbolic names to the entries in the "slaves"
> > 	property, permitting slave connections with different
> > 	roles to be disambiguated.  See
> > 	Documentation/devicetree/bindings/resource-names.txt
> > 
> > generic device node: "slave"
> > 
> > 	optional
> > 
> > 	Implies that the device represented by the containing
> > 	node can issue transactions to the "slave" node.  "slave"
> > 	would always have these semantics; whether other child
> > 	nodes have a similar meaning is binding-specific.
> > 
> > 	property : "name"
> > 
> > 		optional
> > 
> > 		Assigns a symbolic name to this slave with
> > 		respect to the master.
> > 
> > If neither "slaves" nor any "slave" node is present, the topological
> > relationships are those defined by ePAPR: the device may or may not be
> > a master, and if it is a master then it masters onto the parent node,
> > optionally propagating through the parent to the parent's parent
> > via mappings described by dma-ranges.
> 
> Let me see if I understood the above proposal by trying to translate it
> into a simple example for a specific use-case. On Tegra for example we
> have various units that can either access system memory directly or use
> the IOMMU to translate accesses for them. One such unit would be the
> display controller that scans out a framebuffer from memory.
> 
> 	dc@0,54200000 {
> 		...
> 
> 		slave {
> 			/*
> 			 * 2 is the memory controller client ID of the
> 			 * display controller.
> 			 */
> 			iommu = <&iommu 2>;
> 
> 			...
> 		};
> 	};
> 
> Admittedly this is probably a lot more trivial than what you're looking
> for. There's no need for virtualization here, the IOMMU is simply used
> to isolate memory accesses by devices. Still it's a use-case that needs
> to be supported and one that at least Tegra and Exynos have an immediate
> need for.
> 
> So the above isn't much different from the proposed bindings, except
> that the iommu property is now nested within a slave node. I guess this
> gives us a lot more flexibility to extend the description of a slave as
> needed to represent more complex scenarios.
> 
> One thing that confuses me slightly about your proposal is that these
> subnodes describe the master interfaces of the containing nodes. Would
> it not be more appropriate to name the nodes "master" instead?
> 
> Also, are slaves/slave-names and slave subnodes mutually exclusive? It
> sounds like slaves/slave-names would be a specialization of the slave
> subnode concept for the trivial cases. Would the following be an
> equivalent description of the above example?
> 
> 	dc@0,54200000 {
> 		...
> 
> 		slaves = <&iommu 2>;
> 	};

A slave node has the same meaning as a node referenced by a phandle
in slaves.  The only difference is where in the DT the node appears:
if the slave is shared by two masters then it probably doesn't make
sense to bury the node inside one of the masters' nodes.

I would picture the DT for your example system something like this:

	root: / {
		#address-cells = <2>;
		#size-cells = <2>;
		memory { ... };

		peripherals {
			ranges = ...;

			iommu {
				reg = < ... >;
				interrupts = < ... >;
				slaves = <&root>;
			};

			dc {
				slaves = <&iommu>;
				...
			};

			dmac@1 {
				slaves = <&iommu>;
				...
			};

			dmac@2 {
				slave {
					#address-cells = <2>;
					#size-cells = <2>;
					ranges = < 0 0 0 0 1 0 >;
					slaves = <&root>;
				};
			};
		};
	};

Here, for reasons best known to the hardware designers, dmac@2 is
not connected via the IOMMU, but is only 32-bit capable, hence
we have a ranges property to describe how the root address space
is seen by dmac@2.  (slave/#size-cells is 2 purely because we cannot
describe a size of 2^32 with #size-cells=1).

I could equally have done

	dmac@2 {
		slaves = <&foo>;

		foo {
			#address-cells = <2>;
			#size-cells = <2>;
			ranges = < 0 0 0 0 1 0 >;
			slaves = <&root>;
		};
	};

... hence the doubt about whether "slave" nodes are really useful
in addition to "slaves" properties.

> I don't see how it could be exactly equivalent since it misses context
> regarding the type of slave that's being interacted with. Perhaps that

The compatible string on the referenced node tells you what it is.

The driver associated with the "iommu" node would provide functionality
to report the downstream mapping capabilities in this case, though it's
possible we could do parts of it in a generic way.
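
For example, taking the iommu and dc nodes from the sketch above and
adding a made-up compatible string: nothing in the link itself says
"IOMMU", the type falls out of the referenced node:

	iommu: iommu {
		compatible = "vendor,example-iommu";
		reg = < ... >;
		slaves = <&root>;
	};

	dc {
		slaves = <&iommu>;
	};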

> could be solved by making that knowledge driver-specific (i.e. the
> driver for the Tegra display controller will know that it can only be
> the master on an IOMMU and therefore derive the slave type). Or the
> slave's type could be derived from the slave-names property.
> 
> While this proposal lacks specifics for IOMMU devices, I think it could

In case it wasn't clear, this proposal is only about how to describe
linkage.  For IOMMUs and some other devices we also need to describe
how some other bus signals get mapped etc.

I have some ideas on these other areas, but I didn't want to dump it
all at once.

> work well to describe them in a generic way. Especially when slave nodes
> are used, arbitrarily more data can be added to describe more complex
> master interfaces (DMA windows, ...).
> 
> I still see an issue with supporting this generically with the currently
> recommended way to use IOMMUs (via the DMA mapping API). There's not
> enough granularity in the API to support this. It's probably going to
> work fine for Tegra, but I think for more complex cases drivers will
> probably need to use the IOMMU API directly.

Initially I would expect that Linux would only support rather limited
variations on this theme.

In particular, we would determine the path from each master device to
/memory, recording the mappings and components that we go through
along the way.

The simplest cases are:

  a) There is nothing on the path except static address remappings.

  b) There is nothing on the path except a single IOMMU.

> But that's an implementation detail and can probably be solved later.

There is no need for Linux to cope with anything that doesn't
match the simple cases until/unless such a system appears.  When and
if that happens, we would have a ready-made way of describing it.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 16:14     ` Arnd Bergmann
@ 2014-05-02 17:31       ` Dave Martin
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-02 17:31 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Shaik Ameer Basha,
	Stephen Warren, Grant Grundler, Will Deacon, Jason Gunthorpe,
	Marc Zyngier, Thierry Reding, Hiroshi Doyu

On Fri, May 02, 2014 at 06:14:58PM +0200, Arnd Bergmann wrote:
> On Thursday 01 May 2014 18:32:48 Dave Martin wrote:
> > (Note, this is a long mail -- people in a hurry may want to skip to
> > "Outline binding" to get a feel for what is bring proposed, before
> > returning to the background wording.)
> > 
> > As highlighted in some previous discussions[1], it is becoming possible
> > to build ARM-based SoCs that seem to be impossible to describe using the
> > DT bindings currently specified by ePAPR.  This is driven by increasing
> > complexity of interconnects, the appearance of IOMMUs, MSI-capable
> > interrupt controllers and multiple bus masters.
> > 
> > This issue is not fundamental to ARM and could apply to other SoC
> > families with a similar bus architecture, but most of the current
> > discussion in this area has been about how to address these
> > requirements for ARM SoCs.
> > 
> > This RFC is an outline for some core bindings to solve part of the
> > problem of describing such systems, particularly how to describe master/
> > slave relationships not currently representable in DT.  It is premature
> > to make a concrete proposal yet: rather I'm presenting this as a starting
> > point for discussion initially.
> > 
> > The intent is not to rewrite existing bindings, but to define a common
> > DT approach for describing otherwise problematic features of future
> > systems.  Actual Linux support for this could be implemented as needed.
> 
> Thanks a lot for getting this rolling!
> 
> 
> > ** Outline binding **
> > 
> > generic device node property: "slaves"
> > 
> > 	optional
> > 
> > 	type : cell array consisting of one or more phandles
> > 
> > 	Implies that the device represented by the containing node
> > 	can issue transactions to the referenced node.
> > 
> > 	The referenced node is any bus or device node, and is
> > 	interpreted in the usual way, including the treatment
> > 	of ranges, #address-cells and #size-cells.  If the
> > 	referenced node has a non-empty ranges property, the
> > 	referencing node's #address-cells must be the same as
> > 	that of the referenced node's device tree parent.
> 
> I guess you mean "dma-ranges" here, not "ranges", right?
> I don't see how "ranges" is even relevant for this.

No, but I didn't state it very clearly.

In this:

	parent {
		child {
			ranges = < ... >;
			dma-ranges = < ... >;
		};
	};

There are two transaction flows being described.  There are transactions
from parent -> child, for which "ranges" describes the mappings, and
there are transactions from child -> parent, for which "dma-ranges"
describes the mappings.

The name "dma-ranges" obfuscates this symmetry, so it took me a while
to figure out what it really means -- maybe I'm still confused, but
I think that's the gist of it.
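
With made-up numbers and one address/size cell throughout, that
symmetry looks like:

	parent {
		#address-cells = <1>;
		#size-cells = <1>;

		child {
			#address-cells = <1>;
			#size-cells = <1>;
			/* parent -> child: child address 0x0 is visible at
			 * parent address 0x40000000 (a 256MiB window) */
			ranges = <0x0 0x40000000 0x10000000>;
			/* child -> parent: a master below "child" issuing
			 * address 0x80000000 reaches parent address 0x0 */
			dma-ranges = <0x80000000 0x0 0x80000000>;
		};
	};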


For the purposes of cross-links, my plan was that we interpret all
those links as "forward" (i.e., parent -> child) links, where the
referencing node is deemed to be the parent, and the referenced node is
deemed to be the child. Just as in the ePAPR case, the associated mapping
is then described by "ranges".

> Don't you need arguments to the phandle? It seems that in most
> cases, you need at least one of a dma-ranges like translation
> or a master ID. What you need would be specific to the slave.

For any 1:N relationship between nodes, you can describe the
_relationship_ by putting properties on the nodes at the "1" end.  This
is precisely how "ranges" and "dma-ranges" work.

The N:M case can be resolved by inserting simple-bus nodes into any
links with non-default mappings: i.e., you split each affected link in
two, with a simple-bus node in the middle describing the mapping:

root: / {
	ranges;
	...

	master@1 {
		slave {
			ranges = < ... >;
			slaves = <&root>;
		};
	};

	master@2 {
		slave {
			slaves = < &root &master2_dma_slave >;
			slave-names = "config-fetch", "dma";

			master2_dma_slave: dma-slave {
				ranges = < ... >;
				slaves = <&root>;
			};
		};
	};

	master@3 {
		slaves = <&root>;
	};
};


Here, there are three master devices, one with two different mastering
roles.

master@2's configuration data fetch mechanism accesses the root bus
node, but with some remapping.  master@2 also does bulk DMA, which
has no remapping.

master@1 masters on / with its own remapping.  master@3 masters on
/ with no remapping.

(This is a silly made-up system: I don't claim I've seen something like
this.)


> 
> It may be best to make the ranges explicit here and then also
> allow additional fields depending on e.g. a #dma-slave-cells
> property in the slave.
> 
> For instance, a 32-bit master on a 64-bit bus that has master-id
> 23 would look like
> 
> 	otherbus: axi@somewhere {
> 		#address-cells = <2>;
> 		#size-cells = <2>;
> 	};
> 
> 	somemaster@somewhere {
> 		#address-cells = <1>;
> 		#size-cells = <1>;
> 		slaves = <&otherbus  // phandle
> 				0     // local address
> 				0 0   // remote address
> 				0x1 0 // size
> 				23>;  // master id
> 	};

I thought about this possibility, but was worried that the "slaves"
property would become awkward to parse.  Except for the "master id"
concept, all these attributes are already well described by ePAPR for
bus nodes, if we can figure out how to piggyback on them -- hence my
alternative approach explained above.

How to describe the "master id" is particularly problematic and may
be a separate discussion.  It can get munged or remapped as it
passes through the interconnect: for example, a PCI device's ID 
accompanying an MSI write may be translated once as it passes from
the PCI RC to an IOMMU, then again before it reaches the GIC.

In the "windowed IOMMU" case, address bits are effectively being
mapped to ID bits as they reach IOMMU.

An IOMMU also does a complete mapping of ID+address -> ID'+address'
(although programmable rather than static and unprobeable, so the
actual mappings for an IOMMU won't be in the DT).

> 
> > Questions:
> > 
> > 1) Should the names "slaves" and "slave" be globally generic?
> > 
> >    Pro: Making them generic permits some processing to be done on the DT
> >    without knowing the individual bindings for every node, such as
> >    figuring out the global DMA mask.  It should also encourage adoption
> >    of the bindings as a common approach.
> > 
> >    Con: Namespace pollution
> > 
> >    Otherwise, there could be a special string in the node's compatible
> >    list (strictly not "simple-bus") to indicate that these properties
> >    should be interpreted.
> > 
> >    The alternative is for the way of identifying a node's slaves to be
> >    binding-specific.  This makes some generic operations on the DT
> >    impossible without knowing all the bindings, such as analysing
> >    reachability or determining the effective DMA mask.  This analysis
> >    can be performed using generic bindings alone today, for systems
> >    describable by ePAPR.  Breaking this concept feels like a backward
> >    step.
> 
> How about being slightly more specific, using "dma-slaves" and
> "dma-slave-names" etc?

I avoided the word "dma" because I found it confusing initially.

In the hardware there is no difference at all between "dma" and
bus mastering by CPUs, at least not in the ARM SoC world.

"DMA" suggests a relatively dumb peripheral doing grunt work on behalf
of a CPU.  Viewing pagetable fetches and polygon rendering or
RenderScript-style computataion offload done by a GPU as "DMA" seems a
bit of a stretch.

That said, DT is intended for use by OSes, so if it is CPU-centric,
that's OK (it already is CPU-centric, anyway).  A name is just a name;
so long as bindings are understandable, the choice of name shouldn't
matter too much.

> > 2) The generic "slave" node(s) are for convenience and readability.
> >    They could be eliminated by using child nodes with
> >    binding-specific names and referencing them in "slaves".  This is a
> >    bit more awkward, but has the same expressive power.
> > 
> >    Should the generic "slave" nodes go away?
> 
> I would prefer not having to have subnodes for the simple case
> where you just need to reference one slave iommu from a master
> device.

My expectation is that subnodes would only be useful in special cases
anyway.

We can remove the special "slave" name, because there's nothing to
stop us referencing other random nested nodes with the "slaves" property.
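
e.g. (any conveniently named child node will do):

	master {
		slaves = <&cfg_port>;

		cfg_port: config-port {
			ranges = < ... >;
			slaves = <&root>;
		};
	};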

> 
> It could be a recommendation for devices that have multiple slaves,
> but I still haven't seen an example where this is actually needed.
> 
> > 3) Should "slave" or "slaves" be traversable for bridge- or bus-like
> >    nodes?
> > 
> >    Saying "no" to this makes it impossible for the reachability graph of
> >    the DT to contain cycles.  This is a clear benefit for any software
> >    attempting to parse the DT in a robust way.  Only the first link,
> >    from the initiating master to the first bridge, would be permitted
> >    to be a "slaves" link.
> > 
> >    Ideally, we would want an IOMMU's bridge-like role to be represented
> >    by some deep node in the DT: it can't usually be on the global path
> >    from / since CPUs typically don't master through the IOMMU.
> > 
> >    Parsers could be made robust while still permitting this, by
> >    truncating the search if the initial master node is reached.
> >    Ill-formed DTs could contain cycles that can't be resolved in
> >    this way, e.g., A -> B -> B.  For now it might be reasonable to
> >    check for this in dtc.
> 
> I wouldn't be worried about cycles. We can just declare them forbidden
> in the binding. Anything can break if you supply a broken DT, this
> is the least of the problems.

That's my thought.  If there turns out to be a really good reason to
describe cycles then we can cross that bridge* when we come to it,
but it's best to forbid it until/unless the need for it is proven.

(*no pun intended)

Note that a certain kind of trivial cycle will always be created
when a node refers back to its parent:

root: / {
	ranges;

	iommu {
		reg = < ... >;
		slaves = <&root>;
	};
};

ePAPR says that if there is no "ranges" property, then the parent
node cannot access any address of the child -- we can interpret
this as saying that transactions do not propagate.  "ranges" with
an empty value implies a complete 1:1 mapping, which we can interpret
as transactions being forwarded without any transformation.

Crucially, "iommu" must not have a "ranges" property in this case,
because this would permit a static routing cycle root -> iommu ->
root.


Providing that the only devices that master on iommu are not
themselves bridges reachable from /, there is no cycle --
a given transaction issued to the iommu will sooner or later hit
something that is not a bridge and disappear.

Note that there is no cycle through the "reg" property on iommu:
"reg" indicates a sink for transactions; "slaves" indicates a
source of transactions, and "ranges" indicates a propagator of
transactions.

"dma-ranges" indicates that the children of the node _might_ be
sources of transactions (but that does not mean that they definitely
are) -- and that the parent node acts as a bridge for those transactions,
forwarding them back to its own parent or children depending on the
address.
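
Annotating the earlier example with those roles (and adding a child bus
purely to show where "dma-ranges" would sit):

	root: / {
		ranges;				/* propagator: forwards 1:1 */

		iommu {
			reg = < ... >;		/* sink: programming interface */
			slaves = <&root>;	/* source: issues transactions */
		};

		bus {
			ranges = < ... >;	/* propagator: parent -> children */
			dma-ranges = < ... >;	/* children may master towards / */
		};
	};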

Cheers
---Dave

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 15:19           ` Arnd Bergmann
@ 2014-05-02 17:43             ` Dave Martin
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-02 17:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Thierry Reding, Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Will Deacon,
	Jason Gunthorpe, Marc Zyngier,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

On Fri, May 02, 2014 at 05:19:44PM +0200, Arnd Bergmann wrote:
> On Friday 02 May 2014 15:23:29 Thierry Reding wrote:
> > On Fri, May 02, 2014 at 02:32:08PM +0200, Arnd Bergmann wrote:
> > > On Friday 02 May 2014 13:05:58 Thierry Reding wrote:
> > > > 
> > > > Let me see if I understood the above proposal by trying to translate it
> > > > into a simple example for a specific use-case. On Tegra for example we
> > > > have various units that can either access system memory directly or use
> > > > the IOMMU to translate accesses for them. One such unit would be the
> > > > display controller that scans out a framebuffer from memory.
> > > 
> > > Can you explain how the decision is made whether the IOMMU gets used
> > > or not? In all cases I've seen so far, I think we can hardwire this
> > > in DT, and only expose one or the other. Are both ways used
> > > concurrently?
> > 
> > It should be possible to hardcode this in DT for Tegra. As I understand
> > it, both interfaces can't be used at the same time. Once translation has
> > been enabled for one client, all accesses generated by that client will
> > be translated.
> 
> Ok.
> 
> > > >         dc@0,54200000 {
> > > >                 ...
> > > > 
> > > >                 slave {
> > > >                         /*
> > > >                          * 2 is the memory controller client ID of the
> > > >                          * display controller.
> > > >                          */
> > > >                         iommu = <&iommu 2>;
> > > > 
> > > >                         ...
> > > >                 };
> > > >         };
> > > > 
> > > > Admittedly this is probably a lot more trivial than what you're looking
> > > > for. There's no need for virtualization here, the IOMMU is simply used
> > > > to isolate memory accesses by devices. Still it's a use-case that needs
> > > > to be supported and one that at least Tegra and Exynos have an immediate
> > > > need for.
> > > > 
> > > > So the above isn't much different from the proposed bindings, except
> > > > that the iommu property is now nested within a slave node. I guess this
> > > > gives us a lot more flexibility to extend the description of a slave as
> > > > needed to represent more complex scenarios.
> > > 
> > > This looks rather complicated to parse automatically in the generic
> > > DT code when we try to decide which dma_map_ops to use. We'd have
> > > to look for 'slave' nodes in each device we instantiate and then see
> > > if they use an iommu or not.
> > 
> > But we need to do that now anyway in order to find an iommu property,
> > don't we? Adding one extra level here shouldn't be all that bad if it
> > gives us more flexibility or uniformity with more complicated setups.
> 
> The common code just needs to know whether an IOMMU is in use or
> not, and what the mask/offset are.
> 
> > To some degree this also depends on how we want to handle IOMMUs. If
> > they should remain transparently handled via dma_map_ops, then it makes
> > sense to set this up at device instantiation time. But how can we handle
> > this in situations where one device needs to master on two IOMMUs at the
> > same time? Or if the device needs physically contiguous memory for
> > purposes other than device I/O. Using dma_map_ops we can't control which
> > allocations get mapped via the IOMMU and which don't.
> 
> I still hope we can handle this in common code by selecting the right
> dma_map_ops when the devices are instantiated, at least for 99% of the
> cases. I'm not convinced we really need to handle the 'multiple IOMMUs
> on one device' case in a generic way. If there are no common use cases
> for that, we can probably get away with having multiple device nodes
> and an ugly driver for the exception, instead of making life complicated
> for everybody.

Multiple IOMMUs certainly seems an unusual case for now.

Being able to describe that in the DT doesn't necessarily mean the
kernel has to support it: just as the kernel doesn't need to support
all the features of a crazy hardware platform just because someone was
crazy enough to build it.

My expectation was that we do some check when probing a device to figure
out the path from the device to main memory, thus figuring out the dma
mask, the IOMMU (if any) and any relevant device ID.  This is a bit more
complex than the existing situation, but I still think we could have
common code for the bulk of it.

If a device has different roles with completely different paths to
memory, one option could be for the driver to instantiate two devices in
the kernel.  This puts the burden on the driver for the device, instead
of the core framework.

> 
> > > > Also, are slaves/slave-names and slave subnodes mutually exclusive? It
> > > > sounds like slaves/slave-names would be a specialization of the slave
> > > > subnode concept for the trivial cases. Would the following be an
> > > > equivalent description of the above example?
> > > > 
> > > >         dc@0,54200000 {
> > > >                 ...
> > > > 
> > > >                 slaves = <&iommu 2>;
> > > >         };
> > > > 
> > > > I don't see how it could be exactly equivalent since it misses context
> > > > regarding the type of slave that's being interacted with. Perhaps that
> > > > could be solved by making that knowledge driver-specific (i.e. the
> > > > driver for the Tegra display controller will know that it can only be
> > > > the master on an IOMMU and therefore derive the slave type). Or the
> > > > slave's type could be derived from the slave-names property.
> > > 
> > > I'd rather have a device-specific property that tells the driver
> > > about things the iommu driver doesn't need to know but the master
> > > does. In most cases, we should be fine without a name attached to the
> > > slave.
> > 
> > For the easy cases where we either have no IOMMU or a single IOMMU per
> > device, that should work fine. This only becomes problematic when there
> > are more than one, since you need to distinguish between possibly more
> > than one type.
> > 
> > As I understand it, Dave's proposal is for generic bus masters, which
> > may be an IOMMU but could also be something completely different. So in
> > those cases we need extra meta information so that we can look up the
> > proper type of object.
> 
> Doing something complicated for the IOMMUs themselves seems fine, also
> for other nonstandard devices that are just weird. I just want to
> handle the simple case automatically.

Agreed.  Ideally, the simple cases should be completely handled by the
framework; the complex cases will have to fend for themselves, unless
a clear pattern emerges, in which case frameworks to handle them could
be created in the future.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 17:31       ` Dave Martin
@ 2014-05-02 18:17           ` Jason Gunthorpe
  -1 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2014-05-02 18:17 UTC (permalink / raw)
  To: Dave Martin
  Cc: Arnd Bergmann, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Will Deacon,
	Marc Zyngier, Thierry Reding, Hiroshi Doyu

On Fri, May 02, 2014 at 06:31:20PM +0100, Dave Martin wrote:

> Note that there is no cycle through the "reg" property on iommu:
> "reg" indicates a sink for transactions; "slaves" indicates a
> source of transactions, and "ranges" indicates a propagator of
> transactions.

I wonder if this might be a better naming scheme; I actually don't
really like 'slave' for this. It only applies well to AXI-style
unidirectional busses, and message-based bus architectures
(HT, PCI, QPI, etc.) just have the concept of an initiator and a target.

Since initiator/target applies equally well to master/slave buses,
that seems like better, clearer naming.

Using a nomenclature where:
  'reg' describes a target reachable from the CPU initiator via the
        natural DT hierarchy.
  'initiator' describes a non-CPU (e.g. 'DMA') source of ops, which
        travel via the described path to memory (which is the
        target).
  'path' describes the route between an initiator and a target, where
        bridges along the route may alter the operation.
  'upstream' is the path direction toward the target, typically memory.
  'upstream-bridge' is the next hop on a path between an initiator and
        a target.

But I would encourage you to think about the various limitations this
still has
 - NUMA systems. How does one describe the path from each
   CPU to a target's regs, and to target memory? This is important for
   automatically setting affinities.
 - Peer-to-peer DMA, where a non-CPU initiator speaks to a
   non-memory target, possibly through IOMMUs and whatnot, e.g.
   a graphics card in a PCI-E slot DMA'ing through a QPI bus to
   a graphics card in a PCI-E slot attached to a different socket.

These are use-cases already happening on x86, and the same underlying
hardware architecture this proposal tries to describe for DMA to memory
is at work in the above cases as well.

Basically, these days, interconnect is a graph. Pretending things are
a tree is stressful :)

Here is a basic attempt using the above language, trying to describe
an x86-ish system with two sockets and two DMA devices, where one has
DMA-target-capable memory (e.g. a GPU):

// DT tree is the view from the SMP CPU complex down to regs
smp_system {
   socket0 {
       cpu0@0 {}
       cpu1@0 {}
       memory@0: {}
       interconnect0: {targets = <&memory@0, &interconnect1>;}
       interconnect0_control: {
             ranges;
             peripheral@0 {
   		regs = <>;
                initiator1 {
                        ranges = < ... >;
                        // View from this DMA initiator back to memory
                        upstream-bridge = <&interconnect0>;
                };
		/* For some reason this peripheral has two DMA
		   initiation ports. */
                initiator2 {
                        ranges = < ... >;
                        upstream-bridge = <&interconnect0>;
                };
             };
        };
   }
   socket1 {
       cpu0@1 {}
       cpu1@1 {}
       memory@1: {}
       interconnect1: {targets = <&memory@1,&interconnect0,&peripheral@1/target>;}
       interconnect1_control: {
             ranges;
             peripheral@1 {
                ranges = < ... >;
   		regs = <>;
                initiator {
                        ranges = < ... >;
                        // View from this DMA initiator back to memory
                        upstream-bridge = <&interconnect1>;
                };
                target {
		        reg = <..>;
                        /* This peripheral has integrated memory!
                           But notice the CPU path is
                             smp_system -> socket1 -> interconnect1_control -> target
			   While a DMA path is
                             initiator1 -> interconnect0 -> interconnect1 -> target
			 */
                };
            };
            peripheral2@0 {
   		regs = <>;

		// Or we can write the simplest case like this.
		dma-ranges = <>;
		upstream-bridge = <&interconnect1>;
                /* if upstream-bridge is omitted then it defaults to
                   &parent, e.g. interconnect1_control */
             };
        };
   };
};

It is computable that ops from initiator2 -> target flow through
interconnect0, interconnect1, and then are delivered to target.

It has a fair symmetry with the interrupt-parent mechanism..
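
i.e. something like this contrived fragment (values invented), where
interrupt-parent names the next hop for interrupts and upstream-bridge
would name the next hop for DMA ops:

	device@0 {
		reg = <0x0 0x1000>;
		interrupt-parent = <&gic>;		/* next hop for interrupts */
		upstream-bridge = <&interconnect0>;	/* next hop for DMA ops */
	};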

Jason

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 13:23         ` Thierry Reding
@ 2014-05-02 18:50           ` Stephen Warren
  -1 siblings, 0 replies; 58+ messages in thread
From: Stephen Warren @ 2014-05-02 18:50 UTC (permalink / raw)
  To: Thierry Reding, Arnd Bergmann
  Cc: Dave Martin, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Shaik Ameer Basha, Grant Grundler, Hiroshi Doyu, Jason Gunthorpe,
	Will Deacon, Mark Rutland, Marc Zyngier

On 05/02/2014 07:23 AM, Thierry Reding wrote:
> On Fri, May 02, 2014 at 02:32:08PM +0200, Arnd Bergmann wrote:
>> On Friday 02 May 2014 13:05:58 Thierry Reding wrote:
>>>
>>> Let me see if I understood the above proposal by trying to translate it
>>> into a simple example for a specific use-case. On Tegra for example we
>>> have various units that can either access system memory directly or use
>>> the IOMMU to translate accesses for them. One such unit would be the
>>> display controller that scans out a framebuffer from memory.
>>
>> Can you explain how the decision is made whether the IOMMU gets used
>> or not? In all cases I've seen so far, I think we can hardwire this
>> in DT, and only expose one or the other. Are both ways used
>> concurrently?
> 
> It should be possible to hardcode this in DT for Tegra. As I understand
> it, both interfaces can't be used at the same time. Once translation has
> been enabled for one client, all accesses generated by that client will
> be translated.
> 
> Hiroshi, please correct me if I'm wrong.

I believe the HW connectivity is always as follows:

Bus master (e.g. display controller) ---> IOMMU (Tegra SMMU) ---> RAM

In the IOMMU, there is a bit per bus master that indicates whether the
IOMMU translates the bus master's accesses or not. If that bit is
enabled, then page tables in the IOMMU are used to perform the translation.

You could also look at the HW setup as:

Bus master (e.g. display controller)
    v
   ----
  /    \
  ------
   |  \
   |   ------------------
   |                     \
   v                     v
IOMMU (Tegra SMMU) ---> RAM

But IIRC the bit that controls that demux is in the IOMMU, so this
distinction probably isn't relevant.

Now, perhaps there are devices which themselves control whether
transactions are sent to the IOMMU or direct to RAM, but I'm not
familiar with them. Is the GPU in that category, since it has its own
GMMU, albeit chained into the SMMU IIRC?

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 15:19           ` Arnd Bergmann
@ 2014-05-02 18:55             ` Stephen Warren
  -1 siblings, 0 replies; 58+ messages in thread
From: Stephen Warren @ 2014-05-02 18:55 UTC (permalink / raw)
  To: Arnd Bergmann, Thierry Reding
  Cc: Mark Rutland, devicetree, Shaik Ameer Basha, Grant Grundler,
	Will Deacon, Jason Gunthorpe, Marc Zyngier, Dave Martin,
	linux-arm-kernel, Hiroshi Doyu

On 05/02/2014 09:19 AM, Arnd Bergmann wrote:
> On Friday 02 May 2014 15:23:29 Thierry Reding wrote:
...
>> To some degree this also depends on how we want to handle IOMMUs. If
>> they should remain transparently handled via dma_map_ops, then it makes
>> sense to set this up at device instantiation time. But how can we handle
>> this in situations where one device needs to master on two IOMMUs at the
>> same time? Or if the device needs physically contiguous memory for
>> purposes other than device I/O. Using dma_map_ops we can't control which
>> allocations get mapped via the IOMMU and which don't.
> 
> I still hope we can handle this in common code by selecting the right
> dma_map_ops when the devices are instantiated, at least for 99% of the
> cases. I'm not convinced we really need to handle the 'multiple IOMMUs
> on one device' case in a generic way. If there are no common use cases
> for that, we can probably get away with having multiple device nodes
> and an ugly driver for the exception, instead of making life complicated
> for everybody.

By "multiple device nodes", I assume you mean device tree nodes? I'm not
sure I like the sound of that.

I believe that DT should represent the structure of the HW in terms of
HW modules or blocks. If there's a single cohesive HW module that
happens to talk to multiple MMUs, or indeed has any kind of unusual case
at all, I don't think that should force the DT representation to be
broken up into multiple nodes. We should have a DT node for that HW
module, and it should be up to the device driver to make the internal SW
representation work correctly.
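
As a sketch of what I mean (the slave references and the names here
are purely invented, using the slaves/slave-names properties from
Dave's outline binding):

	video-engine@54300000 {
		reg = <0x54300000 0x10000>;
		/* one node for the single HW block, even though it
		   masters through two different MMUs */
		slaves = <&smmu &gmmu>;
		slave-names = "system", "graphics";
	};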

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 18:55             ` Stephen Warren
@ 2014-05-02 19:02                 ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-02 19:02 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Thierry Reding, Dave Martin, devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Shaik Ameer Basha, Grant Grundler, Hiroshi Doyu, Jason Gunthorpe,
	Will Deacon, Mark Rutland, Marc Zyngier

On Friday 02 May 2014 12:55:45 Stephen Warren wrote:
> On 05/02/2014 09:19 AM, Arnd Bergmann wrote:
> > On Friday 02 May 2014 15:23:29 Thierry Reding wrote:
> ...
> >> To some degree this also depends on how we want to handle IOMMUs. If
> >> they should remain transparently handled via dma_map_ops, then it makes
> >> sense to set this up at device instantiation time. But how can we handle
> >> this in situations where one device needs to master on two IOMMUs at the
> >> same time? Or if the device needs physically contiguous memory for
> >> purposes other than device I/O. Using dma_map_ops we can't control which
> >> allocations get mapped via the IOMMU and which don't.
> > 
> > I still hope we can handle this in common code by selecting the right
> > dma_map_ops when the devices are instantiated, at least for 99% of the
> > cases. I'm not convinced we really need to handle the 'multiple IOMMUs
> > on one device' case in a generic way. If there are no common use cases
> > for that, we can probably get away with having multiple device nodes
> > and an ugly driver for the exception, instead of making life complicated
> > for everybody.
> 
> By "multiple device nodes", I assume you mean device tree nodes? I'm not
> sure I like the sound of that.
> 
> I believe that DT should represent the structure of the HW in terms of
> HW modules or blocks. If there's a single cohesive HW module that
> happens to talk to multiple MMUs, or indeed has any kind of unusual case
> at all, I don't think that should force the DT representation to be
> broken up into multiple nodes. We should have a DT node for that HW
> module, and it should be up to the device driver to make the internal SW
> representation work correctly.

I agree we should in general try our best to have the DT representation
match exactly what the hardware looks like. However we already have some
areas where we violate that, typically when things are not trees.

If there is no real use case but only a theoretical possibility, I don't
have a problem with being less strict about the general rule on hardware
representation.

	Arnd

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 18:50           ` Stephen Warren
@ 2014-05-02 19:06               ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-02 19:06 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Stephen Warren, Thierry Reding, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Shaik Ameer Basha,
	Grant Grundler, Will Deacon, Jason Gunthorpe, Marc Zyngier,
	Dave Martin, Hiroshi Doyu

On Friday 02 May 2014 12:50:17 Stephen Warren wrote:
> On 05/02/2014 07:23 AM, Thierry Reding wrote:
> > On Fri, May 02, 2014 at 02:32:08PM +0200, Arnd Bergmann wrote:
> >> On Friday 02 May 2014 13:05:58 Thierry Reding wrote:
> >>>
> >>> Let me see if I understood the above proposal by trying to translate it
> >>> into a simple example for a specific use-case. On Tegra for example we
> >>> have various units that can either access system memory directly or use
> >>> the IOMMU to translate accesses for them. One such unit would be the
> >>> display controller that scans out a framebuffer from memory.
> >>
> >> Can you explain how the decision is made whether the IOMMU gets used
> >> or not? In all cases I've seen so far, I think we can hardwire this
> >> in DT, and only expose one or the other. Are both ways used
> >> concurrently?
> > 
> > It should be possible to hardcode this in DT for Tegra. As I understand
> > it, both interfaces can't be used at the same time. Once translation has
> > been enabled for one client, all accesses generated by that client will
> > be translated.
> > 
> > Hiroshi, please correct me if I'm wrong.
> 
> I believe the HW connectivity is always as follows:
> 
> Bus master (e.g. display controller) ---> IOMMU (Tegra SMMU) ---> RAM
> 
> In the IOMMU, there is a bit per bus master that indicates whether the
> IOMMU translates the bus master's accesses or not. If that bit is
> enabled, then page tables in the IOMMU are used to perform the translation.
> 
> You could also look at the HW setup as:
> 
> Bus master (e.g. display controller)
>     v
>    ----
>   /    \
>   ------
>    |  \
>    |   ------------------
>    |                     \
>    v                     v
> IOMMU (Tegra SMMU) ---> RAM
> 
> But IIRC the bit that controls that demux is in the IOMMU, so this
> distinction probably isn't relevant.

Ok. I think this case can be dealt with easily enough without
having to represent it as two master ports on one device. There
really is just one master, and it can be configured in two ways.

We can either choose to make the DT representation decide which
way is used, or we can always point to the IOMMU, and let the
IOMMU driver decide.
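
That is, roughly one of the following two fragments (reusing Thierry's
client ID 2 example, everything else elided):

	/* Option 1: the DT decides -- no IOMMU link at all, so
	   accesses go straight to memory */
	dc@0,54200000 {
		...
	};

	/* Option 2: always describe the link to the IOMMU, and let
	   the IOMMU driver decide whether to translate client 2 */
	dc@0,54200000 {
		...
		iommu = <&iommu 2>;
	};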

> Now, perhaps there are devices which themselves control whether
> transactions are sent to the IOMMU or direct to RAM, but I'm not
> familiar with them. Is the GPU in that category, since it has its own
> GMMU, albeit chained into the SMMU IIRC?

Devices with a built-in IOMMU such as most GPUs are also easy enough
to handle: There is no reason to actually show the IOMMU in DT and
we can just treat the GPU as a black box.

Note that you don't really have to support the dma-mapping.h API on
GPUs, they usually need to go down to the IOMMU level anyway.

	Arnd

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 17:31       ` Dave Martin
@ 2014-05-02 20:36           ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-02 20:36 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Dave Martin, Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Will Deacon,
	Jason Gunthorpe, Marc Zyngier, Thierry Reding, Hiroshi Doyu

On Friday 02 May 2014 18:31:20 Dave Martin wrote:
> No, but I didn't state it very clearly.
> 
> In this:
> 
>         parent {
>                 child {
>                         ranges = < ... >;
>                         dma-ranges = < ... >;
>                 };
>         };

The ranges and dma-ranges properties belong in the parent, not the child, here.
I guess that's what you meant at least.

> There are two transaction flows being described.  There are transactions
> from parent -> child, for which "ranges" describes the mappings, and
> there are transactions from child -> parent, for which "dma-ranges"
> describes the mappings.

Right.

> The name "dma-ranges" obfuscates this symmetry, so it took me a while
> to figure out what it really means -- maybe I'm still confused, but
> I think that's the gist of it.
> 
> 
> For the purposes of cross-links, my plan was that we interpret all
> those links as "forward" (i.e., parent -> child) links, where the
> referencing node is deemed to be the parent, and the referenced node is
> deemed to be the child. Just as in the ePAPR case, the associated mapping
> is then described by "ranges".

That seems counterintuitive to me. When a device initiates a transaction,
it should look at the "dma-ranges" of its parent. The "slaves" property
would be a way to redirect the parent for these transactions, but it
doesn't mean that the device suddenly translates ranges as seen from
its parent.

In other words, "ranges" should always move from CPU to MMIO target
(slave), while "dma-ranges" should always move from a DMA master towards
memory. If you want to represent a device-to-device DMA, you may have
to move up a few levels using "dma-ranges" and then move down again
using "ranges".

> > Don't you need arguments to the phandle? It seems that in most
> > cases, you need at least one of a dma-ranges like translation
> > or a master ID. What you need would be specific to the slave.
> 
> For any 1:N relationship between nodes, you can describe the
> _relationship_ by putting properties on the nodes at the "1" end.  This
> is precisely how "ranges" and "dma-ranges" work.

That doesn't seem very helpful or intuitive though. If I have
an IOMMU that N DMA masters can target, I don't want to have
information about all the masters in the IOMMU node; that information
belongs in the masters, but the format in which it is stored
must be specific to the IOMMU.
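
In other words, something along these lines (the cell layout is
invented here, just to show where the information would live,
borrowing the #dma-slave-cells idea mentioned further down):

	iommu0: iommu@40000000 {
		reg = <0x40000000 0x1000>;
		/* the IOMMU binding defines how many extra cells a
		   master must supply and what they mean */
		#dma-slave-cells = <1>;
	};

	master@50000000 {
		reg = <0x50000000 0x1000>;
		slaves = <&iommu0 23>;	/* 23: master ID, interpreted by iommu0 */
	};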

> The N:M case can be resolved by inserting simple-bus nodes into any
> links with non-default mappings: i.e., you split each affected link in
> two, with a simple-bus node in the middle describing the mapping:

Ok, ignoring those oddball cases for now, I'd like to first
try to come to a more useful representation for the common case.

> root: / {
>         ranges;
>         ...
> 
>         master@1 {
>                 slave {
>                         ranges = < ... >;
>                         slaves = <&root>;
>                 };
>         };

The "ranges" property here is really weird. I understand you
mean this to be a device that has a unique mapping into its
slave (the root device here). I would reduce this to one
of two cases:

a) it sits on an intermediate bus by itself, and that bus
has the non-default mapping:

/ {
	dma-ranges;
	ranges;

	bus@x {
		ranges;
		dma-ranges = < ... >; // this is the mapping
		master@1 {
			...
		};
	};
};

b) the device itself is strange, but it's a property of the
device that the driver knows about (e.g. it can only do
a 30-bit DMA mask rather than 32 bit, for a realistic case),
so we don't have to represent this at all and let the driver
deal with the translation:

/ {
	dma-ranges;
	ranges;

	master@1 {
		...
	};

};

>         master@2 {
>                 slave {
>                         slaves = < &root &master2_dma_slave >;
>                         slave-names = "config-fetch", "dma";
> 
>                	   master2_dma_slave: dma-slave {
>                                 ranges = < ... >;
>                                 slaves = <&root>;
>                         };
>                 };
>         };

As I said before, I'd consider this a non-issue until anyone
can come up with a case that needs the complexity.

A possible representation would be to have two masters
as child nodes of the actual device to avoid having a 'slaves'
property with multiple entries, and if the device is the only
one with a weird translation, that can go to some other bus
node we make up for this purpose:

/ {
	ranges;
	dma-ranges;

	fake-bus {
		dma-ranges = < ... >;
		slaves = < &{/} >; // this is the default, so it can be omitted
	};

	bus {
		ranges;
		// no dma-ranges --> no default DMA translation

		device@1 {
			master@1 {
				// hopefully will never be
				// needed in real life
				slaves = < &{/fake-bus}>;
			};

			master@2 {
				slaves = < &{/} >;
			};
		};
	};
};

>         master@3 {
>                 slaves = <&root>;
>         };
> };

Here, the slave is the parent device, which is the default
anyway, so I wouldn't require listing anything at all,
besides an empty dma-ranges in the parent node.

If we can get away with just a single entry in 'slaves' all
the time, we could actually rename that property to 'dma-parent',
for consistency with 'interrupt-parent' and a few other things.

Freescale already has 'fsl,iommu-parent' for this case, a
'dma-parent' would be a generalization of that, but less general
than your 'slaves' property.
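
i.e. something like this (made-up node, property name as suggested
above):

	dma: dma-controller@60000000 {
		reg = <0x60000000 0x1000>;
		/* single-entry equivalent of slaves = <&iommu0>; */
		dma-parent = <&iommu0>;
	};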

> > It may be best to make the ranges explicit here and then also
> > allow additional fields depending on e.g. a #dma-slave-cells
> > property in the slave.
> > 
> > For instance, a 32-bit master on a 64-bit bus that has master-id
> > 23 would look like
> > 
> >       otherbus: axi@somewhere{
> >               #address-cells = <2>;
> >               #size-cells = <2>;
> >       };
> > 
> >       somemaster@somewhere {
> >               #address-cells = <1>;
> >               #size-cells = <1>;
> >               slaves = <&otherbus  // phandle
> >                               0     // local address
> >                               0 0   // remote address
> >                               0x1 0 // size
> >                               23>;  // master id
> >       };
> 
> I thought about this possibility, but was worried that the "slaves"
> property would become awkward to parse, whereas except for the "master id"
> concept, all these attributes are already well described by ePAPR for
> bus nodes, if we can figure out how to piggyback on them -- hence my
> alternative approach explained above.
> 
> How to describe the "master id" is particularly problematic and may
> be a separate discussion.  It can get munged or remapped as it
> passes through the interconnect: for example, a PCI device's ID 
> accompanying an MSI write may be translated once as it passes from
> the PCI RC to an IOMMU, then again before it reaches the GIC.

Hmm, that would actually mean we'd have to do complex "dma-ranges"
properties with more than one entry, which I had hoped to avoid.
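
That is, something along the lines of (addresses invented):

	bus {
		#address-cells = <1>;
		#size-cells = <1>;
		dma-ranges = <0x00000000 0x80000000 0x20000000>,
			     <0x40000000 0xc0000000 0x10000000>;
	};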

> In the "windowed IOMMU" case, address bits are effectively being
> mapped to ID bits as they reach the IOMMU.
> 
> An IOMMU also does a complete mapping of ID+address -> ID'+address'
> (although programmable rather than static and unprobeable, so the
> actual mappings for an IOMMU won't be in the DT).

right.

> > > 2) The generic "slave" node(s) are for convenience and readability.
> > >    They could be eliminated by using child nodes with
> > >    binding-specific names and referencing them in "slaves".  This is a
> > >    bit more awkward, but has the same expressive power.
> > > 
> > >    Should the generic "slave" nodes go away?
> > 
> > I would prefer not having to have subnodes for the simple case
> > where you just need to reference one slave iommu from a master
> > device.
> 
> My expectation is that subnodes would only be useful in special cases in
> any case.
> 
> We can remove the special "slave" name, because there's nothing to
> stop us referencing other random nested nodes with the "slaves" property.

Ok.

> > I wouldn't be worried about cycles. We can just declare them forbidden
> > in the binding. Anything can break if you supply a broken DT, this
> > is the least of the problems.
> 
> That's my thought.  If there turns out to be a really good reason to
> describe cycles then we can cross that bridge* when we come to it,
> but it's best to forbid it until/unless the need for it is proven.
> 
> (*no pun intended)

Right.

> Note that a certain kind of trivial cycle will always be created
> when a node refers back to its parent:
> 
> root: / {
>         ranges;
> 
>         iommu {
>                 reg = < ... >;
>                 slaves = <&root>;
>         };
> };
> 
> ePAPR says that if there is no "ranges" property, then the parent
> node cannot access any address of the child -- we can interpret
> this as saying that transactions do not propagate.  "ranges" with
> an empty value implies a complete 1:1 mapping, which we can interpret
> as transactions being forwarded without any transformation.
> 
> Crucially, "iommu" must not have a "ranges" property in this case,
> because this would permit a static routing cycle root -> iommu ->
> root.

Makes sense.

	Arnd

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [RFC] Describing arbitrary bus mastering relationships in DT
@ 2014-05-02 20:36           ` Arnd Bergmann
  0 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-02 20:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 02 May 2014 18:31:20 Dave Martin wrote:
> No, but I didn't state it very clearly.
> 
> In this:
> 
>         parent {
>                 child {
>                         ranges = < ... >;
>                         dma-ranges = < ... >;
>                 };
>         };

The ranges and dma-ranges properties belong into parent, not child here.
I guess that's what you meant at least.

> There are two transaction flows being described.  There are transactions
> from parent -> child, for which "ranges" describes the mappings, and
> there are transactions from child -> parent, for which "dma-ranges"
> describes the mappings.

Right.

> The name "dma-ranges" obfuscates this symmetry, so it took me a while
> to figure out what it really means -- maybe I'm still confused, but
> I think that's the gist of it.
> 
> 
> For the purposes of cross-links, my plan was that we interpret all
> those links as "forward" (i.e., parent -> child) links, where the
> referencing node is deemed to be the parent, and the referenced node is
> deemed to be the child. Just as in the ePAPR case, the associated mapping
> is then described by "ranges".

That seems counterintuitive to me. When a device initiates a transaction,
it should look at the "dma-ranges" of its parent. The "slaves" property
would be a way to redirect the parent for these transactions, but it
doesn't mean that the device suddenly translates ranges as seen from
its parent.

In other words, "ranges" should always move from CPU to MMIO target
(slave), while "dma-ranges" should always move from a DMA master towards
memory. If you want to represent a device-to-device DMA, you may have
to move up a few levels using "dma-ranges" and then move down again
using "ranges".

> > Don't you need arguments to the phandle? It seems that in most
> > cases, you need at least one of a dma-ranges like translation
> > or a master ID. What you need would be specific to the slave.
> 
> For any 1:N relationship between nodes, you can describe the
> _relationship_ by putting properties on the nodes at the "1" end.  This
> is precisely how "ranges" and "dma-ranges" work.

That doesn't seem very helpful or intuitive though. If I have
an IOMMU that N DMA masters can target, I don't want to have
information about all the masters in the IOMMU, that information
belongs into the masters, but the format in which it is stored
must be specific to the IOMMU.

> The N:M case can be resolved by inserting simple-bus nodes into any
> links with non-default mappings: i.e., you split each affected link in
> two, with a simple-bus node in the middle describing the mapping:

Ok, ignoring those oddball cases for now, I'd like to first
try to come to a more useful representation for the common case.

> root: / {
>         ranges;
>         ...
> 
>         master at 1 {
>                 slave {
>                         ranges = < ... >;
>                         slaves = <&root>;
>                 };
>         };

The "ranges" property here is really weird. I understand you
mean this to be a device that has a unique mapping into its
the slave (the root device here). I would reduce this to one
of two cases:

a) it sits on an intermediate bus by itself, and that bus
has the non-default mapping:

/ {
	dma-ranges;
	ranges;

	bus at x {
		ranges;
		dma-ranges = < ... >; // this is the mapping
		master at 1 {
			...
		};
	};

b) the device itself is strange, but it's a property of the
device that the driver knows about (e.g. it can only do
a 30-bit DMA mask rather than 32 bit, for a realistic case),
so we don't have represent this at all and let the driver
deal with the translation:

/ {
	dma-ranges;
	ranges;

	master at 1 {
		...
	};

};

>         master at 2 {
>                 slave {
>                         slaves = < &root &master2_dma_slave >;
>                         slave-names = "config-fetch", "dma";
> 
>                	   master2_dma_slave: dma-slave {
>                                 ranges = < ... >;
>                                 slaves = <&root>;
>                         };
>                 };
>         };

As I said before, I'd consider this a non-issue until anyone
can come up with a case that needs the complexity.

A possible representation would be to have two masters
as child nodes of the actual device to avoid having a 'slaves'
property with multiple entries, and if the device is the only
one with a weird translation, that can go to some other bus
node we make up for this purpose:

/ {
	ranges;
	dma-ranges;

	fake-bus {
		dma-ranges = < ... >;
		slaves = < &{/} >; // is the default, so can be ommitted
	};

	bus {
		ranges;
		// no dma-ranges --> no default DMA translation

		device at 1 {
			master at 1 {
				// hopefully will never be
				// needed in real life
				slaves = < &{/fake-bus}>;
			};

			master at 2 {
				slaves = < &{/} >;
			};
		};
	};
};

>         master at 3 {
>                 slaves = <&root>;
>         };
> };

Here, the slave is the parent device, which is the default
anyway, so I wouldn't require listing anything at all,
besides an empty dma-ranges in the parent node.

If we can get away with just a single entry in 'slaves' all
the time, we could actually rename that property to 'dma-parent',
for consistency with 'interrupt-parent' and a few other things.

Freescale already has 'fsl,iommu-parent' for this case, a
'dma-parent' would be a generalization of that, but less general
than your 'slaves' property.

> > It may be best to make the ranges explicit here and then also
> > allow additional fields depending on e.g. a #dma-slave-cells
> > property in the slave.
> > 
> > For instance, a 32-bit master on a a 64-bit bus that has master-id
> > 23 would look like
> > 
> >       otherbus: axi at somewhere{
> >               #address-cells = <2>;
> >               #size-cells = <2>;
> >       };
> > 
> >       somemaster at somewhere {
> >               #address-cells = <1>;
> >               #size-cells = <1>;
> >               slaves = <&otherbus  // phandle
> >                               0     // local address
> >                               0 0   // remote address
> >                               0x1 0 // size
> >                               23>;  // master id
> >       };
> 
> I thought about this possibility, but was worried that the "slaves"
> property would become awkward to parse, where except for the "master id"
> concept, all these attributes are well described by ePAPR already for
> bus nodes if we can figure out how to piggyback on them -- hence my
> alternative approach explained above.
> 
> How to describe the "master id" is particularly problematic and may
> be a separate discussion.  It can get munged or remapped as it
> passes through the interconnect: for example, a PCI device's ID 
> accompanying an MSI write may be translated once as it passes from
> the PCI RC to an IOMMU, then again before it reaches the GIC.

Hmm, that would actually mean we'd have to do complex "dma-ranges"
properties with more than one entry, which I had hoped to avoid.
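
For reference, a multi-entry "dma-ranges" is just a list of
(child address, parent address, length) triplets.  With one address
and one size cell each, a two-window example might look like this
(numbers made up):

	dma-ranges = <0x00000000 0x00000000 0x40000000
		      0x80000000 0x40000000 0x40000000>;

i.e. two inbound windows with different offsets, before any master-ID
component is even added.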

> In the "windowed IOMMU" case, address bits are effectively being
> mapped to ID bits as they reach IOMMU.
> 
> An IOMMU also does a complete mapping of ID+address -> ID'+address'
> (although programmable rather than static and unprobeable, so the
> actual mappings for an IOMMU won't be in the DT).

right.

> > > 2) The generic "slave" node(s) are for convenience and readability.
> > >    They could be eliminated by using child nodes with
> > >    binding-specific names and referencing them in "slaves".  This is a
> > >    bit more awkward, but has the same expressive power.
> > > 
> > >    Should the generic "slave" nodes go away?
> > 
> > I would prefer not having to have subnodes for the simple case
> > where you just need to reference one slave iommu from a master
> > device.
> 
> My expectation is that subnodes would only be useful in special cases in
> any case.
> 
> We can remove the special "slave" name, because there's nothing to
> stop us referencing other random nested nodes with the "slaves" property.

Ok.

> > I wouldn't be worried about cycles. We can just declare them forbidden
> > in the binding. Anything can break if you supply a broken DT, this
> > is the least of the problems.
> 
> That's my thought.  If there turns out to be a really good reason to
> describe cycles then we can cross that bridge* when we come to it,
> but it's best to forbid it until/unless the need for it is proven.
> 
> (*no pun intended)

Right.

> Note that a certain kind of trivial cycle will always be created
> when a node refers back to its parent:
> 
> root: / {
>         ranges;
> 
>         iommu {
>                 reg = < ... >;
>                 slaves = <&root>;
>         };
> };
> 
> ePAPR says that if there is no "ranges" property, then the parent
> node cannot access any address of the child -- we can interpret
> this as saying that transactions do not propagate.  "ranges" with
> an empty value implies a complete 1:1 mapping, which we can interpret
> as transactions being forwarded without any transformation.
> 
> Crucially, "iommu" must not have a "ranges" property in this case,
> because this would permit a static routing cycle root -> iommu ->
> root.

Makes sense.

	Arnd

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 17:43             ` Dave Martin
@ 2014-05-05 15:14                 ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-05 15:14 UTC (permalink / raw)
  To: Dave Martin
  Cc: Thierry Reding, Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Will Deacon,
	Jason Gunthorpe, Marc Zyngier,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

On Friday 02 May 2014 18:43:01 Dave Martin wrote:
> On Fri, May 02, 2014 at 05:19:44PM +0200, Arnd Bergmann wrote:
> > On Friday 02 May 2014 15:23:29 Thierry Reding wrote:
> > > To some degree this also depends on how we want to handle IOMMUs. If
> > > they should remain transparently handled via dma_map_ops, then it makes
> > > sense to set this up at device instantiation time. But how can we handle
> > > this in situations where one device needs to master on two IOMMUs at the
> > > same time? Or if the device needs physically contiguous memory for
> > > purposes other than device I/O. Using dma_map_ops we can't control which
> > > allocations get mapped via the IOMMU and which don't.
> > 
> > I still hope we can handle this in common code by selecting the right
> > dma_map_ops when the devices are instantiated, at least for 99% of the
> > cases. I'm not convinced we really need to handle the 'multiple IOMMUs
> > on one device' case in a generic way. If there are no common use cases
> > for that, we can probably get away with having multiple device nodes
> > and an ugly driver for the exception, instead of making life complicated
> > for everybody.
> 
> Multiple IOMMUs certainly seems an unusual case for now.
> 
> Being able to describe that in the DT doesn't necessarily mean the
> kernel has to support it: just as the kernel doesn't need to support
> all the features of a crazy hardware platform just because someone was crazy
> enough to build it.

Right.

> My expectation was that we do some check when probing a device to figure
> out the path from the device to main memory, thus figuring out the dma
> mask, the IOMMU (if any) and any relevant device ID.  This is a bit more
> complex than the existing situation, but I still think we could have
> common code for the bulk of it.

I'd still prefer if we could keep following just "dma-ranges" to find
a path from device to memory, with an extension to handle IOMMUs etc,
but not describe the things to the extent that your proposal does.
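
Roughly, keeping the lookup a plain walk up the tree.  A made-up
sketch (with the IOMMU extension left as a hand-wave):

	/ {
		#address-cells = <1>;
		#size-cells = <1>;

		soc {
			compatible = "simple-bus";
			#address-cells = <1>;
			#size-cells = <1>;
			ranges;
			// bus address 0 maps to RAM at 0x80000000, 1GB window
			dma-ranges = <0x00000000 0x80000000 0x40000000>;

			device@1000 {
				reg = <0x1000 0x100>;
				// DMA mask and offset fall out of walking
				// "dma-ranges" towards the root
			};
		};
	};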

We already pretend that things are a tree for the purposes of MMIO,
which is probably still close enough for the vast majority of cases.
For simplicity, I'd actually prefer keeping the illusion that MMIO
and DMA are two different things, which matches how operating systems
see things even if it's no longer true for the hardware.

> If a device has different roles with completely different paths to
> memory, one option could be for the driver to instantiate two devices in
> the kernel.  This puts the burden on the driver for the device, instead
> of the core framework.

Yes, this is what I suggested earlier as well.

	Arnd

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-05 15:14                 ` Arnd Bergmann
@ 2014-05-09 10:33                   ` Dave Martin
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-09 10:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Will Deacon,
	Jason Gunthorpe, Marc Zyngier, Thierry Reding,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

On Mon, May 05, 2014 at 05:14:44PM +0200, Arnd Bergmann wrote:
> On Friday 02 May 2014 18:43:01 Dave Martin wrote:
> > On Fri, May 02, 2014 at 05:19:44PM +0200, Arnd Bergmann wrote:
> > > On Friday 02 May 2014 15:23:29 Thierry Reding wrote:
> > > > To some degree this also depends on how we want to handle IOMMUs. If
> > > > they should remain transparently handled via dma_map_ops, then it makes
> > > > sense to set this up at device instantiation time. But how can we handle
> > > > this in situations where one device needs to master on two IOMMUs at the
> > > > same time? Or if the device needs physically contiguous memory for
> > > > purposes other than device I/O. Using dma_map_ops we can't control which
> > > > allocations get mapped via the IOMMU and which don't.
> > > 
> > > I still hope we can handle this in common code by selecting the right
> > > dma_map_ops when the devices are instantiated, at least for 99% of the
> > > cases. I'm not convinced we really need to handle the 'multiple IOMMUs
> > > on one device' case in a generic way. If there are no common use cases
> > > for that, we can probably get away with having multiple device nodes
> > > and an ugly driver for the exception, instead of making life complicated
> > > for everybody.
> > 
> > Multiple IOMMUs certainly seems an unusual case for now.
> > 
> > Being able to describe that in the DT doesn't necessarily mean the
> > kernel has to support it: just as the kernel doesn't need to support
> > all the features of a crazy hardware platform just because someone was crazy
> > enough to build it.
> 
> Right.
> 
> > My expectation was that we do some check when probing a device to figure
> > out the path from the device to main memory, thus figuring out the dma
> > mask, the IOMMU (if any) and any relevant device ID.  This is a bit more
> > complex than the existing situation, but I still think we could have
> > common code for the bulk of it.
> 
> I'd still prefer if we could keep following just "dma-ranges" to find
> a path from device to memory, with an extension to handle IOMMUs etc,
> but not describe the things to the extent that your proposal does.

This is really a simplification.  If we only had to consider the path
from devices to memory, and if the pretence that the memory is at /
works (almost always true, at least for general-purpose memory), then
we often have a simpler problem.  It is still not always solvable
though, since a device could still have a unique access path with
unique mappings that cannot be described in terms of any of the other
paths.

However, this is not just about DMA any more.  Devices also master onto
interrupt controllers to generate MSIs, onto IOMMUs to provide them
with address space remapping, onto other devices via IOMMUs; and GPUs
can master onto all kinds of things.

The alternative to a completely generic description of bus mastering
would be to consider the specific things that we are interested in
allowing devices to master on, and describe the whole end-to-end
link for each in the bus master's node (sketched below):

 * memory
 * interrupt controllers (for MSIs)
 * IOMMUs
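
Concretely, that might end up looking something like this in each
master node -- every property name below is invented purely for the
sake of the example, with the IOMMU and MSI controller nodes assumed
to be defined elsewhere:

	gpu@40000000 {
		reg = <0x40000000 0x10000>;
		// one explicit link per kind of target:
		dma-parent = <&{/}>;		// path to main memory
		iommu-parent = <&smmu 23>;	// IOMMU, plus a master ID
		msi-parent = <&its 23>;		// MSI target, plus a device ID
	};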

This could lead to combinatorial blow-up as the number of target
devices grows, since these linkages have to be re-described for
every bus master -- especially for GPUs which could do a fair amount of
device control by themselves.  It could also lead to fragmentation
as each binding solves common problems in different ways.

The downside is that if any path flows through a dynamically
configurable component, such as an IOMMU or a bridge that can be
remapped, turned off etc., then unless we describe how the path is
really linked together the kernel will need all sorts of baked-in
knowledge in order to manage the system safely.  The _effect_ of the
problem component on the path is not static, so we can't describe 
that effect directly in the DT.  For truly weird features that are
unique to a particular platform that's OK, but "how devices are
linked together" seems a much more general and useful concept than that.

> We already pretend that things are a tree for the purposes of MMIO,
> which is probably still close enough for the vast majority of cases.
> For simplicity, I'd actually prefer keeping the illusion that MMIO
> and DMA are two different things, which matches how operating systems
> see things even if it's no longer true for the hardware.
> 
> > If a device has different roles with completely different paths to
> > memory, one option could be for the driver to instantiate two devices in
> > the kernel.  This puts the burden on the driver for the device, instead
> > of the core framework.
> 
> Yes, this is what I suggested earlier as well.

I may have missed your point slightly before.  Anyway, I think the
"multiple master roles" issue is sufficiently unusual that describing 
them using multiple device nodes in the DT is reasonable.  So I think
I'm happy to get rid of the ability to specify and distinguish multiple
roles within a single device node.

Cheers
---Dave

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 19:02                 ` Arnd Bergmann
@ 2014-05-09 10:45                   ` Dave Martin
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-09 10:45 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Stephen Warren, Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Grant Grundler, Will Deacon, Jason Gunthorpe,
	Marc Zyngier, Thierry Reding,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

On Fri, May 02, 2014 at 09:02:20PM +0200, Arnd Bergmann wrote:
> On Friday 02 May 2014 12:55:45 Stephen Warren wrote:
> > On 05/02/2014 09:19 AM, Arnd Bergmann wrote:
> > > On Friday 02 May 2014 15:23:29 Thierry Reding wrote:
> > ...
> > >> To some degree this also depends on how we want to handle IOMMUs. If
> > >> they should remain transparently handled via dma_map_ops, then it makes
> > >> sense to set this up at device instantiation time. But how can we handle
> > >> this in situations where one device needs to master on two IOMMUs at the
> > >> same time? Or if the device needs physically contiguous memory for
> > >> purposes other than device I/O. Using dma_map_ops we can't control which
> > >> allocations get mapped via the IOMMU and which don't.
> > > 
> > > I still hope we can handle this in common code by selecting the right
> > > dma_map_ops when the devices are instantiated, at least for 99% of the
> > > cases. I'm not convinced we really need to handle the 'multiple IOMMUs
> > > on one device' case in a generic way. If there are no common use cases
> > > for that, we can probably get away with having multiple device nodes
> > > and an ugly driver for the exception, instead of making life complicated
> > > for everybody.
> > 
> > By "multiple device nodes", I assume you mean device tree nodes? I'm not
> > sure I like the sound of that.
> > 
> > I believe that DT should represent the structure of the HW in terms of
> > HW modules or blocks. If there's a single cohesive HW module that
> > happens to talk to multiple MMUs, or indeed has any kind of unusual case
> > at all, I don't think that should force the DT representation to be
> > broken up into multiple nodes. We should have a DT node for that HW
> > module, and it should be up to the device driver to make the internal SW
> > representation work correctly.

This is rather fuzzy.  On-SoC, it is often non-obvious what counts as a
single component and what counts as a collection of multiple components.

For DT, we are concerned with whether there is anything configurable in
the relationship between the components, or in how each component
connects to the rest of the system, that it is worthwhile to abstract.

> I agree we should in general try our best to have the DT representation
> match exactly what the hardware looks like. However we already have some
> areas where we violate that, typically when things are not trees.

The DT is an abstraction: it needs to describe all the OS-relevant
aspects of platform *behaviour* that can vary between configurations.

The most reliable way to describe behaviour perfectly is to describe the
hardware exactly in every detail, but it's not much help to an OS
because the useful information is spread all over the place and not in a
usable form.

I deliberately went hard down the "describe the hardware" route to
get the discussion going, but this doesn't always mean it is the best
answer.

We have to trade off the false economy of not describing enough against
the pointless expense of describing too much...

> If there is no real use case but only a theoretical possibility, I don't
> have a problem with being less strict about the general rule on hardware
> representation.

I agree that this is all a bit hypothetical -- good for batting ideas
around, but not so good for reaching practical conclusions.

I'll try to come up with something a bit more concrete.

Cheers
---Dave

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 19:06               ` Arnd Bergmann
@ 2014-05-09 10:56                 ` Dave Martin
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-09 10:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Shaik Ameer Basha,
	Stephen Warren, Grant Grundler, Will Deacon, Jason Gunthorpe,
	Marc Zyngier, Thierry Reding, Hiroshi Doyu

On Fri, May 02, 2014 at 09:06:43PM +0200, Arnd Bergmann wrote:
> On Friday 02 May 2014 12:50:17 Stephen Warren wrote:
> > On 05/02/2014 07:23 AM, Thierry Reding wrote:
> > > On Fri, May 02, 2014 at 02:32:08PM +0200, Arnd Bergmann wrote:
> > >> On Friday 02 May 2014 13:05:58 Thierry Reding wrote:
> > >>>
> > >>> Let me see if I understood the above proposal by trying to translate it
> > >>> into a simple example for a specific use-case. On Tegra for example we
> > >>> have various units that can either access system memory directly or use
> > >>> the IOMMU to translate accesses for them. One such unit would be the
> > >>> display controller that scans out a framebuffer from memory.
> > >>
> > >> Can you explain how the decision is made whether the IOMMU gets used
> > >> or not? In all cases I've seen so far, I think we can hardwire this
> > >> in DT, and only expose one or the other. Are both ways used
> > >> concurrently?
> > > 
> > > It should be possible to hardcode this in DT for Tegra. As I understand
> > > it, both interfaces can't be used at the same time. Once translation has
> > > been enabled for one client, all accesses generated by that client will
> > > be translated.
> > > 
> > > Hiroshi, please correct me if I'm wrong.
> > 
> > I believe the HW connectivity is always as follows:
> > 
> > Bus master (e.g. display controller) ---> IOMMU (Tegra SMMU) ---> RAM
> > 
> > In the IOMMU, there is a bit per bus master that indicates whether the
> > IOMMU translates the bus master's accesses or not. If that bit is
> > enabled, then page tables in the IOMMU are used to perform the translation.
> > 
> > You could also look at the HW setup as:
> > 
> > Bus master (e.g. display controller)
> >     v
> >    ----
> >   /    \
> >   ------
> >    |  \
> >    |   ------------------
> >    |                     \
> >    v                     v
> > IOMMU (Tegra SMMU) ---> RAM
> > 
> > But IIRC the bit that controls that demux is in the IOMMU, so this
> > distinction probably isn't relevant.
> 
> Ok. I think this case can be dealt with easily enough without
> having to represent it as two master ports on one device. There
> really is just one master, and it can be configured in two ways.

I think in this case, this is effectively a "bypass" control in
the IOMMU, so it can be treated as part of the IOMMU.

Whether any real system will require us to describe dynamic forking
is unclear.

By "dynamic forking" I mean where a transaction really does flow down
different paths or through different sets of components based on more
than just the destination address, possibly under runtime control.

> 
> We can either choose to make the DT representation decide which
> way is used, or we can always point to the IOMMU, and let the
> IOMMU driver decide.
> 
> > Now, perhaps there are devices which themselves control whether
> > transactions are sent to the IOMMU or direct to RAM, but I'm not
> > familiar with them. Is the GPU in that category, since it has its own
> > GMMU, albeit chained into the SMMU IIRC?
> 
> Devices with a built-in IOMMU such as most GPUs are also easy enough
> to handle: There is no reason to actually show the IOMMU in DT and
> we can just treat the GPU as a black box.

It's impossible for such a built-in IOMMU to be shared with other
devices, so that's probably reasonable.

For an IOMMU out in the interconnect, the OS needs to understand
that it is shared, so that it knows not to mess up other flows
when poking the IOMMU.

Cheers
---Dave

> 
> Note that you don't really have to support the dma-mapping.h API on
> GPUs, they usually need to go down to the IOMMU level anyway.

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-09 10:33                   ` Dave Martin
@ 2014-05-09 11:15                       ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-09 11:15 UTC (permalink / raw)
  To: Dave Martin
  Cc: Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Stephen Warren, Grant Grundler, Will Deacon,
	Jason Gunthorpe, Marc Zyngier, Thierry Reding,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

On Friday 09 May 2014 11:33:18 Dave Martin wrote:
> On Mon, May 05, 2014 at 05:14:44PM +0200, Arnd Bergmann wrote:
> > On Friday 02 May 2014 18:43:01 Dave Martin wrote:
> > > My expectation was that we do some check when probing a device to figure
> > > out the path from the device to main memory, thus figuring out the dma
> > > mask, the IOMMU (if any) and any relevant device ID.  This is a bit more
> > > complex than the existing situation, but I still think we could have
> > > common code for the bulk of it.
> > 
> > I'd still prefer if we could keep following just "dma-ranges" to find
> > a path from device to memory, with an extension to handle IOMMUs etc,
> > but not describe the things to the extent that your proposal does.
> 
> This is really a simplification.  If we only had to consider the path
> from devices to memory, and if the pretence that the memory is at /
> works (almost always true, at least for general-purpose memory), then
> we often have a simpler problem.  It is still not always solvable
> though, since a device could still have a unique access path with
> unique mappings that cannot be described in terms of any of the other
> paths.
> 
> However, this is not just about DMA any more.  Devices also master onto
> interrupt controllers to generate MSIs, onto IOMMUs to provide them
> with address space remapping, onto other devices via IOMMUs; and GPUs
> can master onto all kinds of things.
> 
> The alternative to a completely generic description of bus mastering
> would be to consider the specific things that we are interested in
> allowing devices to master on, and describe the whole end-to-end
> link for each in the bus master's node.
> 
>  * memory
>  * interrupt controllers (for MSIs)
>  * IOMMUs

Right. The "memory" and "IOMMU" targets are really variations of the
same theme here: The one place we need to know about how to access
main memory is when we set up the dma_map_ops, and we already need
to know a number of things there:

- coherency
- bus offsets
- swiotlb range, if needed
- which IOMMU, if any
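
Most of those can already be read off standard properties along the
path.  As a made-up fragment (the IOMMU link is the only part with no
agreed binding yet):

	soc {
		ranges;
		// bus offset and addressable window for the masters below,
		// from which a mask or any swiotlb need can be derived
		dma-ranges = <0x00000000 0x00000000 0x80000000>;

		device@1000 {
			reg = <0x1000 0x100>;
			dma-coherent;	// coherency
			// plus some to-be-defined reference to the IOMMU
		};
	};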

MSI also seems like something that is worth treating specially, in
particular since the ARM implementation is so much unlike everybody
else's and we need to have a way to give a consistent API to device
drivers.

> This could lead to combinatorial blow-up as the number of target
> devices grows, since these linkages have to be re-described for
> every bus master -- especially for GPUs which could do a fair amount of
> device control by themselves.  It could also lead to fragmentation
> as each binding solves common problems in different ways.

If the linkage is the same for multiple devices on the same bus,
I think we can always do it the traditional way using dma-ranges
on the parent bus.

> The downside is that if any path flows through a dynamically
> configurable component, such as an IOMMU or a bridge that can be
> remapped, turned off etc., then unless we describe how the path is
> really linked together the kernel will need all sorts of baked-in
> knowledge in order to manage the system safely.  The _effect_ of the
> problem component on the path is not static, so we can't describe 
> that effect directly in the DT.  For truly weird features that are
> unique to a particular platform that's OK, but "how devices are
> linked together" seems a much more general and useful concept than that.

My feeling is that this belongs in the same category as for example
PCI bus windows or pin control: while in theory we could describe
in DT exactly what the hardware is capable of and let the kernel
decide how to do it, this is much too complicated in practice, so
we are better off describing in DT how we /want/ things to be set up,
or how things are already set up by the firmware and not to be
touched.

> > We already pretend that things are a tree for the purposes of MMIO,
> > which is probably still close enough for the vast majority of cases.
> > For simplicity, I'd actually prefer keeping the illusion that MMIO
> > and DMA are two different things, which matches how operating systems
> > see things even if it's no longer true for the hardware.
> > 
> > > If a device has different roles with completely different paths to
> > > memory, one option could be for the driver to instantiate two devices in
> > > the kernel.  This puts the burden on the driver for the device, instead
> > > of the core framework.
> > 
> > Yes, this is what I suggested earlier as well.
> 
> I may have missed your point slightly before.  Anyway, I think the
> "multiple master roles" issue is sufficiently unusual that describing 
> them using multiple device nodes in the DT is reasonable.  So I think
> I'm happy to get rid of the ability to specify and distinguish multiple
> roles within a single device node.

Ok.

	Arnd

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 20:36           ` Arnd Bergmann
@ 2014-05-09 13:26             ` Dave Martin
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-09 13:26 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Shaik Ameer Basha,
	Stephen Warren, Grant Grundler, Will Deacon, Jason Gunthorpe,
	Marc Zyngier, Thierry Reding, Hiroshi Doyu

On Fri, May 02, 2014 at 10:36:43PM +0200, Arnd Bergmann wrote:
> On Friday 02 May 2014 18:31:20 Dave Martin wrote:
> > No, but I didn't state it very clearly.
> > 
> > In this:
> > 
> >         parent {
> >                 child {
> >                         ranges = < ... >;
> >                         dma-ranges = < ... >;
> >                 };
> >         };
> 
> The ranges and dma-ranges properties belong into parent, not child here.
> I guess that's what you meant at least.
> 
> > There are two transaction flows being described.  There are transactions
> > from parent -> child, for which "ranges" describes the mappings, and
> > there are transactions from child -> parent, for which "dma-ranges"
> > describes the mappings.
> 
> Right.
> 
> > The name "dma-ranges" obfuscates this symmetry, so it took me a while
> > to figure out what it really means -- maybe I'm still confused, but
> > I think that's the gist of it.
> > 
> > 
> > For the purposes of cross-links, my plan was that we interpret all
> > those links as "forward" (i.e., parent -> child) links, where the
> > referencing node is deemed to be the parent, and the referenced node is
> > deemed to be the child. Just as in the ePAPR case, the associated mapping
> > is then described by "ranges".
> 
> That seems counterintuitive to me. When a device initiates a transaction,
> it should look at the "dma-ranges" of its parent. The "slaves" property
> would be a way to redirect the parent for these transactions, but it
> doesn't mean that the device suddenly translates ranges as seen from
> its parent.
> 
> In other words, "ranges" should always move from CPU to MMIO target
> (slave), while "dma-ranges" should always move from a DMA master towards
> memory. If you want to represent a device-to-device DMA, you may have
> to move up a few levels using "dma-ranges" and then move down again
> using "ranges".

In unidirectional bus architectures with non-CPU bus masters, the
classification of flows as "upstream" or "downstream" is nonsensical.
There is only a single direction at any point: the topology derives
completely from how things are linked together.

Trying to orient the topology of SoC architectures can be like trying to
decide which way is up in an M. C. Escher drawing, but with less elegant
symmetry.  While the CPUs would be clearly placed at the top by almost
everybody, it's generally impossible to draw things so that some of the
topology isn't upside-down.

This wouldn't matter, except ePAPR DT does make a fundamental difference
between upstream and downstream: only a single destination is permitted
for upstream via "dma-ranges": the parent node.  For downstream,
multiple destinations are permitted, because a node can have multiple
children.  We can add additional logical children to a node without
radical change to the way traversal works.  But allowing multiple
parents is likely to be a good deal more disruptive.

With dma-ranges, unless multiple logical parent nodes are permitted for
traversal then we have no way to describe independent properties
or mappings for paths to different destinations: ePAPR does not allow
you to pingpong between upstream and downstream directions.  You
traverse in the upstream direction until you reach a common ancestor of the
destination, then you traverse downstream.  Towards the common 
ancestor there is no chance to describe different paths, because
there is only a single parent node at each point.  From the common
ancestor, transactions from all masters follow the same paths through
the DT, so there is still no way to describe per-master-per-slave
mappings.

The "ranges" approach solves this problem, which I believe may
be important if we want to describe how ID signals are mapped via
the same mechanisms: we know that there really will be per-master-
per-slave mappings for IDs.

There may be other solutions, or aspects of the problem I still don't
understand properly ... (likely!)
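
Very roughly, reusing the "slaves" idea from the RFC (this is only a
made-up sketch, not a worked-out binding): each master gets its own
intermediate node, and that node's "ranges" carries the per-master
mapping towards the shared slave.

	root: / {
		ranges;

		master@1 {
			slaves = <&m1_link>;

			m1_link: link {
				ranges = < ... >;	// mapping specific to master@1
				slaves = <&root>;
			};
		};

		master@2 {
			slaves = <&m2_link>;

			m2_link: link {
				ranges = < ... >;	// different mapping for master@2
				slaves = <&root>;
			};
		};
	};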

> 
> > > Don't you need arguments to the phandle? It seems that in most
> > > cases, you need at least one of a dma-ranges like translation
> > > or a master ID. What you need would be specific to the slave.
> > 
> > For any 1:N relationship between nodes, you can describe the
> > _relationship_ by putting properties on the nodes at the "1" end.  This
> > is precisely how "ranges" and "dma-ranges" work.
> 
> That doesn't seem very helpful or intuitive though. If I have
> an IOMMU that N DMA masters can target, I don't want to have
> information about all the masters in the IOMMU, that information
> belongs into the masters, but the format in which it is stored
> must be specific to the IOMMU.
> 
> > The N:M case can be resolved by inserting simple-bus nodes into any
> > links with non-default mappings: i.e., you split each affected link in
> > two, with a simple-bus node in the middle describing the mapping:
> 
> Ok, ignoring those oddball cases for now, I'd like to first
> try to come to a more useful representation for the common case.
> 
> > root: / {
> >         ranges;
> >         ...
> > 
> >         master@1 {
> >                 slave {
> >                         ranges = < ... >;
> >                         slaves = <&root>;
> >                 };
> >         };
> 
> The "ranges" property here is really weird. I understand you
> mean this to be a device that has a unique mapping into its
> slave (the root device here). I would reduce this to one
> of two cases:

We should probably revisit this.  It sounds like you got what I
was trying to do here, but my text above may make the rationale
a bit clearer.  Maybe.

> 
> a) it sits on an intermediate bus by itself, and that bus
> has the non-default mapping:
> 
> / {
> 	dma-ranges;
> 	ranges;
> 
> 	bus@x {
> 		ranges;
> 		dma-ranges = < ... >; // this is the mapping
> 		master@1 {
> 			...
> 		};
> 	};
> 
> b) the device itself is strange, but it's a property of the
> device that the driver knows about (e.g. it can only do
> a 30-bit DMA mask rather than 32 bit, for a realistic case),
> so we don't have to represent this at all and let the driver
> deal with the translation:

We have a ready-made way to describe things like 30-bit DMA in
a quite generic way, if traversal is through a node with

	ranges = < 0x00000000 0x00000000 0x40000000 >;

(or similarly, dma-ranges).  Deducing a 30-bit mask from that
is not hard.  A DMA mask actually does the same thing as a ranges
property, but in a much more limited way (which can be a good thing).

> 
> / {
> 	dma-ranges;
> 	ranges;
> 
> 	master@1 {
> 		...
> 	};
> 
> };
> 
> >         master@2 {
> >                 slave {
> >                         slaves = < &root &master2_dma_slave >;
> >                         slave-names = "config-fetch", "dma";
> > 
> >                	   master2_dma_slave: dma-slave {
> >                                 ranges = < ... >;
> >                                 slaves = <&root>;
> >                         };
> >                 };
> >         };
> 
> As I said before, I'd consider this a non-issue until anyone
> can come up with a case that needs the complexity.
> 
> A possible representation would be to have two masters
> as child nodes of the actual device to avoid having a 'slaves'
> property with multiple entries, and if the device is the only
> one with a weird translation, that can go to some other bus
> node we make up for this purpose:
> 
> / {
> 	ranges;
> 	dma-ranges;
> 
> 	fake-bus {
> 		dma-ranges = < ... >;
> 		slaves = < &{/} >; // is the default, so can be omitted
> 	};
> 
> 	bus {
> 		ranges;
> 		// no dma-ranges --> no default DMA translation
> 
> 		device@1 {
> 			master@1 {
> 				// hopefully will never be
> 				// needed in real life
> 				slaves = < &{/fake-bus}>;
> 			};
> 
> 			master@2 {
> 				slaves = < &{/} >;
> 			};
> 		};
> 	};
> };
> 
> >         master@3 {
> >                 slaves = <&root>;
> >         };
> > };
> 
> Here, the slave is the parent device, which is the default
> anyway, so I wouldn't require listing anything at all,
> besides an empty dma-ranges in the parent node.
> 
> If we can get away with just a single entry in 'slaves' all
> the time, we could actually rename that property to 'dma-parent',
> for consistency with 'interrupt-parent' and a few other things.
> 
> Freescale already has 'fsl,iommu-parent' for this case, a
> 'dma-parent' would be a generalization of that, but less general
> than your 'slaves' property.
> 
> > > It may be best to make the ranges explicit here and then also
> > > allow additional fields depending on e.g. a #dma-slave-cells
> > > property in the slave.
> > > 
> > > For instance, a 32-bit master on a 64-bit bus that has master-id
> > > 23 would look like
> > > 
> > >       otherbus: axi@somewhere{
> > >               #address-cells = <2>;
> > >               #size-cells = <2>;
> > >       };
> > > 
> > >       somemaster@somewhere {
> > >               #address-cells = <1>;
> > >               #size-cells = <1>;
> > >               slaves = <&otherbus  // phandle
> > >                               0     // local address
> > >                               0 0   // remote address
> > >                               0x1 0 // size
> > >                               23>;  // master id
> > >       };
> > 
> > I thought about this possibility, but was worried that the "slaves"
> > property would become awkward to parse, where except for the "master id"
> > concept, all these attributes are well described by ePAPR already for
> > bus nodes if we can figure out how to piggyback on them -- hence my
> > alternative approach explained above.
> > 
> > How to describe the "master id" is particularly problematic and may
> > be a separate discussion.  It can get munged or remapped as it
> > passes through the interconnect: for example, a PCI device's ID 
> > accompanying an MSI write may be translated once as it passes from
> > the PCI RC to an IOMMU, then again before it reaches the GIC.

As commented on the other branch of this thread, I think I agree with
this general approach.  We shouldn't have to do it that often, and it
keeps some complexity out of the core binding.
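
For the simple form, the parsing stays trivial too.  As a rough sketch
(assuming nothing more than a plain phandle list called "slaves", which
is only what this RFC proposes, not an accepted binding), a driver could
walk it with the existing OF helpers:

#include <linux/of.h>
#include <linux/printk.h>

/*
 * Sketch only: walk a plain phandle-list "slaves" property on a master's
 * node, assuming no extra argument cells per entry.  Real code would do
 * something useful with each slave node rather than just logging it.
 */
static void walk_slaves(struct device_node *master)
{
	struct device_node *slave;
	int i = 0;

	while ((slave = of_parse_phandle(master, "slaves", i++))) {
		pr_info("%s: slave %s\n", master->full_name, slave->full_name);
		of_node_put(slave);
	}
}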

Cheers
---Dave

(Leaving the remainder of your reply for context -- thanks for this.)

> 
> Hmm, that would actually mean we'd have to do complex "dma-ranges"
> properties with more than one entry, which I had hoped to avoid.
> 
> > In the "windowed IOMMU" case, address bits are effectively being
> > mapped to ID bits as they reach IOMMU.
> > 
> > An IOMMU also does a complete mapping of ID+address -> ID'+address'
> > (although programmable rather than static and unprobeable, so the
> > actual mappings for an IOMMU won't be in the DT).
> 
> right.
> 
> > > > 2) The generic "slave" node(s) are for convenience and readability.
> > > >    They could be eliminated by using child nodes with
> > > >    binding-specific names and referencing them in "slaves".  This is a
> > > >    bit more awkward, but has the same expressive power.
> > > > 
> > > >    Should the generic "slave" nodes go away?
> > > 
> > > I would prefer not having to have subnodes for the simple case
> > > where you just need to reference one slave iommu from a master
> > > device.
> > 
> > My expectation is that subnodes would only be useful in special cases in
> > any case.
> > 
> > We can remove the special "slave" name, because there's nothing to
> > stop us referencing other random nested nodes with the "slaves" property.
> 
> Ok.
> 
> > > I wouldn't be worried about cycles. We can just declare them forbidden
> > > in the binding. Anything can break if you supply a broken DT, this
> > > is the least of the problems.
> > 
> > That's my thought.  If there turns out to be a really good reason to
> > describe cycles then we can cross that bridge* when we come to it,
> > but it's best to forbid it until/unless the need for it is proven.
> > 
> > (*no pun intended)
> 
> Right.
> 
> > Note that a certain kind of trivial cycle will always be created
> > when a node refers back to its parent:
> > 
> > root: / {
> >         ranges;
> > 
> >         iommu {
> >                 reg = < ... >;
> >                 slaves = <&root>;
> >         };
> > };
> > 
> > ePAPR says that if there is no "ranges" property, then the parent
> > node cannot access any address of the child -- we can interpret
> > this as saying that transactions do not propagate.  "ranges" with
> > an empty value implies a complete 1:1 mapping, which we can interpret
> > as transactions being forwarded without any transformation.
> > 
> > Crucially, "iommu" must not have a "ranges" property in this case,
> > because this would permit a static routing cycle root -> iommu ->
> > root.
> 
> Makes sense.
> 
> 	Arnd
> 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-02 18:17           ` Jason Gunthorpe
@ 2014-05-09 14:16               ` Dave Martin
  -1 siblings, 0 replies; 58+ messages in thread
From: Dave Martin @ 2014-05-09 14:16 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Arnd Bergmann, Stephen Warren, Grant Grundler,
	Will Deacon, Marc Zyngier, Thierry Reding,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

On Fri, May 02, 2014 at 12:17:50PM -0600, Jason Gunthorpe wrote:
> On Fri, May 02, 2014 at 06:31:20PM +0100, Dave Martin wrote:
> 
> > Note that there is no cycle through the "reg" property on iommu:
> > "reg" indicates a sink for transactions; "slaves" indicates a
> > source of transactions, and "ranges" indicates a propagator of
> > transactions.
> 
> I wonder if this might be a better naming scheme, I actually don't
> really like 'slave' for this, it really only applies well to AXI style
> unidirectional busses, and any sort of message-based bus architectures
> (HT, PCI, QPI, etc) just have the concept of an initiator and target.
> 
> Since initiator/target applies equally well to master/slave buses,
> that seems like better, clearer, naming.

Sure, I wouldn't have a problem with such a suggestion.  A more neutral
naming is less likely to cause confusion.

> Using a nomenclature where
>   'reg' describes a target reachable from the CPU initiator via the
>         natural DT hierarchy

I would say, reachable from the parent device node (which implies your
statement).  This is consistent with the way ePAPR describes device-to-
device DMA (even if Linux doesn't usually make a lot of use of that).

>   'initiator' describes a non-CPU (eg 'DMA') source of ops, and
>         travels via the path described to memory (which is the
> 	target).

CPUs are initiators only; non-mastering devices are targets only.

We might want some terminology to distinguish between mastering 
devices and bridges, both of which act as initiators and targets.

We could have a concept of a "forwarder" or "gateway".  But a bus
may still be a target as well as forwarder: if the bus contains some
control registers for example.  There is nothing to stop "reg" and
"ranges" being present on the same node.

"ranges" and "dma-ranges" both describe a node's forwarding role,
one for transactions received from the parent, and one for transactions
received from children.

>   'path' describes the route between an intitator and target, where
>         bridges along the route may alter the operation.

ok

>   'upstream' path direction toward the target, typically memory.

I'm not keen on that, because we would describe the hop between /
and /memory as downstream or upstream depending on who initiates the
transaction.  (I appreciate you weren't including CPUs in your
discussion, but if the terminology works for the whole system it
would be a bonus).

>   'upstream-bridge' The next hop on a path between an initiator/target

Maybe.  I'm still not sure quite why this is considered different
from the downward path through the DT, except that you consider
the cross-links in the DT to be "upward", but I considered them
"downward" (which I think are mostly equivalent approaches).

Can you elaborate?

> 
> But I would encourage you to think about the various limitations this
> still has
>  - NUMA systems. How does one describe the path from each
>    CPU to a target regs, and target memory? This is important for
>    automatically setting affinities.

This is a good point.

Currently I had only been considering visibility, not affinity.
We actually have a similar problem with GIC, where there may
be multiple MSI mailboxes visible to a device, but one that is
preferred (due to being fewer hops away in the silicon, even though
the routing may be transparent).

I wasn't trying to solve this problem yet, and don't have a good
answer for it at present.

We could describe a whole separate bus for each CPU, with links
to common interconnect subtrees downstream.  But that might involve
a lot of duplication.  Your example below doesn't look too bad
though.

>  - Peer-to-Peer DMA, this is where a non-CPU initiator speaks to a
>    non-memory target, possibly through IOMMUs and what not. ie
>    a graphics card in a PCI-E slot DMA'ing through a QPI bus to
>    a graphics card in a PCI-E slot attached to a different socket.

Actually, I do intend to describe that and I think I achieved it :)

To try to keep the length of this mail down a bit I won't try to
give an example here, but I'm happy to follow up later if this is
still not answered elsewhere in the thread.

> 
> These are already use-cases happening on x86.. and the same underlying
> hardware architectures this tries to describe for DMA to memory is at
> work for the above as well.
> 
> Basically, these days, interconnect is a graph. Pretending things are
> a tree is stressful :)
> 
> Here is a basic attempt using the above language, trying to describe
> an x86ish system with two sockets, two DMA devices, where one has DMA
> target capabable memory (eg a GPU)
> 
> // DT tree is the view from the SMP CPU complex down to regs
> smp_system {
>    socket0 {
>        cpu0@0 {}
>        cpu1@0 {}
>        memory@0: {}
>        interconnect0: {targets = <&memory@0,interconnect1>;}
>        interconnect0_control: {
>              ranges;
>              peripheral@0 {
>    		regs = <>;
>                 intiator1 {
>                         ranges = < ... >;
>                         // View from this DMA initiator back to memory
>                         upstream-bridge = <&interconnect0>;
>                 };
> 		/* For some reason this peripheral has two DMA
> 		   initiation ports. */
>                 intiator2 {
>                         ranges = < ... >;
>                         upstream-bridge = <&interconnect0>;
>                 };

Describing separate masters within a device in this way looks quite nice.

Understanding what to do with them can still be left up to the driver
for the parent node (peripheral@0 in this case).

>              };
>         };
>    }
>    socket1 {
>        cpu0@1 {}
>        cpu1@1 {}
>        memory@1: {}
>        interconnect1: {targets = <&memory@1,&interconnect0,&peripheral@1/target>;}
>        interconnect1_control: {
>              ranges;
>              peripheral@1 {
>                 ranges = < ... >;
>    		regs = <>;
>                 intiator {
>                         ranges = < ... >;
>                         // View from this DMA initiator back to memory
>                         upstream-bridge = <&interconnect1>;
>                 };
>                 target {
> 		        reg = <..>
>                         /* This peripheral has integrated memory!
>                            But notice the CPU path is
>                              smp_system -> socket1 -> interconnect1_control -> target
> 			   While a DMA path is
>                              intiator1 -> interconnect0 -> interconnect1 -> target
> 			 */
>                 };

By hiding slaves (as opposed to masters) inside subnodes, can DT do
generic reachability analysis?  Maybe the answer is "yes".  I know
devices hanging off buses whose compatible string is not "simple-bus" are
not automatically probed, but there are other reasons for that, such as
bus-specific power-on and probing methods.

>             };
>             peripheral2@0 {
>    		regs = <>;
> 
> 		// Or we can write the simplest case like this.
> 		dma-ranges = <>;
> 		upstream-bridge = <&interconnect1>;
>                 /* if upstream-bridge is omitted then it defaults to
> 	           &parent, eg interconnect1_control */

This doesn't seem so different from my approach, though I need to
think about it a bit more.

>        }
> }
> 
> It is computable that ops from initator2 -> target flow through
> interconnect0, interconnect1, and then are delivered to target.
> 
> It has a fair symmetry with the interrupt-parent mechanism..

Although that language is rather different from mine, I think my
proposal could describe this.  It doesn't preclude multi-rooted trees etc.;
we could give a CPU a "slaves" property to override the default child
for transaction rooting (which for CPUs is / -- somewhat illogical, but
that's the way ePAPR has it).

There's no reason why buses can't be cross-connected using slaves
properties.  I'd avoided such things so far, because it introduces
new cycle risks, such as
socket@0 -> cross -> socket@1 -> cross -> socket@0 in the following.

(This cycle is also present in your example, with different syntax,
via interconnectX { targets = < ... &interconnectY >; };  I probably
misunderstood some aspects of your example -- feel free to put me right.)

/ {
	cpus {
		cpu@0 {
			slaves = <&socket0_interconnect>;
		};
		cpu@1 {
			slaves = <&socket0_interconnect>;
		};
		cpu@2 {
			slaves = <&socket1_interconnect>;
		};
		cpu@3 {
			slaves = <&socket1_interconnect>;
		};
	};


	socket0_interconnect: socket@0 {
		slaves = <&socket0_cross_connector &common_bus>;

		memory {
			reg = < ... >;
		};

		socket0_cross_connector: cross {
			ranges = < ... >;
		};
	};

	socket1_interconnect: socket@1 {
		slaves = <&socket1_cross_connector &common_bus>;

		memory {
			reg = < ... >;
		};

		socket1_cross_connector: cross {
			ranges = < ... >;
		};
	};

	common_bus {
		ranges;

		...
	};
};

(This is very slapdash, but hopefully you get the idea.)

Of course, nothing about this tells an OS anything about affinity,
except what it can guess from the number of nodes that must be traversed
between two points -- which may be misleading, particularly if extra nodes
are inserted in order to describe mappings and linkages.

Cycles could be avoided via the cross-connector ranges properties -- I
would sincerely hope that the hardware really does something
equivalent -- but then you cannot answer questions like "is the path
from X to Y cycle-free" without also specifying an address.

Of course, if we make a rule that the DT must be cycle-free for all
transactions we could make it the author's responsibility, with a dumb,
brute-force limit in the parser on the number of nodes permitted in
any path.
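
As a standalone sketch of that brute-force check (the toy node structure
and the bound are made up for illustration; this is not tied to any real
DT parser): walk every "slaves"-style link depth-first and reject any
path longer than a fixed number of nodes, which for a finite graph can
only happen if a cycle is reachable.

#include <stdbool.h>
#include <stddef.h>

#define MAX_PATH_NODES	16	/* arbitrary brute-force bound */

/* Toy node: nothing but its outgoing "slaves"-style links. */
struct node {
	const struct node *slaves[4];
	size_t nr_slaves;
};

/*
 * Returns false if any path starting at n visits more than
 * MAX_PATH_NODES nodes, i.e. the DT is rejected without any clever
 * cycle analysis.
 */
static bool paths_bounded(const struct node *n, unsigned int depth)
{
	size_t i;

	if (depth > MAX_PATH_NODES)
		return false;

	for (i = 0; i < n->nr_slaves; i++)
		if (!paths_bounded(n->slaves[i], depth + 1))
			return false;

	return true;
}

A checker would call paths_bounded() once per initiator; whether the
bound lives in the kernel's parser or in a dtc-style validator doesn't
really matter, since either way keeping the graph acyclic stays the
DT author's responsibility.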


The downside of this approach is that the DT is unparseable to any
parser that doesn't understand the new concepts.

For visibility that's acceptable, because if ePAPR doesn't allow for
a correct description of visibility then a correct DT could not
be interpreted comprehensively in any case.

For affinity, I feel that we should structure the DT in a way that
still describes reachability and visibility correctly, even when
processed by a tool that doesn't understand the affinity concepts.
But I don't see how to do that yet.

Let me know if you have any ideas!

Cheers
---Dave


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-09 10:33                   ` Dave Martin
@ 2014-05-09 14:59                       ` Grant Grundler
  -1 siblings, 0 replies; 58+ messages in thread
From: Grant Grundler @ 2014-05-09 14:59 UTC (permalink / raw)
  To: Dave Martin
  Cc: Arnd Bergmann, Mark Rutland, Linux DeviceTree, Shaik Ameer Basha,
	Stephen Warren, Grant Grundler, Will Deacon, Jason Gunthorpe,
	Marc Zyngier, Thierry Reding,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

Dave,
Mostly agree with your assessment. My $0.02 on one comment:


On Fri, May 9, 2014 at 3:33 AM, Dave Martin <Dave.Martin-5wv7dgnIgG8@public.gmane.org> wrote:
....
> The downside is that if any path flows through a dynamically
> configurable component, such as an IOMMU or a bridge that can be
> remapped, turned off etc., then unless we describe how the path is
> really linked together the kernel will need all sorts of baked-in
> knowledge in order to manage the system safely.

Absolutely. Some knowledge (assumptions) will be "baked in" as code
and some knowledge (explicit parameters) as device tree.

My understanding of device tree is it essentially specifies
"parameters" and "methods" (code provided in the kernel). All methods
make some assumptions about the behavior of the HW it's operating on
and as long as all users of a method share that assumption, all is
well. We don't need device tree to describe those assumptions.

cheers,
grant

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-09 14:16               ` Dave Martin
@ 2014-05-09 17:10                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2014-05-09 17:10 UTC (permalink / raw)
  To: Dave Martin
  Cc: Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Arnd Bergmann, Stephen Warren, Grant Grundler,
	Will Deacon, Marc Zyngier, Thierry Reding,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Hiroshi Doyu

On Fri, May 09, 2014 at 03:16:33PM +0100, Dave Martin wrote:
> On Fri, May 02, 2014 at 12:17:50PM -0600, Jason Gunthorpe wrote:

> > I wonder if this might be a better naming scheme, I actually don't
> > really like 'slave' for this, it really only applies well to AXI style
> > unidirectional busses, and any sort of message-based bus architectures
> > (HT, PCI, QPI, etc) just have the concept of an initiator and target.
> > 
> > Since initiator/target applies equally well to master/slave buses,
> > that seems like better, clearer, naming.
> 
> Sure, I wouldn't have a problem with such a suggestion.  A more neutral
> naming is less likely to cause confusion.
> 
> > Using a nomenclature where
> >   'reg' describes a target reachable from the CPU initiator via the
> >         natural DT hierarchy
> 
> I would say, reachable from the parent device node (which implies your
> statement).  This is consistent with the way ePAPR describes device-to-
> device DMA (even if Linux doesn't usually make a lot of use of that).

Trying to simplify, but yes, that is right..

> >   'initiator' describes a non-CPU (eg 'DMA') source of ops, and
> >         travels via the path described to memory (which is the
> > 	target).
> 
> CPUs are initiators only; non-mastering devices are targets only.
> 
> We might want some terminology to distinguish between mastering 
> devices and bridges, both of which act as initiators and targets.

I was hoping to simplify a bit. What the kernel needs, really, is the
node that initiates a transaction, the node that is the ultimate
completing target of that transaction and the path through all
intervening (transformative or not) components.

The fact that a bridge is a bus-level slave on one side and a bus-level
master on the other is not relevant to the above - a bridge is not an
initiator and it is not a completing target.

> We could have a concept of a "forwarder" or "gateway".  But a bus
> may still be a target as well as forwarder: if the bus contains some
> control registers for example.  There is nothing to stop "reg" and
> "ranges" being present on the same node.

Then the node is a 'target', and a 'bridge'. The key is to carefully
define how the DT properties are used for each view-point.

> >   'upstream' path direction toward the target, typically memory.
> 
> I'm not keen on that, because we would describe the hop between /
> and /memory as downstream or upstream depending on who initiates the
> transaction.  (I appreciate you weren't including CPUs in your
> discussion, but if the terminology works for the whole system it
> would be a bonus).

I'm just using the word 'upstream' as meaning 'moving closer to the
completing target'.

It isn't a great word, 'forward path' would do better to borrow
a networking term.

Then we have the 'return path' which, on message based busses is the
path a completion message from 'completing target' travels to reach
the 'initiator'.

They actually can be asymmetric in some situations, but I'm not sure
that is worth considering in DT; we can just assume completions travel
a reverse path that hits every transformative bridge.

> >   'upstream-bridge' The next hop on a path between an initiator/target
> 
> Maybe.  I'm still not sure quite why this is considered different
> from the downward path through the DT, except that you consider
> the cross-links in the DT to be "upward", but I considered them
> "downward" (which I think are mostly equivalent approaches).
> 
> Can you elaborate?

Let us not worry about upstream/downstream and just talk about the
next bridge on the 'forward path' toward the 'completing target'.

> > But I would encourage you to think about the various limitations this
> > still has
> >  - NUMA systems. How does one describe the path from each
> >    CPU to a target regs, and target memory? This is important for
> >    automatically setting affinities.
> 
> This is a good point.
> 
> Currently I had only been considering visibility, not affinity.
> We actually have a similar problem with GIC, where there may
> be multiple MSI mailboxes visible to a device, but one that is
> preferred (due to being fewer hops away in the silicon, even though
> the routing may be transparent).

Really, the MSI affinity isn't handled by the architecture like x86?
Funky.
 
> We could describe a whole separate bus for each CPU, with links
> to common interconnect subtrees downstream.  But that might involve
> a lot of duplication.  Your example below doesn't look too bad
> though.

Unfortunately my example ignores the ePAPR scheme of having a /cpu
node, I didn't think too hard to fix that though.

> >  - Peer-to-Peer DMA, this is where a non-CPU initiator speaks to a
> >    non-memory target, possibly through IOMMUs and what not. ie
> >    a graphics card in a PCI-E slot DMA'ing through a QPI bus to
> >    a graphics card in a PCI-E slot attached to a different socket.
> 
> Actually, I do intend to describe that and I think I achieved it :)
> 
> To try to keep the length of this mail down a bit I won't try to
> give an example here, but I'm happy to follow up later if this is
> still not answered elsewhere in the thread.

I think you got most of it, if I understand properly. The tricky bit I
was concerned with is where the CPU and DMA paths are not the same.

> >                 intiator1 {
> >                         ranges = < ... >;
> >                         // View from this DMA initiator back to memory
> >                         upstream-bridge = <&interconnect0>;
> >                 };
> > 		/* For some reason this peripheral has two DMA
> > 		   initiation ports. */
> >                 intiator2 {
> >                         ranges = < ... >;
> >                         upstream-bridge = <&interconnect0>;
> >                 };
> 
> Describing separate masters within a device in this way looks quite nice.
> 
> Understanding what to do with them can still be left up to the driver
> for the parent node (peripheral@0 in this case).

I was thinking the DMA API could learn to have a handle to the
initiator; with no handle it assumes the device node is itself the
initiator (eg the dma-ranges case)

> >              peripheral@1 {
> >                 ranges = < ... >;
> >    		regs = <>;
> >                 intiator {
> >                         ranges = < ... >;
> >                         // View from this DMA initiator back to memory
> >                         upstream-bridge = <&interconnect1>;
> >                 };
> >                 target {
> > 		        reg = <..>
> >                         /* This peripheral has integrated memory!
> >                            But notice the CPU path is
> >                              smp_system -> socket1 -> interconnect1_control -> target
> > 			   While a DMA path is
> >                              intiator1 -> interconnect0 -> interconnect1 -> target
> > 			 */
> >                 };
> 
> By hiding slaves (as opposed to masters) inside subnodes, can DT do
> generic reachability analysis?  Maybe the answer is "yes".  I know
> devices hanging off buses whose compatible string is not "simple-bus" are
> not automatically probed, but there are other reasons for that, such as
> bus-specific power-on and probing methods.

Again, in this instance, it becomes up to the driver for peripheral@1
to do something sensible with the buried nodes.

The generic DT machinery will happily convert the reg of target into a
CPU address for you.

> >             };
> >             peripheral2@0 {
> >    		regs = <>;
> > 
> > 		// Or we can write the simplest case like this.
> > 		dma-ranges = <>;
> > 		upstream-bridge = <&interconnect1>;
> >                 /* if upstream-bridge is omitted then it defaults to
> > 	           &parent, eg interconnect1_control */
> 
> This doesn't seem so different from my approach, though I need to
> think about it a bit more.

This is how I was thinking to unify the language with the existing
syntax.
 - dma-ranges alone in an initiator context is equivalent to using an
   implicit buried node:
    initiator {
       ranges == dma_ranges
       upstream-bridge = <&parent>;
    }
 - While in a bridge context it is attached to the
   'forward path edge'.

Now we have a very precise definition for dma-ranges in the same
language as the rest, and we can identify every involved node as
'initiator', 'target' or 'bridge'
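
To make that concrete, a rough sketch of reading a node's role off its
properties (the "upstream-bridge" name, in particular, is only what this
thread has been proposing, not an existing binding):

#include <linux/of.h>

/* Sketch: classify a node using the terminology from this thread. */
static const char *classify(const struct device_node *np)
{
	bool target = of_property_read_bool(np, "reg");
	bool bridge = of_property_read_bool(np, "ranges") ||
		      of_property_read_bool(np, "dma-ranges");
	bool initiator = of_property_read_bool(np, "upstream-bridge");

	if (initiator)
		return "initiator";
	if (bridge && target)
		return "bridge and target";
	if (bridge)
		return "bridge";
	if (target)
		return "target";
	return "unclassified";
}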

> > It is computable that ops from initator2 -> target flow through
> > interconnect0, interconnect1, and then are delivered to target.
> > 
> > It has a fair symmetry with the interrupt-parent mechanism..
> 
> Although that language is rather different from mine, I think my
> proposal could describe this.  

Yes, I think it did, I was mostly thinking to tighten up the language.

> It doesn't preclude multi-rooted trees etc.; we could give a CPU a
> "slaves" property to override the default child for transaction
> rooting (which for CPUs is / -- somewhat illogical, but that's the
> way ePAPR has it).

Right..

> There's no reason why buses can't be cross-connected using slaves
> properties.  I'd avoided such things so far, because it introduces
> new cycle risks, such as
> socket@0 -> cross -> socket@1 -> cross -> socket@0 in the following.

That is actually really how hardware works though. The socket-routers
are configured to not have cycles on an address-by-address basis, but
the actual high level topology is cyclic.

> / {
> 	cpus {
> 		cpu@0 {
> 			slaves = <&socket0_interconnect>;
> 		};
> 		cpu@1 {
> 			slaves = <&socket0_interconnect>;
> 		};
> 		cpu@2 {
> 			slaves = <&socket1_interconnect>;
> 		};
> 		cpu@3 {
> 			slaves = <&socket1_interconnect>;
> 		};
> 	};
> 

So, this has dis-aggregated the sockets, which loses the coherent
address view.

I feel it is important to have a single top level node that represents
the start point for *any* CPU issued transaction. This is the coherent
memory space of the system. The DT tree follows a representative
physical topology, eg from the cpu0, socket 0 view.

Knowing affinity should be computable later on top of that.

> Of course, nothing about this tells an OS anything about affinity,
> except what it can guess from the number of nodes that must be traversed
> between two points -- which may be misleading, particular if extra nodes
> are inserted in order to describe mappings and linkages.

Right, now you have to start adding a 'cost' to edges in the graph :)

Or maybe this is wrong headed and nodes should simply have an
 affinity = <&cpu0 &cpu1 &memory0 &memory1>

> Cycles could be avoided via the cross-connector ranges properties -- I
> would sincerely hope that the hardware really does something
> equivalent -- but then you cannot answer questions like "is the path
> from X to Y cycle-free" without also specifying an address.

Correct, and that is how HW works.

The DT description for a bridge might actually need to include address
based routing :(

next-hop = <BASE SIZE &target
            BASE SIZE &memory>

> The downside of this approach is that the DT is unparseable to any
> parser that doesn't understand the new concepts.

This is why I feel the top level 'coherent view' node is so
important. It retains compatibility.

Or go back to the suggestion I gave last time - keep the DT tree
basically as-is today and store a graph edge list separately.

Jason

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [RFC] Describing arbitrary bus mastering relationships in DT
@ 2014-05-09 17:10                   ` Jason Gunthorpe
  0 siblings, 0 replies; 58+ messages in thread
From: Jason Gunthorpe @ 2014-05-09 17:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, May 09, 2014 at 03:16:33PM +0100, Dave Martin wrote:
> On Fri, May 02, 2014 at 12:17:50PM -0600, Jason Gunthorpe wrote:

> > I wonder if this might be a better naming scheme, I actually don't
> > really like 'slave' for this, it really only applies well to AXI style
> > unidirectional busses, and any sort of message-based bus architectures
> > (HT, PCI, QPI, etc) just have the concept of an initiator and target.
> > 
> > Since initiator/target applies equally well to master/slave buses,
> > that seems like better, clearer, naming.
> 
> Sure, I wouldn't have a problem with such a suggestion.  A more neutral
> naming is less likely to cause confusion.
> 
> > Using a nomenclature where
> >   'reg' describes a target reachable from the CPU initiator via the
> >         natural DT hierarchy
> 
> I would say, reachable from the parent device node (which implies your
> statement).  This is consistent with the way ePAPR describes device-to-
> device DMA (even if Linux doesn't usually make a lot of use of that).

Trying to simplify, but yes, that is right..

> >   'initiator' describes a non-CPU (eg 'DMA') source of ops, and
> >         travels via the path described to memory (which is the
> > 	target).
> 
> CPUs are initiators only; non-mastering devices are targets only.
> 
> We might want some terminology to distinguish between mastering 
> devices and bridges, both of which act as initiators and targets.

I was hoping to simplify a bit. What the kernel needs, really, is the
node that initates a transaction, the node that is the ultimate
completing target of that transaction and the path through all
intervening (transformative or not) components.

The fact a bridge is bus-level slave on one side and a bus-level
master on another is not relevant to the above - a bridge is not an
initiator and it is not a completing target.

> We could have a concept of a "forwarder" or "gateway".  But a bus
> may still be a target as well as forwarder: if the bus contains some
> control registers for example.  There is nothing to stop "reg" and
> "ranges" being present on the same node.

Then the node is both a 'target' and a 'bridge'. The key is to carefully
define how the DT properties are used for each viewpoint.
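
For example, a node that is both a target and a bridge would simply
carry both properties; a rough sketch (all addresses made up):

	interconnect@80000000 {
		#address-cells = <1>;
		#size-cells = <1>;
		/* target view: the interconnect's own control registers */
		reg = <0x80000000 0x1000>;
		/* bridge view: forwards a window on to its children */
		ranges = <0x0 0x80100000 0x00f00000>;
	};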

> >   'upstream' path direction toward the target, typically memory.
> 
> I'm not keen on that, because we would describe the hop between /
> and /memory as downstream or upstream depending on who initiates the
> transaction.  (I appreciate you weren't including CPUs in your
> discussion, but if the terminology works for the whole system it
> would be a bonus).

I'm just using the word 'upstream' as meaning 'moving closer to the
completing target'.

It isn't a great word; 'forward path' would do better, to borrow
a networking term.

Then we have the 'return path', which on message-based busses is the
path a completion message from the 'completing target' travels to reach
the 'initiator'.

They actually can be asymmetric in some situations, but I'm not sure
that is worth considering in DT; we can just assume completions travel
a reverse path that hits every transformative bridge.

> >   'upstream-bridge' The next hop on a path between an initiator/target
> 
> Maybe.  I'm still not sure quite why this is considered different
> from the downward path through the DT, except that you consider
> the cross-links in the DT to be "upward", but I considered them
> "downward" (which I think are mostly equivalent approaches).
> 
> Can you elaborate?

Let us not worry about upstream/downstream and just talk about the
next bridge on the 'forward path' toward the 'completing target'.

> > But I would encourage you to think about the various limitations this
> > still has
> >  - NUMA systems. How does one describe the path from each
> >    CPU to a target regs, and target memory? This is important for
> >    automatically setting affinities.
> 
> This is a good point.
> 
> Currently I had only been considering visibility, not affinity.
> We actually have a similar problem with GIC, where there may
> be multiple MSI mailboxes visible to a device, but one that is
> preferred (due to being fewer hops away in the silicon, even though
> the routing may be transparent).

Really, the MSI affinity isn't handled by the architecture like x86?
Funky.
 
> We could describe a whole separate bus for each CPU, with links
> to common interconnect subtrees downstream.  But that might involve
> a lot of duplication.  Your example below doesn't look too bad
> though.

Unfortunately my example ignores the ePAPR scheme of having a /cpus
node; I didn't think too hard about fixing that, though.

> >  - Peer-to-Peer DMA: this is where a non-CPU initiator speaks to a
> >    non-memory target, possibly through IOMMUs and whatnot, i.e.
> >    a graphics card in a PCI-E slot DMA'ing through a QPI bus to
> >    a graphics card in a PCI-E slot attached to a different socket.
> 
> Actually, I do intend to describe that and I think I achieved it :)
> 
> To try to keep the length of this mail down a bit I won't try to
> give an example here, but I'm happy to follow up later if this is
> still not answered elsewhere in the thread.

I think you got most of it, if I understand properly. The tricky bit I
was concerned with is where the CPU and DMA paths are not the same.

> >                 initiator1 {
> >                         ranges = < ... >;
> >                         // View from this DMA initiator back to memory
> >                         upstream-bridge = <&interconnect0>;
> >                 };
> > 		/* For some reason this peripheral has two DMA
> > 		   initiation ports. */
> >                 initiator2 {
> >                         ranges = < ... >;
> >                         upstream-bridge = <&interconnect0>;
> >                 };
> 
> Describing separate masters within a device in this way looks quite nice.
> 
> Understanding what to do with them can still be left up to the driver
> for the parent node (peripheral@0 in this case).

I was thinking the DMA API could learn to have a handle to the
initiator; with no handle it assumes the device node is itself the
initiator (e.g. the dma-ranges case)

> >              peripheral@1 {
> >                 ranges = < ... >;
> >    		reg = <>;
> >                 initiator {
> >                         ranges = < ... >;
> >                         // View from this DMA initiator back to memory
> >                         upstream-bridge = <&interconnect1>;
> >                 };
> >                 target {
> > 		        reg = <..>
> >                         /* This peripheral has integrated memory!
> >                            But notice the CPU path is
> >                              smp_system -> socket1 -> interconnect1_control -> target
> > 			   While a DMA path is
> >                              initiator1 -> interconnect0 -> interconnect1 -> target
> > 			 */
> >                 };
> 
> By hiding slaves (as opposed to masters) inside subnodes, can DT do
> generic reachability analysis?  Maybe the answer is "yes".  I know
> devices hanging off buses whose compatible string is not "simple-bus" are
> not automatically probed, but there are other reasons for that, such as
> bus-specific power-on and probing methods.

Again, in this instance, it becomes up to the driver for peripheral@1
to do something sensible with the buried nodes.

The generic DT machinery will happily convert the reg of the target into a
CPU address for you.

> >             };
> >             peripheral2@0 {
> >    		reg = <>;
> > 
> > 		// Or we can write the simplest case like this.
> > 		dma-ranges = <>;
> > 		upstream-bridge = <&interconnect1>;
> >                 /* if upstream-bridge is omitted then it defaults to
> > 	           &parent, eg interconnect1_control */
> 
> This doesn't seem so different from my approach, though I need to
> think about it a bit more.

This is how I was thinking to unify the language with the existing
syntax.
 - dma-ranges alone in an initiator context is equivalent to using an
   implicit buried node:
    initiator {
       ranges == dma_ranges
       upstream-bridge = <&parent>;
    }
 - While in a bridge context it is attached to the
   'forward path edge'.

Now we have a very precise definition for dma-ranges in the same
language as the rest, and we can identify every involved node as
'initiator', 'target' or 'bridge'.
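
Concretely, the two forms would be equivalent -- a sketch only, where
"upstream-bridge" is the property name floated in this thread and
&parent_bus is a hypothetical label for the parent bus node:

	/* today's shorthand */
	peripheral@0 {
		reg = <0x0 0x1000>;
		dma-ranges = <0x0 0x80000000 0x40000000>;
	};

	/* explicit form, with the implicit initiator spelled out */
	peripheral@0 {
		reg = <0x0 0x1000>;
		initiator {
			ranges = <0x0 0x80000000 0x40000000>;
			upstream-bridge = <&parent_bus>;
		};
	};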

> > It is computable that ops from initiator2 -> target flow through
> > interconnect0, interconnect1, and then are delivered to target.
> > 
> > It has a fair symmetry with the interrupt-parent mechanism..
> 
> Although that language is rather different from mine, I think my
> proposal could describe this.  

Yes, I think it did; I was mostly thinking to tighten up the language.

> It doesn't preclude multi-rooted trees etc.; we could give a CPU a
> "slaves" property to override the default child for transaction
> rooting (which for CPUs is / -- somewhat illogical, but that's the
> way ePAPR has it).

Right..

> There's no reason why buses can't be cross-connected using slaves
> properties.  I'd avoided such things so far, because they introduce
> new cycle risks, such as
> socket@0 -> cross -> socket@1 -> cross -> socket@0 in the following.

That is actually really how hardware works though. The socket-routers
are configured to not have cycles on an address-by-address basis, but
the actual high level topology is cyclic.

> / {
> 	cpus {
> 		cpu@0 {
> 			slaves = <&socket0_interconnect>;
> 		};
> 		cpu@1 {
> 			slaves = <&socket0_interconnect>;
> 		};
> 		cpu@2 {
> 			slaves = <&socket1_interconnect>;
> 		};
> 		cpu@3 {
> 			slaves = <&socket1_interconnect>;
> 		};
> 	};
> 

So, this has disaggregated the sockets, which loses the coherent
address view.

I feel it is important to have a single top-level node that represents
the start point for *any* CPU-issued transaction. This is the coherent
memory space of the system. The DT tree follows a representative
physical topology, e.g. from the cpu0, socket 0 view.

Affinity should then be computable on top of that.
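
A rough sketch of that shape, reusing the "slaves" property from the
RFC (the coherent_view node and all names here are hypothetical):

	/ {
		coherent_view: interconnect {
			/* single root for every CPU-issued transaction:
			   the coherent memory space of the system */
			ranges;
		};

		cpus {
			cpu@0 { reg = <0>; slaves = <&coherent_view>; };
			cpu@1 { reg = <1>; slaves = <&coherent_view>; };
		};
	};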

> Of course, nothing about this tells an OS anything about affinity,
> except what it can guess from the number of nodes that must be traversed
> between two points -- which may be misleading, particularly if extra nodes
> are inserted in order to describe mappings and linkages.

Right, now you have to start adding a 'cost' to edges in the graph :)

Or maybe this is wrong-headed and nodes should simply have an
 affinity = <&cpu0 &cpu1 &memory0 &memory1>

> Cycles could be avoided via the cross-connector ranges properties -- I
> would sincerely hope that the hardware really does something
> equivalent -- but then you cannot answer questions like "is the path
> from X to Y cycle-free" without also specifying an address.

Correct, and that is how HW works.

The DT description for a bridge might actually need to include address
based routing :(

next-hop = <BASE SIZE &target
            BASE SIZE &memory>

> The downside of this approach is that the DT is unparseable to any
> parser that doesn't understand the new concepts.

This is why I feel the top level 'coherent view' node is so
important. It retains compatibility.

Or go back to the suggestion I gave last time - keep the DT tree
basically as-is today and store a graph edge list separately.

Jason


* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-09 10:56                 ` Dave Martin
@ 2014-05-12 16:19                     ` Stephen Warren
  -1 siblings, 0 replies; 58+ messages in thread
From: Stephen Warren @ 2014-05-12 16:19 UTC (permalink / raw)
  To: Dave Martin, Arnd Bergmann
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Mark Rutland,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Shaik Ameer Basha,
	Grant Grundler, Will Deacon, Jason Gunthorpe, Marc Zyngier,
	Thierry Reding, Hiroshi Doyu

On 05/09/2014 04:56 AM, Dave Martin wrote:
> On Fri, May 02, 2014 at 09:06:43PM +0200, Arnd Bergmann wrote:
>> On Friday 02 May 2014 12:50:17 Stephen Warren wrote:
...
>>> Now, perhaps there are devices which themselves control whether
>>> transactions are sent to the IOMMU or direct to RAM, but I'm not
>>> familiar with them. Is the GPU in that category, since it has its own
>>> GMMU, albeit chained into the SMMU IIRC?
>>
>> Devices with a built-in IOMMU such as most GPUs are also easy enough
>> to handle: There is no reason to actually show the IOMMU in DT and
>> we can just treat the GPU as a black box.
> 
> It's impossible for such a built-in IOMMU to be shared with other
> devices, so that's probably reasonable.

I don't believe that's true.

For example, on Tegra, the CPU (and likely anything that can bus-master
the relevant bus) can send transactions into the GPU, which can then
turn them around towards RAM, and those likely then go through the MMU
inside the GPU.

IIRC, the current Nouveau support for Tegra even makes use of that
feature, although I think that's a temporary thing that we're hoping to
get rid of once the Tegra support in Nouveau gets more mature.
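
In the terms being discussed in this thread, that presumably means the
GPU's internal MMU would have to be visible as a bridge on another
master's path. Very roughly, and purely as a sketch (the node names,
addresses and the initiator/upstream-bridge properties are all just
the hypothetical notation from earlier in the thread):

	gpu@57000000 {
		reg = <0x57000000 0x1000000>;

		gmmu: mmu {
			/* the GPU's internal MMU, acting as a bridge
			   back towards RAM */
			ranges;
		};
	};

	other-master@58000000 {
		reg = <0x58000000 0x1000>;
		initiator {
			/* DMA path turned around through the GPU's MMU */
			upstream-bridge = <&gmmu>;
		};
	};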

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-12 16:19                     ` Stephen Warren
@ 2014-05-12 18:10                         ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-12 18:10 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Grant Grundler, Will Deacon, Jason Gunthorpe,
	Marc Zyngier, Thierry Reding, Hiroshi Doyu

On Monday 12 May 2014 10:19:16 Stephen Warren wrote:
> On 05/09/2014 04:56 AM, Dave Martin wrote:
> > On Fri, May 02, 2014 at 09:06:43PM +0200, Arnd Bergmann wrote:
> >> On Friday 02 May 2014 12:50:17 Stephen Warren wrote:
> ...
> >>> Now, perhaps there are devices which themselves control whether
> >>> transactions are sent to the IOMMU or direct to RAM, but I'm not
> >>> familiar with them. Is the GPU in that category, since it has its own
> >>> GMMU, albeit chained into the SMMU IIRC?
> >>
> >> Devices with a built-in IOMMU such as most GPUs are also easy enough
> >> to handle: There is no reason to actually show the IOMMU in DT and
> >> we can just treat the GPU as a black box.
> > 
> > It's impossible for such a built-in IOMMU to be shared with other
> > devices, so that's probably reasonable.
> 
> I don't believe that's true.
> 
> For example, on Tegra, the CPU (and likely anything that can bus-master
> the relevant bus) can send transactions into the GPU, which can then
> turn them around towards RAM, and those likely then go through the MMU
> inside the GPU.
> 
> IIRC, the current Nouveau support for Tegra even makes use of that
> feature, although I think that's a temporary thing that we're hoping to
> get rid of once the Tegra support in Nouveau gets more mature.

But the important point here is that you wouldn't use the dma-mapping
API to manage this. First of all, the CPU is special anyway, but also
if you do a device-to-device DMA into the GPU address space and that
ends up being redirected to memory through the IOMMU, you still wouldn't
manage the I/O page tables through the interfaces of the device doing the
DMA, but through some private interface of the GPU.

	Arnd

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-12 18:10                         ` Arnd Bergmann
@ 2014-05-12 18:29                           ` Stephen Warren
  -1 siblings, 0 replies; 58+ messages in thread
From: Stephen Warren @ 2014-05-12 18:29 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Dave Martin, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Grant Grundler, Will Deacon, Jason Gunthorpe,
	Marc Zyngier, Thierry Reding, Hiroshi Doyu

On 05/12/2014 12:10 PM, Arnd Bergmann wrote:
> On Monday 12 May 2014 10:19:16 Stephen Warren wrote:
>> On 05/09/2014 04:56 AM, Dave Martin wrote:
>>> On Fri, May 02, 2014 at 09:06:43PM +0200, Arnd Bergmann wrote:
>>>> On Friday 02 May 2014 12:50:17 Stephen Warren wrote:
>> ...
>>>>> Now, perhaps there are devices which themselves control whether
>>>>> transactions are sent to the IOMMU or direct to RAM, but I'm not
>>>>> familiar with them. Is the GPU in that category, since it has its own
>>>>> GMMU, albeit chained into the SMMU IIRC?
>>>>
>>>> Devices with a built-in IOMMU such as most GPUs are also easy enough
>>>> to handle: There is no reason to actually show the IOMMU in DT and
>>>> we can just treat the GPU as a black box.
>>>
>>> It's impossible for such a built-in IOMMU to be shared with other
>>> devices, so that's probably reasonable.
>>
>> I don't believe that's true.
>>
>> For example, on Tegra, the CPU (and likely anything that can bus-master
>> the relevant bus) can send transactions into the GPU, which can then
>> turn them around towards RAM, and those likely then go through the MMU
>> inside the GPU.
>>
>> IIRC, the current Nouveau support for Tegra even makes use of that
>> feature, although I think that's a temporary thing that we're hoping to
>> get rid of once the Tegra support in Nouveau gets more mature.
> 
> But the important point here is that you wouldn't use the dma-mapping
> API to manage this. First of all, the CPU is special anyway, but also
> if you do a device-to-device DMA into the GPU address space and that
> ends up being redirected to memory through the IOMMU, you still wouldn't
> manage the I/O page tables through the interfaces of the device doing the
> DMA, but through some private interface of the GPU.

Why not? If something wants to DMA to a memory region, irrespective of
whether the GPU MMU (or any MMU) is in between those master transactions
and the RAM or not, surely the driver should always use the DMA mapping
API to set that up? Anything else just means using custom APIs, and
isn't the whole point of the DMA mapping API to provide a standard API
for that purpose?

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-12 18:29                           ` Stephen Warren
@ 2014-05-12 19:53                               ` Arnd Bergmann
  -1 siblings, 0 replies; 58+ messages in thread
From: Arnd Bergmann @ 2014-05-12 19:53 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Stephen Warren, Mark Rutland, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Shaik Ameer Basha, Grant Grundler, Will Deacon, Jason Gunthorpe,
	Marc Zyngier, Thierry Reding, Dave Martin, Hiroshi Doyu

On Monday 12 May 2014 12:29:16 Stephen Warren wrote:
> On 05/12/2014 12:10 PM, Arnd Bergmann wrote:
> > On Monday 12 May 2014 10:19:16 Stephen Warren wrote:
> >> IIRC, the current Nouveau support for Tegra even makes use of that
> >> feature, although I think that's a temporary thing that we're hoping to
> >> get rid of once the Tegra support in Nouveau gets more mature.
> > 
> > But the important point here is that you wouldn't use the dma-mapping
> > API to manage this. First of all, the CPU is special anyway, but also
> > if you do a device-to-device DMA into the GPU address space and that
> > ends up being redirected to memory through the IOMMU, you still wouldn't
> > manage the I/O page tables through the interfaces of the device doing the
> > DMA, but through some private interface of the GPU.
> 
> Why not? If something wants to DMA to a memory region, irrespective of
> whether the GPU MMU (or any MMU) is in between those master transactions
> and the RAM or not, surely the driver should always use the DMA mapping
> API to set that up? Anything else just means using custom APIs, and
> isn't the whole point of the DMA mapping API to provide a standard API
> for that purpose?

It sounds like an abuse of the hardware if you use the GPU's IOMMU
to set up DMA for a random non-GPU DMA master. I'd prefer not to go
there and instead use swiotlb.

	Arnd

* Re: [RFC] Describing arbitrary bus mastering relationships in DT
  2014-05-12 18:29                           ` Stephen Warren
@ 2014-05-12 20:02                               ` Grant Grundler
  -1 siblings, 0 replies; 58+ messages in thread
From: Grant Grundler @ 2014-05-12 20:02 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Arnd Bergmann, Dave Martin,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Mark Rutland,
	Linux DeviceTree, Shaik Ameer Basha, Grant Grundler, Will Deacon,
	Jason Gunthorpe, Marc Zyngier, Thierry Reding, Hiroshi Doyu

On Mon, May 12, 2014 at 11:29 AM, Stephen Warren <swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org> wrote:
....
>> But the important point here is that you wouldn't use the dma-mapping
>> API to manage this. First of all, the CPU is special anyway, but also
>> if you do a device-to-device DMA into the GPU address space and that
>> ends up being redirected to memory through the IOMMU, you still wouldn't
>> manage the I/O page tables through the interfaces of the device doing the
>> DMA, but through some private interface of the GPU.
>
> Why not? If something wants to DMA to a memory region, irrespective of
> whether the GPU MMU (or any MMU) is in between those master transactions
> and the RAM or not, surely the driver should always use the DMA mapping
> API to set that up?

No.  As one of the contributors to the DMA API, I'm pretty confident it's
not. It _could_ be used that way, but it's certainly not the original
design. P2P transactions are different since they are "less likely"
(depending on arch and implementation) to participate in CPU cache
coherency or even be visible to the CPU. In particular, think of the case
where all transactions are locally routed behind a PCI bridge (or
other fabric) and the CPU/IOMMU/RAM controller never sees them.

A long-standing real example is the drivers/scsi/sym53c8xx_2 driver.
The "scripts" engine needs to access local (on-chip) RAM through PCI
bus transactions, so it uses its own PCI BAR registers to sort that
out: in essence, "local PCI physical" addresses.  I believe the code is
in sym_iomap_device(). No CPU or IOMMU is involved with this.  This
driver otherwise uses the DMA API for all other host RAM accesses.

> Anything else just means using custom APIs, and
> isn't the whole point of the DMA mapping API to provide a standard API
> for that purpose?

Yes and no. Yes, the generic DMA API is there to provide DMA mapping
services that hide the IOMMU (or the lack of one) AND provide cache
coherency for DMA transactions to RAM that is visible to the CPU cache.

In general, I'd argue transactions that route through an IOMMU need to
work with the existing DMA API. Historically those transactions are
routed "upstream" - away from other IO devices - and thus are not the
case referred to here.

If the IOMMU is part of a "graph topology" (vs a tree topology), the
drivers will have to know whether or not to use the DMA API to access
the intended target.

cheers,
grant
