* [PATCH v4 0/6] MT8173 IOMMU SUPPORT
@ 2015-08-03 10:21 ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: Robin Murphy, Will Deacon, Daniel Kurtz, Tomasz Figa,
	Lucas Stach, Rob Herring, Catalin Marinas, linux-mediatek,
	Sasha Hauer, srv_heupstream, devicetree, linux-kernel,
	linux-arm-kernel, iommu, pebolle, arnd, mitchelh, youhua.li,
	k.zhang, frederic.chen

This patch set adds support for the m4u (Multimedia Memory Management Unit).
Currently it only supports the m4u with 2 levels of pagetable on mt8173.

  It's based on Robin Murphy's arm64 DMA-v5 series[1] and Robin's patch allowing
DMA API use in io-pgtable-arm[2]. The dtsi is based on the MTK clock patch[3].
 
  Please see the hardware block diagram of the Mediatek IOMMU below:
 
              EMI (External Memory Interface)
               |
              m4u (Multimedia Memory Management Unit)
               |
              smi (Smart Multimedia Interface)
               |
        +---------------+-------
        |               |
        |               |
    vdec larb       disp larb      ... SoCs have different local arbiter(larb).
        |               |
        |               |
   +----+----+    +-----+-----+
   |    |    |    |     |     |    ...
   |    |    |    |     |     |    ...
   |    |    |    |     |     |    ...
  MC   PP   VLD  OVL0 RDMA0 WDMA0  ... There are different ports in each larb.
  
  Normally we assign a local arbiter (larb) to each multimedia hardware block,
such as display, video decode, video encode and camera. Each larb contains
several ports. For example, the video decode larb has ports like MC, PP, UFO,
VLD, AVC_MV and PRED_RD, each corresponding to a part of the video hardware.
 
  As the diagram shows, all the multimedia modules connect to the m4u via the
SMI. The SMI is responsible for enabling/disabling the iommu and controlling
the clocks for each local arbiter. To enable the iommu for video decode, the
SMI must configure the video larb's ports. And whether the video hardware runs
with the iommu enabled or disabled, its larb's clock must be enabled. The SMI
also helps with bandwidth control for each local arbiter. So we add a dedicated
driver for the SMI and place it in drivers/memory.
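
  To give a feel for how a master attaches to the m4u, here is a minimal,
hypothetical consumer node (the node name and unit address are made up for
illustration; the 2-cell iommus specifier and the port macros come from the
binding in patch 1/6):

	vdec@16000000 {
		/* first cell: larbid, second cell: portid (mt8173-larb-port.h) */
		iommus = <&iommu M4U_LARB1_ID M4U_PORT_HW_VDEC_MC_EXT>,
			 <&iommu M4U_LARB1_ID M4U_PORT_HW_VDEC_PP_EXT>;
	};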

v4:
-use only one iommu domain, based on Robin's DMA-v5.
-remove flush_pgtable.
-change writel to writel_relaxed.
-about Short-descriptor: move dma_map_single into io-pgtable-arm-short.
 Improve the flow of free pgtable and add NO_XN+NO_PERMS quirk following
 Will's suggestion.
-fix two style issues in the dtsi according to Daniel's suggestion.

v3: http://lists.linuxfoundation.org/pipermail/iommu/2015-July/013632.html
-rebased onto v4.2-rc1
-improve iommu flow based on Robin's DMA v3[4].
-change mtk iommu-cells from 1 to 2.
-about Short-descriptor: add split function; add self-test; add some other bits like nG,
 XN according to the spec; add SUPERSECTION and MTK quirk; move io_pgtable_ops_to_pgtable
 out from LPAE to the header file.
-about SMI: move from drivers/soc/mediatek to drivers/memory; change the clocks from
 clk[2] to clk_apb and clk_smi; add pm.
-add iommu suspend/resume to back up/restore registers.

v2: http://lists.linuxfoundation.org/pipermail/iommu/2015-May/013028.html
-add arm short descriptor support.
-separate smi common from smi and change the clock-names according
 to the smi HW.
-delete the hardcoded port-names in mt8173;
 replace them with larb-portes-nr in the dtsi.
-fix some coding style issues.

v1: http://lists.infradead.org/pipermail/linux-mediatek/2015-March/000058.html
-initial version.

[1]: http://lists.linuxfoundation.org/pipermail/iommu/2015-July/013900.html
[2]: http://lists.linuxfoundation.org/pipermail/iommu/2015-July/013876.html
[3]: http://lists.infradead.org/pipermail/linux-mediatek/2015-July/001800.html
[4]: http://lists.linuxfoundation.org/pipermail/iommu/2015-July/013597.html

Yong Wu (6):
  dt-bindings: iommu: Add binding for mediatek IOMMU
  dt-bindings: mediatek: Add smi dts binding
  iommu: add ARM short descriptor page table allocator.
  memory: mediatek: Add SMI driver
  iommu/mediatek: Add mt8173 IOMMU driver
  dts: mt8173: Add iommu/smi nodes for mt8173

 .../devicetree/bindings/iommu/mediatek,iommu.txt   |  61 ++
 .../memory-controllers/mediatek,smi-larb.txt       |  25 +
 .../bindings/memory-controllers/mediatek,smi.txt   |  24 +
 arch/arm64/boot/dts/mediatek/mt8173.dtsi           |  81 ++
 drivers/iommu/Kconfig                              |  31 +
 drivers/iommu/Makefile                             |   2 +
 drivers/iommu/io-pgtable-arm-short.c               | 811 +++++++++++++++++++++
 drivers/iommu/io-pgtable-arm.c                     |   3 -
 drivers/iommu/io-pgtable.c                         |   4 +
 drivers/iommu/io-pgtable.h                         |  14 +
 drivers/iommu/mtk_iommu.c                          | 714 ++++++++++++++++++
 drivers/memory/Kconfig                             |   8 +
 drivers/memory/Makefile                            |   1 +
 drivers/memory/mtk-smi.c                           | 285 ++++++++
 include/dt-bindings/memory/mt8173-larb-port.h      | 105 +++
 include/soc/mediatek/smi.h                         |  60 ++
 16 files changed, 2226 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
 create mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
 create mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi.txt
 create mode 100644 drivers/iommu/io-pgtable-arm-short.c
 create mode 100644 drivers/iommu/mtk_iommu.c
 create mode 100644 drivers/memory/mtk-smi.c
 create mode 100644 include/dt-bindings/memory/mt8173-larb-port.h
 create mode 100644 include/soc/mediatek/smi.h

-- 
1.8.1.1.dirty


* [PATCH v4 1/6] dt-bindings: iommu: Add binding for mediatek IOMMU
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: Robin Murphy, Will Deacon, Daniel Kurtz, Tomasz Figa,
	Lucas Stach, Rob Herring, Catalin Marinas, linux-mediatek,
	Sasha Hauer, srv_heupstream, devicetree, linux-kernel,
	linux-arm-kernel, iommu, pebolle, arnd, mitchelh, youhua.li,
	k.zhang, frederic.chen, Yong Wu

This patch adds the Mediatek iommu dts binding document.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 .../devicetree/bindings/iommu/mediatek,iommu.txt   |  61 ++++++++++++
 include/dt-bindings/memory/mt8173-larb-port.h      | 105 +++++++++++++++++++++
 2 files changed, 166 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
 create mode 100644 include/dt-bindings/memory/mt8173-larb-port.h

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
new file mode 100644
index 0000000..691739a
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
@@ -0,0 +1,61 @@
+* Mediatek IOMMU Architecture Implementation
+
+  Mediatek SoCs may contain an implementation of the Multimedia Memory
+Management Unit (M4U), which uses the ARM Short-descriptor translation
+table format for address translation.
+
+  The IOMMU hardware block diagram is shown below:
+
+              EMI (External Memory Interface)
+               |
+              m4u (Multimedia Memory Management Unit)
+               |
+              smi (Smart Multimedia Interface)
+               |
+        +---------------+-------
+        |               |
+        |               |
+    vdec larb       disp larb      ... SoCs have different local arbiter(larb).
+        |               |
+        |               |
+   +----+----+    +-----+-----+
+   |    |    |    |     |     |    ...
+   |    |    |    |     |     |    ...
+   |    |    |    |     |     |    ...
+  MC   PP   VLD  OVL0 RDMA0 WDMA0  ... There are different ports in each larb.
+
+  As shown above, the multimedia HW goes through the SMI and M4U when it
+accesses the EMI. The SMI is a bridge between the m4u and the multimedia
+HW. It contains the smi local arbiters and smi common. It controls whether
+the multimedia HW should go through the m4u for translation or bypass it
+and talk directly to the EMI. The SMI also helps control the clocks for
+each local arbiter.
+  Normally we assign a local arbiter (larb) to each multimedia HW block,
+such as display, video decode, and camera. Each larb has several ports.
+For example, the video decode local arbiter has ports like MC, PP and VLD,
+each corresponding to a part of the video HW.
+
+Required properties:
+- compatible : must be "mediatek,mt8173-m4u".
+- reg : m4u register base and size.
+- interrupts : the interrupt of m4u.
+- clocks : must contain one entry for each clock-names.
+- clock-names : must be "bclk"; it is the block clock of the m4u.
+- mediatek,larb : list of phandles to the local arbiters in the current SoC.
+	Refer to bindings/memory-controllers/mediatek,smi-larb.txt. It must be
+	sorted according to the local arbiter index, like larb0, larb1, larb2...
+- #iommu-cells : must be 2. Two cells are needed to identify a master port.
+	The first is the local arbiter index (larbid), and the second is the port
+	index (portid) within the local arbiter. Specify the larbid and portid as
+	defined in dt-bindings/memory/mt8173-larb-port.h.
+
+Example:
+	iommu: mmsys_iommu@10205000 {
+		compatible = "mediatek,mt8173-m4u";
+		reg = <0 0x10205000 0 0x1000>;
+		interrupts = <GIC_SPI 139 IRQ_TYPE_LEVEL_LOW>;
+		clocks = <&infracfg CLK_INFRA_M4U>;
+		clock-names = "bclk";
+		mediatek,larb = <&larb0 &larb1 &larb2 &larb3 &larb4 &larb5>;
+		#iommu-cells = <2>;
+	};
diff --git a/include/dt-bindings/memory/mt8173-larb-port.h b/include/dt-bindings/memory/mt8173-larb-port.h
new file mode 100644
index 0000000..7517087
--- /dev/null
+++ b/include/dt-bindings/memory/mt8173-larb-port.h
@@ -0,0 +1,105 @@
+/*
+ * Copyright (c) 2014-2015 MediaTek Inc.
+ * Author: Yong Wu <yong.wu@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifndef __DTS_IOMMU_PORT_MT8173_H
+#define __DTS_IOMMU_PORT_MT8173_H
+
+#define M4U_LARB0_ID			0
+#define M4U_LARB1_ID			1
+#define M4U_LARB2_ID			2
+#define M4U_LARB3_ID			3
+#define M4U_LARB4_ID			4
+#define M4U_LARB5_ID			5
+
+/* larb0 */
+#define	M4U_PORT_DISP_OVL0		0
+#define	M4U_PORT_DISP_RDMA0		1
+#define	M4U_PORT_DISP_WDMA0		2
+#define	M4U_PORT_DISP_OD_R		3
+#define	M4U_PORT_DISP_OD_W		4
+#define	M4U_PORT_MDP_RDMA0		5
+#define	M4U_PORT_MDP_WDMA		6
+#define	M4U_PORT_MDP_WROT0		7
+
+/* larb1 */
+#define	M4U_PORT_HW_VDEC_MC_EXT		0
+#define	M4U_PORT_HW_VDEC_PP_EXT		1
+#define	M4U_PORT_HW_VDEC_UFO_EXT	2
+#define	M4U_PORT_HW_VDEC_VLD_EXT	3
+#define	M4U_PORT_HW_VDEC_VLD2_EXT	4
+#define	M4U_PORT_HW_VDEC_AVC_MV_EXT	5
+#define	M4U_PORT_HW_VDEC_PRED_RD_EXT	6
+#define	M4U_PORT_HW_VDEC_PRED_WR_EXT	7
+#define	M4U_PORT_HW_VDEC_PPWRAP_EXT	8
+#define	M4U_PORT_HW_VDEC_TILE		9
+
+/* larb2 */
+#define	M4U_PORT_IMGO			0
+#define	M4U_PORT_RRZO			1
+#define	M4U_PORT_AAO			2
+#define	M4U_PORT_LCSO			3
+#define	M4U_PORT_ESFKO			4
+#define	M4U_PORT_IMGO_D			5
+#define	M4U_PORT_LSCI			6
+#define	M4U_PORT_LSCI_D			7
+#define	M4U_PORT_BPCI			8
+#define	M4U_PORT_BPCI_D			9
+#define	M4U_PORT_UFDI			10
+#define	M4U_PORT_IMGI			11
+#define	M4U_PORT_IMG2O			12
+#define	M4U_PORT_IMG3O			13
+#define	M4U_PORT_VIPI			14
+#define	M4U_PORT_VIP2I			15
+#define	M4U_PORT_VIP3I			16
+#define	M4U_PORT_LCEI			17
+#define	M4U_PORT_RB			18
+#define	M4U_PORT_RP			19
+#define	M4U_PORT_WR			20
+
+/* larb3 */
+#define	M4U_PORT_VENC_RCPU		0
+#define	M4U_PORT_VENC_REC		1
+#define	M4U_PORT_VENC_BSDMA		2
+#define	M4U_PORT_VENC_SV_COMV		3
+#define	M4U_PORT_VENC_RD_COMV		4
+#define	M4U_PORT_JPGENC_RDMA		5
+#define	M4U_PORT_JPGENC_BSDMA		6
+#define	M4U_PORT_JPGDEC_WDMA		7
+#define	M4U_PORT_JPGDEC_BSDMA		8
+#define	M4U_PORT_VENC_CUR_LUMA		9
+#define	M4U_PORT_VENC_CUR_CHROMA	10
+#define	M4U_PORT_VENC_REF_LUMA		11
+#define	M4U_PORT_VENC_REF_CHROMA	12
+#define	M4U_PORT_VENC_NBM_RDMA		13
+#define	M4U_PORT_VENC_NBM_WDMA		14
+
+/* larb4 */
+#define	M4U_PORT_DISP_OVL1		0
+#define	M4U_PORT_DISP_RDMA1		1
+#define	M4U_PORT_DISP_RDMA2		2
+#define	M4U_PORT_DISP_WDMA1		3
+#define	M4U_PORT_MDP_RDMA1		4
+#define	M4U_PORT_MDP_WROT1		5
+
+/* larb5 */
+#define	M4U_PORT_VENC_RCPU_SET2		0
+#define	M4U_PORT_VENC_REC_FRM_SET2	1
+#define	M4U_PORT_VENC_REF_LUMA_SET2	2
+#define	M4U_PORT_VENC_REC_CHROMA_SET2	3
+#define	M4U_PORT_VENC_BSDMA_SET2	4
+#define	M4U_PORT_VENC_CUR_LUMA_SET2	5
+#define	M4U_PORT_VENC_CUR_CHROMA_SET2	6
+#define	M4U_PORT_VENC_RD_COMA_SET2	7
+#define	M4U_PORT_VENC_SV_COMA_SET2	8
+
+#endif
-- 
1.8.1.1.dirty
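
The binding above leaves the 2-cell specifier abstract, so here is a rough,
hypothetical sketch of how an IOMMU driver could decode it through the
standard iommu_ops .of_xlate callback (the function name and dev_dbg body
are illustrative, not the code from patch 5/6):

	static int example_iommu_of_xlate(struct device *dev,
					  struct of_phandle_args *args)
	{
		u32 larbid, portid;

		if (args->args_count != 2)	/* matches #iommu-cells = <2> */
			return -EINVAL;

		larbid = args->args[0];		/* e.g. M4U_LARB1_ID */
		portid = args->args[1];		/* e.g. M4U_PORT_HW_VDEC_MC_EXT */

		/* A real driver would record (larbid, portid) per device. */
		dev_dbg(dev, "iommu master: larb %u port %u\n", larbid, portid);
		return 0;
	}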


* [PATCH v4 2/6] dt-bindings: mediatek: Add smi dts binding
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: Robin Murphy, Will Deacon, Daniel Kurtz, Tomasz Figa,
	Lucas Stach, Rob Herring, Catalin Marinas, linux-mediatek,
	Sasha Hauer, srv_heupstream, devicetree, linux-kernel,
	linux-arm-kernel, iommu, pebolle, arnd, mitchelh, youhua.li,
	k.zhang, frederic.chen, Yong Wu

This patch adds the smi binding document.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 .../memory-controllers/mediatek,smi-larb.txt       | 25 ++++++++++++++++++++++
 .../bindings/memory-controllers/mediatek,smi.txt   | 24 +++++++++++++++++++++
 2 files changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
 create mode 100644 Documentation/devicetree/bindings/memory-controllers/mediatek,smi.txt

diff --git a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
new file mode 100644
index 0000000..55ff3b7
--- /dev/null
+++ b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi-larb.txt
@@ -0,0 +1,25 @@
+SMI (Smart Multimedia Interface) Local Arbiter
+
+For the hardware block diagram, please see bindings/iommu/mediatek,iommu.txt.
+
+Required properties:
+- compatible : must be "mediatek,mt8173-smi-larb"
+- reg : the register and size of this local arbiter.
+- mediatek,smi : a phandle to the smi_common node.
+- power-domains : a phandle to the power domain of this local arbiter.
+- clocks : Must contain an entry for each entry in clock-names.
+- clock-names: must contain 2 entries, as follows:
+  - "apb" : Advanced Peripheral Bus clock; it is the clock for accessing
+	    the registers.
+  - "smi" : the clock for transferring data and commands.
+
+Example:
+	larb1: larb@16010000 {
+		compatible = "mediatek,mt8173-smi-larb";
+		reg = <0 0x16010000 0 0x1000>;
+		mediatek,smi = <&smi_common>;
+		power-domains = <&scpsys MT8173_POWER_DOMAIN_VDEC>;
+		clocks = <&vdecsys CLK_VDEC_CKEN>,
+			 <&vdecsys CLK_VDEC_LARB_CKEN>;
+		clock-names = "apb", "smi";
+	};
diff --git a/Documentation/devicetree/bindings/memory-controllers/mediatek,smi.txt b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi.txt
new file mode 100644
index 0000000..f54e91c
--- /dev/null
+++ b/Documentation/devicetree/bindings/memory-controllers/mediatek,smi.txt
@@ -0,0 +1,24 @@
+SMI (Smart Multimedia Interface)
+
+For the hardware block diagram, please see bindings/iommu/mediatek,iommu.txt.
+
+Required properties:
+- compatible : must be "mediatek,mt8173-smi"
+- reg : the register and size of the SMI block.
+- power-domains : a phandle to the power domain of the SMI block.
+- clocks : Must contain an entry for each entry in clock-names.
+- clock-names : must contain 2 entries, as follows:
+  - "apb" : Advanced Peripheral Bus clock; it is the clock for accessing
+	    the registers.
+  - "smi" : the clock for transferring data and commands.
+  They may be the same if both clocks have the same source.
+
+Example:
+	smi_common: smi@14022000 {
+		compatible = "mediatek,mt8173-smi";
+		reg = <0 0x14022000 0 0x1000>;
+		power-domains = <&scpsys MT8173_POWER_DOMAIN_MM>;
+		clocks = <&mmsys CLK_MM_SMI_COMMON>,
+			 <&mmsys CLK_MM_SMI_COMMON>;
+		clock-names = "apb", "smi";
+	};
-- 
1.8.1.1.dirty
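
The apb/smi split above means a driver has to keep both clocks running: the
register path is dead without "apb", and no data or commands flow without
"smi". A minimal sketch of the probe-time sequence this implies (illustrative
only, not the code from patch 4/6; the function name is made up):

	static int example_smi_clk_enable(struct device *dev)
	{
		struct clk *clk_apb, *clk_smi;
		int ret;

		clk_apb = devm_clk_get(dev, "apb");	/* register access clock */
		if (IS_ERR(clk_apb))
			return PTR_ERR(clk_apb);

		clk_smi = devm_clk_get(dev, "smi");	/* data/command clock */
		if (IS_ERR(clk_smi))
			return PTR_ERR(clk_smi);

		ret = clk_prepare_enable(clk_apb);
		if (ret)
			return ret;

		ret = clk_prepare_enable(clk_smi);
		if (ret)
			clk_disable_unprepare(clk_apb);
		return ret;
	}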


* [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: Robin Murphy, Will Deacon, Daniel Kurtz, Tomasz Figa,
	Lucas Stach, Rob Herring, Catalin Marinas, linux-mediatek,
	Sasha Hauer, srv_heupstream, devicetree, linux-kernel,
	linux-arm-kernel, iommu, pebolle, arnd, mitchelh, youhua.li,
	k.zhang, frederic.chen, Yong Wu

This patch adds a page table allocator for the ARM Short-descriptor format.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/Kconfig                |  18 +
 drivers/iommu/Makefile               |   1 +
 drivers/iommu/io-pgtable-arm-short.c | 813 +++++++++++++++++++++++++++++++++++
 drivers/iommu/io-pgtable-arm.c       |   3 -
 drivers/iommu/io-pgtable.c           |   4 +
 drivers/iommu/io-pgtable.h           |  14 +
 6 files changed, 850 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/io-pgtable-arm-short.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f1fb1d3..3abd066 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -39,6 +39,24 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
 
 	  If unsure, say N here.
 
+config IOMMU_IO_PGTABLE_SHORT
+	bool "ARMv7/v8 Short Descriptor Format"
+	select IOMMU_IO_PGTABLE
+	depends on ARM || ARM64 || COMPILE_TEST
+	help
+	  Enable support for the ARM Short-descriptor pagetable format.
+	  This allocator supports 2-level translation tables, which map
+	  memory based on sections or pages.
+
+config IOMMU_IO_PGTABLE_SHORT_SELFTEST
+	bool "Short Descriptor selftests"
+	depends on IOMMU_IO_PGTABLE_SHORT
+	help
+	  Enable self-tests for the Short-descriptor page table allocator.
+	  This performs a series of page-table consistency checks during boot.
+
+	  If unsure, say N here.
+
 endmenu
 
 config IOMMU_IOVA
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index c6dcc51..06df3e6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_IO_PGTABLE_SHORT) += io-pgtable-arm-short.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o
diff --git a/drivers/iommu/io-pgtable-arm-short.c b/drivers/iommu/io-pgtable-arm-short.c
new file mode 100644
index 0000000..56f5480
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm-short.c
@@ -0,0 +1,813 @@
+/*
+ * Copyright (c) 2014-2015 MediaTek Inc.
+ * Author: Yong Wu <yong.wu@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#define pr_fmt(fmt)	"arm-short-desc io-pgtable: "fmt
+
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/errno.h>
+#include <linux/dma-mapping.h>	/* for dma_map_single()/phys_to_dma() below */
+#include <linux/kmemleak.h>	/* for kmemleak_ignore() below */
+#include "io-pgtable.h"
+
+typedef u32 arm_short_iopte;
+
+struct arm_short_io_pgtable {
+	struct io_pgtable	iop;
+	struct kmem_cache	*pgtable_cached;
+	size_t			pgd_size;
+	void			*pgd;
+};
+
+#define io_pgtable_to_data(x)			\
+	container_of((x), struct arm_short_io_pgtable, iop)
+
+#define io_pgtable_ops_to_data(x)		\
+	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+
+#define io_pgtable_cfg_to_pgtable(x)		\
+	container_of((x), struct io_pgtable, cfg)
+
+#define io_pgtable_cfg_to_data(x)		\
+	io_pgtable_to_data(io_pgtable_cfg_to_pgtable(x))
+
+#define ARM_SHORT_PGDIR_SHIFT			20
+#define ARM_SHORT_PAGE_SHIFT			12
+#define ARM_SHORT_PTRS_PER_PTE			\
+	(1 << (ARM_SHORT_PGDIR_SHIFT - ARM_SHORT_PAGE_SHIFT))
+#define ARM_SHORT_BYTES_PER_PTE			\
+	(ARM_SHORT_PTRS_PER_PTE * sizeof(arm_short_iopte))
+
+/* level 1 pagetable */
+#define ARM_SHORT_PGD_TYPE_PGTABLE		BIT(0)
+#define ARM_SHORT_PGD_TYPE_SECTION		BIT(1)
+#define ARM_SHORT_PGD_B				BIT(2)
+#define ARM_SHORT_PGD_C				BIT(3)
+#define ARM_SHORT_PGD_PGTABLE_NS		BIT(3)
+#define ARM_SHORT_PGD_SECTION_XN		BIT(4)
+#define ARM_SHORT_PGD_IMPLE			BIT(9)
+#define ARM_SHORT_PGD_RD_WR			(3 << 10)
+#define ARM_SHORT_PGD_RDONLY			BIT(15)
+#define ARM_SHORT_PGD_S				BIT(16)
+#define ARM_SHORT_PGD_nG			BIT(17)
+#define ARM_SHORT_PGD_SUPERSECTION		BIT(18)
+#define ARM_SHORT_PGD_SECTION_NS		BIT(19)
+
+#define ARM_SHORT_PGD_TYPE_SUPERSECTION		\
+	(ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
+#define ARM_SHORT_PGD_SECTION_TYPE_MSK		\
+	(ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
+#define ARM_SHORT_PGD_PGTABLE_TYPE_MSK		\
+	(ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_TYPE_PGTABLE)
+#define ARM_SHORT_PGD_TYPE_IS_PGTABLE(pgd)	\
+	(((pgd) & ARM_SHORT_PGD_PGTABLE_TYPE_MSK) == ARM_SHORT_PGD_TYPE_PGTABLE)
+#define ARM_SHORT_PGD_TYPE_IS_SECTION(pgd)	\
+	(((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == ARM_SHORT_PGD_TYPE_SECTION)
+#define ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(pgd)	\
+	(((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == \
+	ARM_SHORT_PGD_TYPE_SUPERSECTION)
+#define ARM_SHORT_PGD_PGTABLE_MSK		0xfffffc00
+#define ARM_SHORT_PGD_SECTION_MSK		(~(SZ_1M - 1))
+#define ARM_SHORT_PGD_SUPERSECTION_MSK		(~(SZ_16M - 1))
+
+/* level 2 pagetable */
+#define ARM_SHORT_PTE_TYPE_LARGE		BIT(0)
+#define ARM_SHORT_PTE_SMALL_XN			BIT(0)
+#define ARM_SHORT_PTE_TYPE_SMALL		BIT(1)
+#define ARM_SHORT_PTE_B				BIT(2)
+#define ARM_SHORT_PTE_C				BIT(3)
+#define ARM_SHORT_PTE_RD_WR			(3 << 4)
+#define ARM_SHORT_PTE_RDONLY			BIT(9)
+#define ARM_SHORT_PTE_S				BIT(10)
+#define ARM_SHORT_PTE_nG			BIT(11)
+#define ARM_SHORT_PTE_LARGE_XN			BIT(15)
+#define ARM_SHORT_PTE_LARGE_MSK			(~(SZ_64K - 1))
+#define ARM_SHORT_PTE_SMALL_MSK			(~(SZ_4K - 1))
+#define ARM_SHORT_PTE_TYPE_MSK			\
+	(ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL)
+#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte)	\
+	(((pte) & ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL)
+#define ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(pte)	\
+	(((pte) & ARM_SHORT_PTE_TYPE_MSK) == ARM_SHORT_PTE_TYPE_LARGE)
+
+#define ARM_SHORT_PGD_IDX(a)			((a) >> ARM_SHORT_PGDIR_SHIFT)
+#define ARM_SHORT_PTE_IDX(a)			\
+	(((a) >> ARM_SHORT_PAGE_SHIFT) & (ARM_SHORT_PTRS_PER_PTE - 1))
+
+#define ARM_SHORT_GET_PGTABLE_VA(pgd)		\
+	(phys_to_virt((unsigned long)(pgd) & ARM_SHORT_PGD_PGTABLE_MSK))
+
+#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)	\
+	(((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)
+
+#define ARM_SHORT_PGD_GET_PROT(pgd)		\
+	(((pgd) & (~ARM_SHORT_PGD_SECTION_MSK)) & ~ARM_SHORT_PGD_SUPERSECTION)
+
+static bool selftest_running;
+
+static arm_short_iopte *
+arm_short_get_pte_in_pgd(arm_short_iopte pgd, unsigned int iova)
+{
+	arm_short_iopte *pte;
+
+	pte = ARM_SHORT_GET_PGTABLE_VA(pgd);
+	pte += ARM_SHORT_PTE_IDX(iova);
+	return pte;
+}
+
+static dma_addr_t
+__arm_short_dma_addr(struct device *dev, void *va)
+{
+	return phys_to_dma(dev, virt_to_phys(va));
+}
+
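+/*
+ * Write 'ptenr' identical entries (16 for a large page or supersection),
+ * then sync them out for the IOMMU. Fails with -EEXIST if a slot is
+ * already in use.
+ */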
+static int
+__arm_short_set_pte(arm_short_iopte *ptep, arm_short_iopte pte,
+		    unsigned int ptenr, struct io_pgtable_cfg *cfg)
+{
+	struct device *dev = cfg->iommu_dev;
+	int i;
+
+	for (i = 0; i < ptenr; i++) {
+		if (ptep[i] && pte) {
+			/* Someone else may have allocated for this pte */
+			WARN_ON(!selftest_running);
+			goto err_exist_pte;
+		}
+		ptep[i] = pte;
+	}
+
+	if (selftest_running)
+		return 0;
+
+	dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
+				   sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
+	return 0;
+
+err_exist_pte:
+	while (i--)
+		ptep[i] = 0;
+	return -EEXIST;
+}
+
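+/*
+ * The 16KiB lvl1 table comes from alloc_pages_exact(); 1KiB lvl2 tables
+ * come from the kmem_cache, whose alignment keeps them 1KiB-aligned so
+ * they fit the pgd's [31:10] base field.
+ */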
+static void *
+__arm_short_alloc_pgtable(size_t size, gfp_t gfp, bool pgd,
+			  struct io_pgtable_cfg *cfg)
+{
+	struct arm_short_io_pgtable *data;
+	struct device *dev = cfg->iommu_dev;
+	dma_addr_t dma;
+	void *va;
+
+	if (pgd) {	/* lvl1 pagetable */
+		va = alloc_pages_exact(size, gfp);
+	} else {  /* lvl2 pagetable */
+		data = io_pgtable_cfg_to_data(cfg);
+		va = kmem_cache_zalloc(data->pgtable_cached, gfp);
+	}
+
+	if (!va)
+		return NULL;
+
+	if (selftest_running)
+		return va;
+
+	dma = dma_map_single(dev, va, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, dma))
+		goto out_free;
+
+	if (dma != __arm_short_dma_addr(dev, va))
+		goto out_unmap;
+
+	if (!pgd) {
+		kmemleak_ignore(va);
+		dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, va),
+					   size, DMA_TO_DEVICE);
+	}
+
+	return va;
+
+out_unmap:
+	dev_err_ratelimited(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
+	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
+out_free:
+	if (pgd)
+		free_pages_exact(va, size);
+	else
+		kmem_cache_free(data->pgtable_cached, va);
+	return NULL;
+}
+
+static void
+__arm_short_free_pgtable(void *va, size_t size, bool pgd,
+			 struct io_pgtable_cfg *cfg)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_cfg_to_data(cfg);
+	struct device *dev = cfg->iommu_dev;
+
+	if (!selftest_running)
+		dma_unmap_single(dev, __arm_short_dma_addr(dev, va),
+				 size, DMA_TO_DEVICE);
+
+	if (pgd)
+		free_pages_exact(va, size);
+	else
+		kmem_cache_free(data->pgtable_cached, va);
+}
+
+static arm_short_iopte
+__arm_short_pte_prot(struct arm_short_io_pgtable *data, int prot, bool large)
+{
+	arm_short_iopte pteprot;
+	int quirk = data->iop.cfg.quirks;
+
+	pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG;
+	pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
+				ARM_SHORT_PTE_TYPE_SMALL;
+	if (prot & IOMMU_CACHE)
+		pteprot |= ARM_SHORT_PTE_B | ARM_SHORT_PTE_C;
+	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC))
+		pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
+			   ARM_SHORT_PTE_SMALL_XN;
+	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
+		pteprot |= ARM_SHORT_PTE_RD_WR;
+		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
+			pteprot |= ARM_SHORT_PTE_RDONLY;
+	}
+	return pteprot;
+}
+
+static arm_short_iopte
+__arm_short_pgd_prot(struct arm_short_io_pgtable *data, int prot, bool super)
+{
+	arm_short_iopte pgdprot;
+	int quirk = data->iop.cfg.quirks;
+
+	pgdprot = ARM_SHORT_PGD_S | ARM_SHORT_PGD_nG;
+	pgdprot |= super ? ARM_SHORT_PGD_TYPE_SUPERSECTION :
+				ARM_SHORT_PGD_TYPE_SECTION;
+	if (prot & IOMMU_CACHE)
+		pgdprot |= ARM_SHORT_PGD_C | ARM_SHORT_PGD_B;
+	if (quirk & IO_PGTABLE_QUIRK_ARM_NS)
+		pgdprot |= ARM_SHORT_PGD_SECTION_NS;
+
+	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC))
+		pgdprot |= ARM_SHORT_PGD_SECTION_XN;
+
+	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
+		pgdprot |= ARM_SHORT_PGD_RD_WR;
+		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
+			pgdprot |= ARM_SHORT_PGD_RDONLY;
+	}
+	return pgdprot;
+}
+
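+/*
+ * Derive the lvl2 prot bits for a split from the section prot ('pgdprot')
+ * or, when a large page is broken into small pages, from the old
+ * large-page pte ('pteprot_large').
+ */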
+static arm_short_iopte
+__arm_short_pte_prot_split(struct arm_short_io_pgtable *data,
+			   arm_short_iopte pgdprot,
+			   arm_short_iopte pteprot_large,
+			   bool large)
+{
+	arm_short_iopte pteprot = 0;
+
+	pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG | ARM_SHORT_PTE_RD_WR;
+	pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
+				ARM_SHORT_PTE_TYPE_SMALL;
+
+	/* Splitting a large page into small pages: inherit its pte prot */
+	if (!pgdprot && !large) {
+		pteprot |= pteprot_large & ~ARM_SHORT_PTE_SMALL_MSK;
+		if (pteprot_large & ARM_SHORT_PTE_LARGE_XN)
+			pteprot |= ARM_SHORT_PTE_SMALL_XN;
+	}
+
+	/* section to pte prot */
+	if (pgdprot & ARM_SHORT_PGD_C)
+		pteprot |= ARM_SHORT_PTE_C;
+	if (pgdprot & ARM_SHORT_PGD_B)
+		pteprot |= ARM_SHORT_PTE_B;
+	if (pgdprot & ARM_SHORT_PGD_nG)
+		pteprot |= ARM_SHORT_PTE_nG;
+	if (pgdprot & ARM_SHORT_PGD_SECTION_XN)
+		pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
+				ARM_SHORT_PTE_SMALL_XN;
+	if (pgdprot & ARM_SHORT_PGD_RD_WR)
+		pteprot |= ARM_SHORT_PTE_RD_WR;
+	if (pgdprot & ARM_SHORT_PGD_RDONLY)
+		pteprot |= ARM_SHORT_PTE_RDONLY;
+
+	return pteprot;
+}
+
+static arm_short_iopte
+__arm_short_pgtable_prot(struct arm_short_io_pgtable *data)
+{
+	arm_short_iopte pgdprot = 0;
+
+	pgdprot = ARM_SHORT_PGD_TYPE_PGTABLE;
+	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
+		pgdprot |= ARM_SHORT_PGD_PGTABLE_NS;
+	return pgdprot;
+}
+
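+/*
+ * Sections and supersections are written straight into the pgd; small and
+ * large pages go through a lvl2 table that is allocated on demand with
+ * GFP_ATOMIC, since map may be called from atomic context.
+ */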
+static int
+_arm_short_map(struct arm_short_io_pgtable *data,
+	       unsigned int iova, phys_addr_t paddr,
+	       arm_short_iopte pgdprot, arm_short_iopte pteprot,
+	       bool large)
+{
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_short_iopte *pgd = data->pgd, *pte;
+	void *pte_new = NULL;
+	int ret;
+
+	pgd += ARM_SHORT_PGD_IDX(iova);
+
+	if (!pteprot) { /* section or supersection */
+		pte = pgd;
+		pteprot = pgdprot;
+	} else {        /* page or largepage */
+		if (!(*pgd)) {
+			pte_new = __arm_short_alloc_pgtable(
+					ARM_SHORT_BYTES_PER_PTE,
+					GFP_ATOMIC, false, cfg);
+			if (unlikely(!pte_new))
+				return -ENOMEM;
+
+			pgdprot |= virt_to_phys(pte_new);
+			__arm_short_set_pte(pgd, pgdprot, 1, cfg);
+		}
+		pte = arm_short_get_pte_in_pgd(*pgd, iova);
+	}
+
+	pteprot |= (arm_short_iopte)paddr;
+	ret = __arm_short_set_pte(pte, pteprot, large ? 16 : 1, cfg);
+	if (ret && pte_new)
+		__arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
+					 false, cfg);
+	return ret;
+}
+
+static int arm_short_map(struct io_pgtable_ops *ops, unsigned long iova,
+			 phys_addr_t paddr, size_t size, int prot)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_short_iopte pgdprot = 0, pteprot = 0;
+	bool large;
+
+	/* If no access, then nothing to do */
+	if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
+		return 0;
+
+	if (WARN_ON((iova | paddr) & (size - 1)))
+		return -EINVAL;
+
+	switch (size) {
+	case SZ_4K:
+	case SZ_64K:
+		large = (size == SZ_64K);
+		pteprot = __arm_short_pte_prot(data, prot, large);
+		pgdprot = __arm_short_pgtable_prot(data);
+		break;
+
+	case SZ_1M:
+	case SZ_16M:
+		large = (size == SZ_16M);
+		pgdprot = __arm_short_pgd_prot(data, prot, large);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return _arm_short_map(data, iova, paddr, pgdprot, pteprot, large);
+}
+
+static phys_addr_t arm_short_iova_to_phys(struct io_pgtable_ops *ops,
+					  unsigned long iova)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_short_iopte *pte, *pgd = data->pgd;
+	phys_addr_t pa = 0;
+
+	pgd += ARM_SHORT_PGD_IDX(iova);
+
+	if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
+		pte = arm_short_get_pte_in_pgd(*pgd, iova);
+
+		if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte)) {
+			pa = (*pte) & ARM_SHORT_PTE_LARGE_MSK;
+			pa |= iova & ~ARM_SHORT_PTE_LARGE_MSK;
+		} else if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte)) {
+			pa = (*pte) & ARM_SHORT_PTE_SMALL_MSK;
+			pa |= iova & ~ARM_SHORT_PTE_SMALL_MSK;
+		}
+	} else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
+		pa = (*pgd) & ARM_SHORT_PGD_SECTION_MSK;
+		pa |= iova & ~ARM_SHORT_PGD_SECTION_MSK;
+	} else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
+		pa = (*pgd) & ARM_SHORT_PGD_SUPERSECTION_MSK;
+		pa |= iova & ~ARM_SHORT_PGD_SUPERSECTION_MSK;
+	}
+
+	return pa;
+}
+
+static bool _arm_short_pgtable_empty(arm_short_iopte *pgd)
+{
+	arm_short_iopte *pte;
+	int i;
+
+	pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
+	for (i = 0; i < ARM_SHORT_PTRS_PER_PTE; i++) {
+		if (pte[i] != 0)
+			return false;
+	}
+
+	return true;
+}
+
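+/*
+ * Re-map the remainder of a partially unmapped block: walk the old block,
+ * leave a hole at 'iova', and map everything else back using the largest
+ * page sizes whose alignment still fits.
+ */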
+static int
+arm_short_split_blk_unmap(struct io_pgtable_ops *ops, unsigned int iova,
+			  phys_addr_t paddr, size_t size,
+			  arm_short_iopte pgdprotup, arm_short_iopte pteprotup,
+			  size_t blk_size)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	unsigned long *pgbitmap = &cfg->pgsize_bitmap;
+	unsigned int blk_base, blk_start, blk_end, i;
+	arm_short_iopte pgdprot, pteprot;
+	phys_addr_t blk_paddr;
+	size_t mapsize = 0, nextmapsize;
+	int ret;
+
+	/* Find the largest mapsize below blk_size that also divides 'size' */
+	for (i = find_first_bit(pgbitmap, BITS_PER_LONG);
+	     i < BITS_PER_LONG && ((1 << i) < blk_size) &&
+	     IS_ALIGNED(size, 1 << i);
+	     i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1))
+		mapsize = 1 << i;
+
+	if (WARN_ON(!mapsize))
+		return 0; /* Bytes unmapped */
+	nextmapsize = 1 << i;
+
+	blk_base = iova & ~(blk_size - 1);
+	blk_start = blk_base;
+	blk_end = blk_start + blk_size;
+	blk_paddr = paddr;
+
+	for (; blk_start < blk_end;
+	     blk_start += mapsize, blk_paddr += mapsize) {
+		/* Leave the hole for the range being unmapped */
+		if (blk_start == iova)
+			continue;
+
+		/* Step up to a larger mapsize when alignment allows */
+		if (blk_base != blk_start &&
+		    IS_ALIGNED(blk_start | blk_paddr, nextmapsize) &&
+		    mapsize != nextmapsize) {
+			mapsize = nextmapsize;
+			i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1);
+			if (i < BITS_PER_LONG)
+				nextmapsize = 1 << i;
+		}
+
+		if (mapsize == SZ_1M) {
+			pgdprot = pgdprotup;
+			pgdprot |= __arm_short_pgd_prot(data, 0, false);
+			pteprot = 0;
+		} else { /* small or large page */
+			pgdprot = (blk_size == SZ_64K) ? 0 : pgdprotup;
+			pteprot = __arm_short_pte_prot_split(
+					data, pgdprot, pteprotup,
+					mapsize == SZ_64K);
+			pgdprot = __arm_short_pgtable_prot(data);
+		}
+
+		ret = _arm_short_map(data, blk_start, blk_paddr, pgdprot,
+				     pteprot, mapsize == SZ_64K);
+		if (ret < 0) {
+			/* Free the table we allocated */
+			arm_short_iopte *pgd = data->pgd, *pte;
+
+			pgd += ARM_SHORT_PGD_IDX(blk_base);
+			if (*pgd) {
+				pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
+				__arm_short_set_pte(pgd, 0, 1, cfg);
+				tlb->tlb_add_flush(blk_base, blk_size, true,
+						   data->iop.cookie);
+				tlb->tlb_sync(data->iop.cookie);
+				__arm_short_free_pgtable(
+					pte, ARM_SHORT_BYTES_PER_PTE,
+					false, cfg);
+			}
+			return 0; /* Bytes unmapped */
+		}
+	}
+
+	tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
+	tlb->tlb_sync(data->iop.cookie);
+	return size;
+}
+
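+/*
+ * Unmap at 'iova'. If the request covers only part of a block, the whole
+ * block is cleared and the rest is re-mapped via split_blk_unmap(); if it
+ * spans several blocks, recurse over the remainder.
+ */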
+static int arm_short_unmap(struct io_pgtable_ops *ops,
+			   unsigned long iova,
+			   size_t size)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_short_iopte *pgd, *pte = NULL;
+	arm_short_iopte curpgd, curpte = 0;
+	phys_addr_t paddr;
+	unsigned int iova_base, blk_size = 0;
+	void *cookie = data->iop.cookie;
+	bool pgtablefree = false;
+
+	pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
+
+	/* Get block size */
+	if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
+		pte = arm_short_get_pte_in_pgd(*pgd, iova);
+
+		if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
+			blk_size = SZ_4K;
+		else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
+			blk_size = SZ_64K;
+		else
+			WARN_ON(1);
+	} else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
+		blk_size = SZ_1M;
+	} else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
+		blk_size = SZ_16M;
+	} else {
+		WARN_ON(1);
+	}
+
+	if (!blk_size)
+		return 0;	/* Invalid descriptor: nothing to unmap */
+
+	iova_base = iova & ~(blk_size - 1);
+	pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova_base);
+	paddr = arm_short_iova_to_phys(ops, iova_base);
+	curpgd = *pgd;
+
+	if (blk_size == SZ_4K || blk_size == SZ_64K) {
+		pte = arm_short_get_pte_in_pgd(*pgd, iova_base);
+		curpte = *pte;
+		__arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);
+
+		pgtablefree = _arm_short_pgtable_empty(pgd);
+		if (pgtablefree)
+			__arm_short_set_pte(pgd, 0, 1, cfg);
+	} else if (blk_size == SZ_1M || blk_size == SZ_16M) {
+		__arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
+	}
+
+	cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);
+	cfg->tlb->tlb_sync(cookie);
+
+	if (pgtablefree)	/* Free pgtable after tlb-flush */
+		__arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
+					 ARM_SHORT_BYTES_PER_PTE, false, cfg);
+
+	if (blk_size > size) { /* Split the block */
+		return arm_short_split_blk_unmap(
+				ops, iova, paddr, size,
+				ARM_SHORT_PGD_GET_PROT(curpgd),
+				ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
+				blk_size);
+	} else if (blk_size < size) {
+		/* Block smaller than the request: unmap the remainder too */
+		return blk_size +
+			arm_short_unmap(ops, iova + blk_size, size - blk_size);
+	}
+
+	return size;
+}
+
+static struct io_pgtable *
+arm_short_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+{
+	struct arm_short_io_pgtable *data;
+
+	if (cfg->ias > 32 || cfg->oas > 32)
+		return NULL;
+
+	cfg->pgsize_bitmap &=
+		(cfg->quirks & IO_PGTABLE_QUIRK_SHORT_SUPERSECTION) ?
+		(SZ_4K | SZ_64K | SZ_1M | SZ_16M) : (SZ_4K | SZ_64K | SZ_1M);
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return NULL;
+
+	data->pgd_size = SZ_16K;
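+	/* A short-descriptor lvl1 table is 4096 entries and must be 16KiB-aligned */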
+	data->pgd = __arm_short_alloc_pgtable(
+					data->pgd_size,
+					GFP_KERNEL | __GFP_ZERO | __GFP_DMA,
+					true, cfg);
+	if (!data->pgd)
+		goto out_free_data;
+	wmb();	/* Ensure the empty pgd is visible before any actual TTBR write */
+
+	data->pgtable_cached = kmem_cache_create(
+					"io-pgtable-arm-short",
+					 ARM_SHORT_BYTES_PER_PTE,
+					 ARM_SHORT_BYTES_PER_PTE,
+					 0, NULL);
+	if (!data->pgtable_cached)
+		goto out_free_pgd;
+
+	/* TTBRs */
+	cfg->arm_short_cfg.ttbr[0] = virt_to_phys(data->pgd);
+	cfg->arm_short_cfg.ttbr[1] = 0;
+	cfg->arm_short_cfg.tcr = 0;
+	cfg->arm_short_cfg.nmrr = 0;
+	cfg->arm_short_cfg.prrr = 0;
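+	/* Only ttbr[0] is used; ttbr[1] and the attribute registers stay zeroed */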
+
+	data->iop.ops = (struct io_pgtable_ops) {
+		.map		= arm_short_map,
+		.unmap		= arm_short_unmap,
+		.iova_to_phys	= arm_short_iova_to_phys,
+	};
+
+	return &data->iop;
+
+out_free_pgd:
+	__arm_short_free_pgtable(data->pgd, data->pgd_size, true, cfg);
+out_free_data:
+	kfree(data);
+	return NULL;
+}
+
+static void arm_short_free_pgtable(struct io_pgtable *iop)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_to_data(iop);
+
+	kmem_cache_destroy(data->pgtable_cached);
+	__arm_short_free_pgtable(data->pgd, data->pgd_size,
+				 true, &data->iop.cfg);
+	kfree(data);
+}
+
+struct io_pgtable_init_fns io_pgtable_arm_short_init_fns = {
+	.alloc	= arm_short_alloc_pgtable,
+	.free	= arm_short_free_pgtable,
+};
+
+#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT_SELFTEST
+
+static struct io_pgtable_cfg *cfg_cookie;
+
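+/* Stub TLB ops: the selftest only checks cookies and flush sizes */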
+static void dummy_tlb_flush_all(void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+}
+
+static void dummy_tlb_add_flush(unsigned long iova, size_t size, bool leaf,
+				void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+	WARN_ON(!(size & cfg_cookie->pgsize_bitmap));
+}
+
+static void dummy_tlb_sync(void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+}
+
+static struct iommu_gather_ops dummy_tlb_ops = {
+	.tlb_flush_all	= dummy_tlb_flush_all,
+	.tlb_add_flush	= dummy_tlb_add_flush,
+	.tlb_sync	= dummy_tlb_sync,
+};
+
+#define __FAIL(ops)	({				\
+		WARN(1, "selftest: test failed\n");	\
+		selftest_running = false;		\
+		-EFAULT;				\
+})
+
+static int __init arm_short_do_selftests(void)
+{
+	struct io_pgtable_ops *ops;
+	struct io_pgtable_cfg cfg = {
+		.tlb = &dummy_tlb_ops,
+		.oas = 32,
+		.ias = 32,
+		.quirks = IO_PGTABLE_QUIRK_ARM_NS |
+			IO_PGTABLE_QUIRK_SHORT_SUPERSECTION,
+		.pgsize_bitmap = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+	};
+	unsigned int iova, size, iova_start;
+	unsigned int i, loopnr = 0;
+
+	selftest_running = true;
+
+	cfg_cookie = &cfg;
+
+	ops = alloc_io_pgtable_ops(ARM_SHORT_DESC, &cfg, &cfg);
+	if (!ops) {
+		pr_err("Failed to alloc short desc io pgtable\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Initial sanity checks.
+	 * Empty page tables shouldn't provide any translations.
+	 */
+	if (ops->iova_to_phys(ops, 42))
+		return __FAIL(ops);
+
+	if (ops->iova_to_phys(ops, SZ_1G + 42))
+		return __FAIL(ops);
+
+	if (ops->iova_to_phys(ops, SZ_2G + 42))
+		return __FAIL(ops);
+
+	/*
+	 * Distinct mappings of different granule sizes.
+	 */
+	iova = 0;
+	i = find_first_bit(&cfg.pgsize_bitmap, BITS_PER_LONG);
+	while (i != BITS_PER_LONG) {
+		size = 1UL << i;
+		if (ops->map(ops, iova, iova, size, IOMMU_READ |
+						    IOMMU_WRITE |
+						    IOMMU_NOEXEC |
+						    IOMMU_CACHE))
+			return __FAIL(ops);
+
+		/* Overlapping mappings */
+		if (!ops->map(ops, iova, iova + size, size,
+			      IOMMU_READ | IOMMU_NOEXEC))
+			return __FAIL(ops);
+
+		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
+			return __FAIL(ops);
+
+		iova += SZ_16M;
+		i++;
+		i = find_next_bit(&cfg.pgsize_bitmap, BITS_PER_LONG, i);
+		loopnr++;
+	}
+
+	/* Partial unmap */
+	i = 1;
+	size = 1UL << __ffs(cfg.pgsize_bitmap);
+	while (i < loopnr) {
+		iova_start = i * SZ_16M;
+		if (ops->unmap(ops, iova_start + size, size) != size)
+			return __FAIL(ops);
+
+		/* Remap of partial unmap */
+		if (ops->map(ops, iova_start + size, size, size, IOMMU_READ))
+			return __FAIL(ops);
+
+		if (ops->iova_to_phys(ops, iova_start + size + 42)
+		    != (size + 42))
+			return __FAIL(ops);
+		i++;
+	}
+
+	/* Full unmap */
+	iova = 0;
+	i = find_first_bit(&cfg.pgsize_bitmap, BITS_PER_LONG);
+	while (i != BITS_PER_LONG) {
+		size = 1UL << i;
+
+		if (ops->unmap(ops, iova, size) != size)
+			return __FAIL(ops);
+
+		if (ops->iova_to_phys(ops, iova + 42))
+			return __FAIL(ops);
+
+		/* Remap full block */
+		if (ops->map(ops, iova, iova, size, IOMMU_WRITE))
+			return __FAIL(ops);
+
+		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
+			return __FAIL(ops);
+
+		iova += SZ_16M;
+		i++;
+		i = find_next_bit(&cfg.pgsize_bitmap, BITS_PER_LONG, i);
+	}
+
+	free_io_pgtable_ops(ops);
+
+	selftest_running = false;
+	return 0;
+}
+
+subsys_initcall(arm_short_do_selftests);
+#endif
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e4bc2b2..9978eca 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -38,9 +38,6 @@
 #define io_pgtable_to_data(x)						\
 	container_of((x), struct arm_lpae_io_pgtable, iop)
 
-#define io_pgtable_ops_to_pgtable(x)					\
-	container_of((x), struct io_pgtable, ops)
-
 #define io_pgtable_ops_to_data(x)					\
 	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
 
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 6436fe2..14a9b3a 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -28,6 +28,7 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
+extern struct io_pgtable_init_fns io_pgtable_arm_short_init_fns;
 
 static const struct io_pgtable_init_fns *
 io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
@@ -38,6 +39,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
 	[ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
 	[ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
 #endif
+#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT
+	[ARM_SHORT_DESC] = &io_pgtable_arm_short_init_fns,
+#endif
 };
 
 struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 68c63d9..0f45e60 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -9,6 +9,7 @@ enum io_pgtable_fmt {
 	ARM_32_LPAE_S2,
 	ARM_64_LPAE_S1,
 	ARM_64_LPAE_S2,
+	ARM_SHORT_DESC,
 	IO_PGTABLE_NUM_FMTS,
 };
 
@@ -45,6 +46,9 @@ struct iommu_gather_ops {
  */
 struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_ARM_NS	(1 << 0)	/* Set NS bit in PTEs */
+	#define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION	BIT(1)
+	#define IO_PGTABLE_QUIRK_SHORT_NO_XN		BIT(2)	/* HW has no XN bit */
+	#define IO_PGTABLE_QUIRK_SHORT_NO_PERMS		BIT(3)	/* HW has no AP bits */
 	int				quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
@@ -64,6 +68,13 @@ struct io_pgtable_cfg {
 			u64	vttbr;
 			u64	vtcr;
 		} arm_lpae_s2_cfg;
+
+		struct {
+			u32	ttbr[2];
+			u32	tcr;
+			u32	nmrr;
+			u32	prrr;
+		} arm_short_cfg;
 	};
 };
 
@@ -130,6 +141,9 @@ struct io_pgtable {
 	struct io_pgtable_ops	ops;
 };
 
+#define io_pgtable_ops_to_pgtable(x)		\
+	container_of((x), struct io_pgtable, ops)
+
 /**
  * struct io_pgtable_init_fns - Alloc/free a set of page tables for a
  *                              particular format.
-- 
1.8.1.1.dirty


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: linux-arm-kernel

This patch is for ARM Short Descriptor Format.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/Kconfig                |  18 +
 drivers/iommu/Makefile               |   1 +
 drivers/iommu/io-pgtable-arm-short.c | 813 +++++++++++++++++++++++++++++++++++
 drivers/iommu/io-pgtable-arm.c       |   3 -
 drivers/iommu/io-pgtable.c           |   4 +
 drivers/iommu/io-pgtable.h           |  14 +
 6 files changed, 850 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/io-pgtable-arm-short.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f1fb1d3..3abd066 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -39,6 +39,24 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
 
 	  If unsure, say N here.
 
+config IOMMU_IO_PGTABLE_SHORT
+	bool "ARMv7/v8 Short Descriptor Format"
+	select IOMMU_IO_PGTABLE
+	depends on ARM || ARM64 || COMPILE_TEST
+	help
+	  Enable support for the ARM Short-descriptor pagetable format.
+	  This allocator supports 2 levels translation tables which supports
+	  a memory map based on memory sections or pages.
+
+config IOMMU_IO_PGTABLE_SHORT_SELFTEST
+	bool "Short Descriptor selftests"
+	depends on IOMMU_IO_PGTABLE_SHORT
+	help
+	  Enable self-tests for Short-descriptor page table allocator.
+	  This performs a series of page-table consistency checks during boot.
+
+	  If unsure, say N here.
+
 endmenu
 
 config IOMMU_IOVA
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index c6dcc51..06df3e6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
+obj-$(CONFIG_IOMMU_IO_PGTABLE_SHORT) += io-pgtable-arm-short.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
 obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o
diff --git a/drivers/iommu/io-pgtable-arm-short.c b/drivers/iommu/io-pgtable-arm-short.c
new file mode 100644
index 0000000..56f5480
--- /dev/null
+++ b/drivers/iommu/io-pgtable-arm-short.c
@@ -0,0 +1,813 @@
+/*
+ * Copyright (c) 2014-2015 MediaTek Inc.
+ * Author: Yong Wu <yong.wu@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#define pr_fmt(fmt)	"arm-short-desc io-pgtable: "fmt
+
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/errno.h>
+#include "io-pgtable.h"
+
+typedef u32 arm_short_iopte;
+
+struct arm_short_io_pgtable {
+	struct io_pgtable	iop;
+	struct kmem_cache	*pgtable_cached;
+	size_t			pgd_size;
+	void			*pgd;
+};
+
+#define io_pgtable_to_data(x)			\
+	container_of((x), struct arm_short_io_pgtable, iop)
+
+#define io_pgtable_ops_to_data(x)		\
+	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
+
+#define io_pgtable_cfg_to_pgtable(x)		\
+	container_of((x), struct io_pgtable, cfg)
+
+#define io_pgtable_cfg_to_data(x)		\
+	io_pgtable_to_data(io_pgtable_cfg_to_pgtable(x))
+
+#define ARM_SHORT_PGDIR_SHIFT			20
+#define ARM_SHORT_PAGE_SHIFT			12
+#define ARM_SHORT_PTRS_PER_PTE			\
+	(1 << (ARM_SHORT_PGDIR_SHIFT - ARM_SHORT_PAGE_SHIFT))
+#define ARM_SHORT_BYTES_PER_PTE			\
+	(ARM_SHORT_PTRS_PER_PTE * sizeof(arm_short_iopte))
+
+/* level 1 pagetable */
+#define ARM_SHORT_PGD_TYPE_PGTABLE		BIT(0)
+#define ARM_SHORT_PGD_TYPE_SECTION		BIT(1)
+#define ARM_SHORT_PGD_B				BIT(2)
+#define ARM_SHORT_PGD_C				BIT(3)
+#define ARM_SHORT_PGD_PGTABLE_NS		BIT(3)
+#define ARM_SHORT_PGD_SECTION_XN		BIT(4)
+#define ARM_SHORT_PGD_IMPLE			BIT(9)
+#define ARM_SHORT_PGD_RD_WR			(3 << 10)
+#define ARM_SHORT_PGD_RDONLY			BIT(15)
+#define ARM_SHORT_PGD_S				BIT(16)
+#define ARM_SHORT_PGD_nG			BIT(17)
+#define ARM_SHORT_PGD_SUPERSECTION		BIT(18)
+#define ARM_SHORT_PGD_SECTION_NS		BIT(19)
+
+#define ARM_SHORT_PGD_TYPE_SUPERSECTION		\
+	(ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
+#define ARM_SHORT_PGD_SECTION_TYPE_MSK		\
+	(ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
+#define ARM_SHORT_PGD_PGTABLE_TYPE_MSK		\
+	(ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_TYPE_PGTABLE)
+#define ARM_SHORT_PGD_TYPE_IS_PGTABLE(pgd)	\
+	(((pgd) & ARM_SHORT_PGD_PGTABLE_TYPE_MSK) == ARM_SHORT_PGD_TYPE_PGTABLE)
+#define ARM_SHORT_PGD_TYPE_IS_SECTION(pgd)	\
+	(((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == ARM_SHORT_PGD_TYPE_SECTION)
+#define ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(pgd)	\
+	(((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == \
+	ARM_SHORT_PGD_TYPE_SUPERSECTION)
+#define ARM_SHORT_PGD_PGTABLE_MSK		0xfffffc00
+#define ARM_SHORT_PGD_SECTION_MSK		(~(SZ_1M - 1))
+#define ARM_SHORT_PGD_SUPERSECTION_MSK		(~(SZ_16M - 1))
+
+/* level 2 pagetable */
+#define ARM_SHORT_PTE_TYPE_LARGE		BIT(0)
+#define ARM_SHORT_PTE_SMALL_XN			BIT(0)
+#define ARM_SHORT_PTE_TYPE_SMALL		BIT(1)
+#define ARM_SHORT_PTE_B				BIT(2)
+#define ARM_SHORT_PTE_C				BIT(3)
+#define ARM_SHORT_PTE_RD_WR			(3 << 4)
+#define ARM_SHORT_PTE_RDONLY			BIT(9)
+#define ARM_SHORT_PTE_S				BIT(10)
+#define ARM_SHORT_PTE_nG			BIT(11)
+#define ARM_SHORT_PTE_LARGE_XN			BIT(15)
+#define ARM_SHORT_PTE_LARGE_MSK			(~(SZ_64K - 1))
+#define ARM_SHORT_PTE_SMALL_MSK			(~(SZ_4K - 1))
+#define ARM_SHORT_PTE_TYPE_MSK			\
+	(ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL)
+#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte)	\
+	(((pte) & ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL)
+#define ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(pte)	\
+	(((pte) & ARM_SHORT_PTE_TYPE_MSK) == ARM_SHORT_PTE_TYPE_LARGE)
+
+#define ARM_SHORT_PGD_IDX(a)			((a) >> ARM_SHORT_PGDIR_SHIFT)
+#define ARM_SHORT_PTE_IDX(a)			\
+	(((a) >> ARM_SHORT_PAGE_SHIFT) & (ARM_SHORT_PTRS_PER_PTE - 1))
+
+#define ARM_SHORT_GET_PGTABLE_VA(pgd)		\
+	(phys_to_virt((unsigned long)pgd & ARM_SHORT_PGD_PGTABLE_MSK))
+
+#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)	\
+	(((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)
+
+#define ARM_SHORT_PGD_GET_PROT(pgd)		\
+	(((pgd) & (~ARM_SHORT_PGD_SECTION_MSK)) & ~ARM_SHORT_PGD_SUPERSECTION)
+
+static bool selftest_running;
+
+static arm_short_iopte *
+arm_short_get_pte_in_pgd(arm_short_iopte pgd, unsigned int iova)
+{
+	arm_short_iopte *pte;
+
+	pte = ARM_SHORT_GET_PGTABLE_VA(pgd);
+	pte += ARM_SHORT_PTE_IDX(iova);
+	return pte;
+}
+
+static dma_addr_t
+__arm_short_dma_addr(struct device *dev, void *va)
+{
+	return phys_to_dma(dev, virt_to_phys(va));
+}
+
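+/*
+ * Fill 'ptenr' consecutive entries with the same descriptor and make
+ * them visible to the IOMMU. Writing over a live entry fails with
+ * -EEXIST; the selftest triggers this on purpose via its overlapping
+ * mappings, so only the WARN is suppressed while it runs.
+ */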
+static int
+__arm_short_set_pte(arm_short_iopte *ptep, arm_short_iopte pte,
+		    unsigned int ptenr, struct io_pgtable_cfg *cfg)
+{
+	struct device *dev = cfg->iommu_dev;
+	int i;
+
+	for (i = 0; i < ptenr; i++) {
+		if (ptep[i] && pte) {
+			/* Someone else may have allocated for this pte */
+			WARN_ON(!selftest_running);
+			goto err_exist_pte;
+		}
+		ptep[i] = pte;
+	}
+
+	if (selftest_running)
+		return 0;
+
+	dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
+				   sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
+	return 0;
+
+err_exist_pte:
+	while (i--)
+		ptep[i] = 0;
+	return -EEXIST;
+}
+
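+/*
+ * Level-1 tables come from alloc_pages_exact; level-2 tables come from
+ * a dedicated kmem_cache. The table is DMA-mapped and must end up at
+ * its physical address, since the IOMMU walks the tables by PA.
+ */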
+static void *
+__arm_short_alloc_pgtable(size_t size, gfp_t gfp, bool pgd,
+			  struct io_pgtable_cfg *cfg)
+{
+	struct arm_short_io_pgtable *data;
+	struct device *dev = cfg->iommu_dev;
+	dma_addr_t dma;
+	void *va;
+
+	if (pgd) {/* lvl1 pagetable */
+		va = alloc_pages_exact(size, gfp);
+	} else {  /* lvl2 pagetable */
+		data = io_pgtable_cfg_to_data(cfg);
+		va = kmem_cache_zalloc(data->pgtable_cached, gfp);
+	}
+
+	if (!va)
+		return NULL;
+
+	if (selftest_running)
+		return va;
+
+	dma = dma_map_single(dev, va, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(dev, dma))
+		goto out_free;
+
+	if (dma != __arm_short_dma_addr(dev, va))
+		goto out_unmap;
+
+	if (!pgd) {
+		kmemleak_ignore(va);
+		dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, va),
+					   size, DMA_TO_DEVICE);
+	}
+
+	return va;
+
+out_unmap:
+	dev_err_ratelimited(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
+	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
+out_free:
+	if (pgd)
+		free_pages_exact(va, size);
+	else
+		kmem_cache_free(data->pgtable_cached, va);
+	return NULL;
+}
+
+static void
+__arm_short_free_pgtable(void *va, size_t size, bool pgd,
+			 struct io_pgtable_cfg *cfg)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_cfg_to_data(cfg);
+	struct device *dev = cfg->iommu_dev;
+
+	if (!selftest_running)
+		dma_unmap_single(dev, __arm_short_dma_addr(dev, va),
+				 size, DMA_TO_DEVICE);
+
+	if (pgd)
+		free_pages_exact(va, size);
+	else
+		kmem_cache_free(data->pgtable_cached, va);
+}
+
+static arm_short_iopte
+__arm_short_pte_prot(struct arm_short_io_pgtable *data, int prot, bool large)
+{
+	arm_short_iopte pteprot;
+	int quirk = data->iop.cfg.quirks;
+
+	pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG;
+	pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
+				ARM_SHORT_PTE_TYPE_SMALL;
+	if (prot & IOMMU_CACHE)
+		pteprot |=  ARM_SHORT_PTE_B | ARM_SHORT_PTE_C;
+	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC))
+		pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
+				ARM_SHORT_PTE_SMALL_XN;
+	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
+		pteprot |= ARM_SHORT_PTE_RD_WR;
+		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
+			pteprot |= ARM_SHORT_PTE_RDONLY;
+	}
+	return pteprot;
+}
+
+static arm_short_iopte
+__arm_short_pgd_prot(struct arm_short_io_pgtable *data, int prot, bool super)
+{
+	arm_short_iopte pgdprot;
+	int quirk = data->iop.cfg.quirks;
+
+	pgdprot = ARM_SHORT_PGD_S | ARM_SHORT_PGD_nG;
+	pgdprot |= super ? ARM_SHORT_PGD_TYPE_SUPERSECTION :
+				ARM_SHORT_PGD_TYPE_SECTION;
+	if (prot & IOMMU_CACHE)
+		pgdprot |= ARM_SHORT_PGD_C | ARM_SHORT_PGD_B;
+	if (quirk & IO_PGTABLE_QUIRK_ARM_NS)
+		pgdprot |= ARM_SHORT_PGD_SECTION_NS;
+
+	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC))
+		pgdprot |= ARM_SHORT_PGD_SECTION_XN;
+
+	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
+		pgdprot |= ARM_SHORT_PGD_RD_WR;
+		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
+			pgdprot |= ARM_SHORT_PGD_RDONLY;
+	}
+	return pgdprot;
+}
+
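+/*
+ * Derive the level-2 pte prot bits when splitting a block: either the
+ * (super)section prot is inherited from pgdprot, or, when a large page
+ * becomes small pages, the prot is inherited from pteprot_large.
+ */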
+static arm_short_iopte
+__arm_short_pte_prot_split(struct arm_short_io_pgtable *data,
+			   arm_short_iopte pgdprot,
+			   arm_short_iopte pteprot_large,
+			   bool large)
+{
+	arm_short_iopte pteprot = 0;
+
+	pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG | ARM_SHORT_PTE_RD_WR;
+	pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
+				ARM_SHORT_PTE_TYPE_SMALL;
+
+	/* Large page to small page pte prot. Only a large page may be split */
+	if (!pgdprot && !large) {
+		pteprot |= pteprot_large & ~ARM_SHORT_PTE_SMALL_MSK;
+		if (pteprot_large & ARM_SHORT_PTE_LARGE_XN)
+			pteprot |= ARM_SHORT_PTE_SMALL_XN;
+	}
+
+	/* section to pte prot */
+	if (pgdprot & ARM_SHORT_PGD_C)
+		pteprot |= ARM_SHORT_PTE_C;
+	if (pgdprot & ARM_SHORT_PGD_B)
+		pteprot |= ARM_SHORT_PTE_B;
+	if (pgdprot & ARM_SHORT_PGD_nG)
+		pteprot |= ARM_SHORT_PTE_nG;
+	if (pgdprot & ARM_SHORT_PGD_SECTION_XN)
+		pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
+				ARM_SHORT_PTE_SMALL_XN;
+	if (pgdprot & ARM_SHORT_PGD_RD_WR)
+		pteprot |= ARM_SHORT_PTE_RD_WR;
+	if (pgdprot & ARM_SHORT_PGD_RDONLY)
+		pteprot |= ARM_SHORT_PTE_RDONLY;
+
+	return pteprot;
+}
+
+static arm_short_iopte
+__arm_short_pgtable_prot(struct arm_short_io_pgtable *data)
+{
+	arm_short_iopte pgdprot = 0;
+
+	pgdprot = ARM_SHORT_PGD_TYPE_PGTABLE;
+	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
+		pgdprot |= ARM_SHORT_PGD_PGTABLE_NS;
+	return pgdprot;
+}
+
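+/*
+ * Install the descriptor(s) for one mapping. Large pages and
+ * supersections are written as 16 identical entries, as the
+ * short-descriptor format requires; a level-2 table is allocated on
+ * demand for (large) page mappings.
+ */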
+static int
+_arm_short_map(struct arm_short_io_pgtable *data,
+	       unsigned int iova, phys_addr_t paddr,
+	       arm_short_iopte pgdprot, arm_short_iopte pteprot,
+	       bool large)
+{
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_short_iopte *pgd = data->pgd, *pte;
+	void *pte_new = NULL;
+	int ret;
+
+	pgd += ARM_SHORT_PGD_IDX(iova);
+
+	if (!pteprot) { /* section or supersection */
+		pte = pgd;
+		pteprot = pgdprot;
+	} else {        /* page or largepage */
+		if (!(*pgd)) {
+			pte_new = __arm_short_alloc_pgtable(
+					ARM_SHORT_BYTES_PER_PTE,
+					GFP_ATOMIC, false, cfg);
+			if (unlikely(!pte_new))
+				return -ENOMEM;
+
+			pgdprot |= virt_to_phys(pte_new);
+			__arm_short_set_pte(pgd, pgdprot, 1, cfg);
+		}
+		pte = arm_short_get_pte_in_pgd(*pgd, iova);
+	}
+
+	pteprot |= (arm_short_iopte)paddr;
+	ret = __arm_short_set_pte(pte, pteprot, large ? 16 : 1, cfg);
+	if (ret && pte_new)
+		__arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
+					 false, cfg);
+	return ret;
+}
+
+static int arm_short_map(struct io_pgtable_ops *ops, unsigned long iova,
+			 phys_addr_t paddr, size_t size, int prot)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_short_iopte pgdprot = 0, pteprot = 0;
+	bool large;
+
+	/* If no access, then nothing to do */
+	if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
+		return 0;
+
+	if (WARN_ON((iova | paddr) & (size - 1)))
+		return -EINVAL;
+
+	switch (size) {
+	case SZ_4K:
+	case SZ_64K:
+		large = (size == SZ_64K);
+		pteprot = __arm_short_pte_prot(data, prot, large);
+		pgdprot = __arm_short_pgtable_prot(data);
+		break;
+
+	case SZ_1M:
+	case SZ_16M:
+		large = (size == SZ_16M);
+		pgdprot = __arm_short_pgd_prot(data, prot, large);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return _arm_short_map(data, iova, paddr, pgdprot, pteprot, large);
+}
+
+static phys_addr_t arm_short_iova_to_phys(struct io_pgtable_ops *ops,
+					  unsigned long iova)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	arm_short_iopte *pte, *pgd = data->pgd;
+	phys_addr_t pa = 0;
+
+	pgd += ARM_SHORT_PGD_IDX(iova);
+
+	if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
+		pte = arm_short_get_pte_in_pgd(*pgd, iova);
+
+		if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte)) {
+			pa = (*pte) & ARM_SHORT_PTE_LARGE_MSK;
+			pa |= iova & ~ARM_SHORT_PTE_LARGE_MSK;
+		} else if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte)) {
+			pa = (*pte) & ARM_SHORT_PTE_SMALL_MSK;
+			pa |= iova & ~ARM_SHORT_PTE_SMALL_MSK;
+		}
+	} else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
+		pa = (*pgd) & ARM_SHORT_PGD_SECTION_MSK;
+		pa |= iova & ~ARM_SHORT_PGD_SECTION_MSK;
+	} else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
+		pa = (*pgd) & ARM_SHORT_PGD_SUPERSECTION_MSK;
+		pa |= iova & ~ARM_SHORT_PGD_SUPERSECTION_MSK;
+	}
+
+	return pa;
+}
+
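+/* A level-2 table may be freed once all of its entries are clear. */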
+static bool _arm_short_pgtable_empty(arm_short_iopte *pgd)
+{
+	arm_short_iopte *pte;
+	int i;
+
+	pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
+	for (i = 0; i < ARM_SHORT_PTRS_PER_PTE; i++) {
+		if (pte[i] != 0)
+			return false;
+	}
+
+	return true;
+}
+
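+/*
+ * Split a block mapping: re-create every part of the block except the
+ * region being unmapped, stepping up to larger page sizes whenever
+ * alignment allows, then flush the stale block entry from the TLB.
+ */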
+static int
+arm_short_split_blk_unmap(struct io_pgtable_ops *ops, unsigned int iova,
+			  phys_addr_t paddr, size_t size,
+			  arm_short_iopte pgdprotup, arm_short_iopte pteprotup,
+			  size_t blk_size)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	unsigned long *pgbitmap = &cfg->pgsize_bitmap;
+	unsigned int blk_base, blk_start, blk_end, i;
+	arm_short_iopte pgdprot, pteprot;
+	phys_addr_t blk_paddr;
+	size_t mapsize = 0, nextmapsize;
+	int ret;
+
+	/* find the nearest mapsize */
+	for (i = find_first_bit(pgbitmap, BITS_PER_LONG);
+	     i < BITS_PER_LONG && ((1 << i) < blk_size) &&
+	     IS_ALIGNED(size, 1 << i);
+	     i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1))
+		mapsize = 1 << i;
+
+	if (WARN_ON(!mapsize))
+		return 0; /* Bytes unmapped */
+	nextmapsize = 1 << i;
+
+	blk_base = iova & ~(blk_size - 1);
+	blk_start = blk_base;
+	blk_end = blk_start + blk_size;
+	blk_paddr = paddr;
+
+	for (; blk_start < blk_end;
+	     blk_start += mapsize, blk_paddr += mapsize) {
+		/* Skip the iova being unmapped; it stays unmapped */
+		if (blk_start == iova)
+			continue;
+
+		/* Try to step up to the next larger mapping size */
+		if (blk_base != blk_start &&
+		    IS_ALIGNED(blk_start | blk_paddr, nextmapsize) &&
+		    mapsize != nextmapsize) {
+			mapsize = nextmapsize;
+			i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1);
+			if (i < BITS_PER_LONG)
+				nextmapsize = 1 << i;
+		}
+
+		if (mapsize == SZ_1M) {
+			pgdprot = pgdprotup;
+			pgdprot |= __arm_short_pgd_prot(data, 0, false);
+			pteprot = 0;
+		} else { /* small or large page */
+			pgdprot = (blk_size == SZ_64K) ? 0 : pgdprotup;
+			pteprot = __arm_short_pte_prot_split(
+					data, pgdprot, pteprotup,
+					mapsize == SZ_64K);
+			pgdprot = __arm_short_pgtable_prot(data);
+		}
+
+		ret = _arm_short_map(data, blk_start, blk_paddr, pgdprot,
+				     pteprot, mapsize == SZ_64K);
+		if (ret < 0) {
+			/* Free the table we allocated */
+			arm_short_iopte *pgd = data->pgd, *pte;
+
+			pgd += ARM_SHORT_PGD_IDX(blk_base);
+			if (*pgd) {
+				pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
+				__arm_short_set_pte(pgd, 0, 1, cfg);
+				tlb->tlb_add_flush(blk_base, blk_size, true,
+						   data->iop.cookie);
+				tlb->tlb_sync(data->iop.cookie);
+				__arm_short_free_pgtable(
+					pte, ARM_SHORT_BYTES_PER_PTE,
+					false, cfg);
+			}
+			return 0;	/* Bytes unmapped */
+		}
+	}
+
+	tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
+	tlb->tlb_sync(data->iop.cookie);
+	return size;
+}
+
+static int arm_short_unmap(struct io_pgtable_ops *ops,
+			   unsigned long iova,
+			   size_t size)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+	arm_short_iopte *pgd, *pte = NULL;
+	arm_short_iopte curpgd, curpte = 0;
+	phys_addr_t paddr;
+	unsigned int iova_base, blk_size = 0;
+	void *cookie = data->iop.cookie;
+	bool pgtablefree = false;
+
+	pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
+
+	/* Get block size */
+	if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
+		pte = arm_short_get_pte_in_pgd(*pgd, iova);
+
+		if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
+			blk_size = SZ_4K;
+		else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
+			blk_size = SZ_64K;
+		else
+			WARN_ON(1);
+	} else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
+		blk_size = SZ_1M;
+	} else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
+		blk_size = SZ_16M;
+	} else {
+		WARN_ON(1);
+	}
+
+	iova_base = iova & ~(blk_size - 1);
+	pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova_base);
+	paddr = arm_short_iova_to_phys(ops, iova_base);
+	curpgd = *pgd;
+
+	if (blk_size == SZ_4K || blk_size == SZ_64K) {
+		pte = arm_short_get_pte_in_pgd(*pgd, iova_base);
+		curpte = *pte;
+		__arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);
+
+		pgtablefree = _arm_short_pgtable_empty(pgd);
+		if (pgtablefree)
+			__arm_short_set_pte(pgd, 0, 1, cfg);
+	} else if (blk_size == SZ_1M || blk_size == SZ_16M) {
+		__arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
+	}
+
+	cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);
+	cfg->tlb->tlb_sync(cookie);
+
+	if (pgtablefree)	/* Free the pgtable after the TLB flush */
+		__arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
+					 ARM_SHORT_BYTES_PER_PTE, false, cfg);
+
+	if (blk_size > size) { /* Split the block */
+		return arm_short_split_blk_unmap(
+				ops, iova, paddr, size,
+				ARM_SHORT_PGD_GET_PROT(curpgd),
+				ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
+				blk_size);
+	} else if (blk_size < size) {
+		/* Block smaller than the unmap size: keep unmapping the rest */
+		return blk_size +
+			arm_short_unmap(ops, iova + blk_size, size - blk_size);
+	}
+
+	return size;
+}
+
+static struct io_pgtable *
+arm_short_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+{
+	struct arm_short_io_pgtable *data;
+
+	if (cfg->ias > 32 || cfg->oas > 32)
+		return NULL;
+
+	cfg->pgsize_bitmap &=
+		(cfg->quirks & IO_PGTABLE_QUIRK_SHORT_SUPERSECTION) ?
+		(SZ_4K | SZ_64K | SZ_1M | SZ_16M) : (SZ_4K | SZ_64K | SZ_1M);
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return NULL;
+
+	data->pgd_size = SZ_16K;
+	data->pgd = __arm_short_alloc_pgtable(
+					data->pgd_size,
+					GFP_KERNEL | __GFP_ZERO | __GFP_DMA,
+					true, cfg);
+	if (!data->pgd)
+		goto out_free_data;
+	wmb();	/* Ensure the empty pgd is visible before any actual TTBR write */
+
+	data->pgtable_cached = kmem_cache_create(
+					"io-pgtable-arm-short",
+					 ARM_SHORT_BYTES_PER_PTE,
+					 ARM_SHORT_BYTES_PER_PTE,
+					 0, NULL);
+	if (!data->pgtable_cached)
+		goto out_free_pgd;
+
+	/* TTBRs */
+	cfg->arm_short_cfg.ttbr[0] = virt_to_phys(data->pgd);
+	cfg->arm_short_cfg.ttbr[1] = 0;
+	cfg->arm_short_cfg.tcr = 0;
+	cfg->arm_short_cfg.nmrr = 0;
+	cfg->arm_short_cfg.prrr = 0;
+
+	data->iop.ops = (struct io_pgtable_ops) {
+		.map		= arm_short_map,
+		.unmap		= arm_short_unmap,
+		.iova_to_phys	= arm_short_iova_to_phys,
+	};
+
+	return &data->iop;
+
+out_free_pgd:
+	__arm_short_free_pgtable(data->pgd, data->pgd_size, true, cfg);
+out_free_data:
+	kfree(data);
+	return NULL;
+}
+
+static void arm_short_free_pgtable(struct io_pgtable *iop)
+{
+	struct arm_short_io_pgtable *data = io_pgtable_to_data(iop);
+
+	kmem_cache_destroy(data->pgtable_cached);
+	__arm_short_free_pgtable(data->pgd, data->pgd_size,
+				 true, &data->iop.cfg);
+	kfree(data);
+}
+
+struct io_pgtable_init_fns io_pgtable_arm_short_init_fns = {
+	.alloc	= arm_short_alloc_pgtable,
+	.free	= arm_short_free_pgtable,
+};
+
+#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT_SELFTEST
+
+static struct io_pgtable_cfg *cfg_cookie;
+
+static void dummy_tlb_flush_all(void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+}
+
+static void dummy_tlb_add_flush(unsigned long iova, size_t size, bool leaf,
+				void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+	WARN_ON(!(size & cfg_cookie->pgsize_bitmap));
+}
+
+static void dummy_tlb_sync(void *cookie)
+{
+	WARN_ON(cookie != cfg_cookie);
+}
+
+static struct iommu_gather_ops dummy_tlb_ops = {
+	.tlb_flush_all	= dummy_tlb_flush_all,
+	.tlb_add_flush	= dummy_tlb_add_flush,
+	.tlb_sync	= dummy_tlb_sync,
+};
+
+#define __FAIL(ops)	({				\
+		WARN(1, "selftest: test failed\n");	\
+		selftest_running = false;		\
+		-EFAULT;				\
+})
+
+static int __init arm_short_do_selftests(void)
+{
+	struct io_pgtable_ops *ops;
+	struct io_pgtable_cfg cfg = {
+		.tlb = &dummy_tlb_ops,
+		.oas = 32,
+		.ias = 32,
+		.quirks = IO_PGTABLE_QUIRK_ARM_NS |
+			IO_PGTABLE_QUIRK_SHORT_SUPERSECTION,
+		.pgsize_bitmap = SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+	};
+	unsigned int iova, size, iova_start;
+	unsigned int i, loopnr = 0;
+
+	selftest_running = true;
+
+	cfg_cookie = &cfg;
+
+	ops = alloc_io_pgtable_ops(ARM_SHORT_DESC, &cfg, &cfg);
+	if (!ops) {
+		pr_err("Failed to alloc short desc io pgtable\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Initial sanity checks.
+	 * Empty page tables shouldn't provide any translations.
+	 */
+	if (ops->iova_to_phys(ops, 42))
+		return __FAIL(ops);
+
+	if (ops->iova_to_phys(ops, SZ_1G + 42))
+		return __FAIL(ops);
+
+	if (ops->iova_to_phys(ops, SZ_2G + 42))
+		return __FAIL(ops);
+
+	/*
+	 * Distinct mappings of different granule sizes.
+	 */
+	iova = 0;
+	i = find_first_bit(&cfg.pgsize_bitmap, BITS_PER_LONG);
+	while (i != BITS_PER_LONG) {
+		size = 1UL << i;
+		if (ops->map(ops, iova, iova, size, IOMMU_READ |
+						    IOMMU_WRITE |
+						    IOMMU_NOEXEC |
+						    IOMMU_CACHE))
+			return __FAIL(ops);
+
+		/* Overlapping mappings */
+		if (!ops->map(ops, iova, iova + size, size,
+			      IOMMU_READ | IOMMU_NOEXEC))
+			return __FAIL(ops);
+
+		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
+			return __FAIL(ops);
+
+		iova += SZ_16M;
+		i++;
+		i = find_next_bit(&cfg.pgsize_bitmap, BITS_PER_LONG, i);
+		loopnr++;
+	}
+
+	/* Partial unmap */
+	i = 1;
+	size = 1UL << __ffs(cfg.pgsize_bitmap);
+	while (i < loopnr) {
+		iova_start = i * SZ_16M;
+		if (ops->unmap(ops, iova_start + size, size) != size)
+			return __FAIL(ops);
+
+		/* Remap of partial unmap */
+		if (ops->map(ops, iova_start + size, size, size, IOMMU_READ))
+			return __FAIL(ops);
+
+		if (ops->iova_to_phys(ops, iova_start + size + 42)
+		    != (size + 42))
+			return __FAIL(ops);
+		i++;
+	}
+
+	/* Full unmap */
+	iova = 0;
+	i = find_first_bit(&cfg.pgsize_bitmap, BITS_PER_LONG);
+	while (i != BITS_PER_LONG) {
+		size = 1UL << i;
+
+		if (ops->unmap(ops, iova, size) != size)
+			return __FAIL(ops);
+
+		if (ops->iova_to_phys(ops, iova + 42))
+			return __FAIL(ops);
+
+		/* Remap full block */
+		if (ops->map(ops, iova, iova, size, IOMMU_WRITE))
+			return __FAIL(ops);
+
+		if (ops->iova_to_phys(ops, iova + 42) != (iova + 42))
+			return __FAIL(ops);
+
+		iova += SZ_16M;
+		i++;
+		i = find_next_bit(&cfg.pgsize_bitmap, BITS_PER_LONG, i);
+	}
+
+	free_io_pgtable_ops(ops);
+
+	selftest_running = false;
+	return 0;
+}
+
+subsys_initcall(arm_short_do_selftests);
+#endif
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e4bc2b2..9978eca 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -38,9 +38,6 @@
 #define io_pgtable_to_data(x)						\
 	container_of((x), struct arm_lpae_io_pgtable, iop)
 
-#define io_pgtable_ops_to_pgtable(x)					\
-	container_of((x), struct io_pgtable, ops)
-
 #define io_pgtable_ops_to_data(x)					\
 	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
 
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 6436fe2..14a9b3a 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -28,6 +28,7 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
 extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
+extern struct io_pgtable_init_fns io_pgtable_arm_short_init_fns;
 
 static const struct io_pgtable_init_fns *
 io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
@@ -38,6 +39,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
 	[ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
 	[ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
 #endif
+#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT
+	[ARM_SHORT_DESC] = &io_pgtable_arm_short_init_fns,
+#endif
 };
 
 struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index 68c63d9..0f45e60 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -9,6 +9,7 @@ enum io_pgtable_fmt {
 	ARM_32_LPAE_S2,
 	ARM_64_LPAE_S1,
 	ARM_64_LPAE_S2,
+	ARM_SHORT_DESC,
 	IO_PGTABLE_NUM_FMTS,
 };
 
@@ -45,6 +46,9 @@ struct iommu_gather_ops {
  */
 struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_ARM_NS	(1 << 0)	/* Set NS bit in PTEs */
+	#define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION     BIT(1)
+	#define IO_PGTABLE_QUIRK_SHORT_NO_XN		BIT(2) /* No XN bit */
+	#define IO_PGTABLE_QUIRK_SHORT_NO_PERMS		BIT(3) /* No AP bit */
 	int				quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
@@ -64,6 +68,13 @@ struct io_pgtable_cfg {
 			u64	vttbr;
 			u64	vtcr;
 		} arm_lpae_s2_cfg;
+
+		struct {
+			u32	ttbr[2];
+			u32	tcr;
+			u32	nmrr;
+			u32	prrr;
+		} arm_short_cfg;
 	};
 };
 
@@ -130,6 +141,9 @@ struct io_pgtable {
 	struct io_pgtable_ops	ops;
 };
 
+#define io_pgtable_ops_to_pgtable(x)		\
+	container_of((x), struct io_pgtable, ops)
+
 /**
  * struct io_pgtable_init_fns - Alloc/free a set of page tables for a
  *                              particular format.
-- 
1.8.1.1.dirty

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 4/6] memory: mediatek: Add SMI driver
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: Robin Murphy, Will Deacon, Daniel Kurtz, Tomasz Figa,
	Lucas Stach, Rob Herring, Catalin Marinas, linux-mediatek,
	Sasha Hauer, srv_heupstream, devicetree, linux-kernel,
	linux-arm-kernel, iommu, pebolle, arnd, mitchelh, youhua.li,
	k.zhang, frederic.chen, Yong Wu

This patch adds the SMI (Smart Multimedia Interface) driver. This driver
is responsible for enabling/disabling the IOMMU and controlling the clocks
of each local arbiter.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
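Note below the cut, not part of the commit message: a minimal usage
sketch for a hypothetical larb consumer. Only the mtk_smi_* calls come
from this patch; the device pointer, port id and the job itself are
assumptions for illustration.

static int example_mm_hw_job(struct device *larbdev, unsigned int port)
{
	int ret;

	/* Route this (assumed) port's transactions through the IOMMU */
	ret = mtk_smi_config_port(larbdev, port, true);
	if (ret)
		return ret;

	/* Power/clock on the larb and apply the recorded MMU config */
	ret = mtk_smi_larb_get(larbdev);
	if (ret)
		return ret;

	/* ... start the multimedia hardware and wait for completion ... */

	/* Release the clocks after the HW is done */
	mtk_smi_larb_put(larbdev);
	return 0;
}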
 drivers/memory/Kconfig     |   8 ++
 drivers/memory/Makefile    |   1 +
 drivers/memory/mtk-smi.c   | 285 +++++++++++++++++++++++++++++++++++++++++++++
 include/soc/mediatek/smi.h |  60 ++++++++++
 4 files changed, 354 insertions(+)
 create mode 100644 drivers/memory/mtk-smi.c
 create mode 100644 include/soc/mediatek/smi.h

diff --git a/drivers/memory/Kconfig b/drivers/memory/Kconfig
index 8406c668..c0e1607 100644
--- a/drivers/memory/Kconfig
+++ b/drivers/memory/Kconfig
@@ -100,6 +100,14 @@ config JZ4780_NEMC
 	  the Ingenic JZ4780. This controller is used to handle external
 	  memory devices such as NAND and SRAM.
 
+config MTK_SMI
+	bool
+	depends on ARCH_MEDIATEK || COMPILE_TEST
+	help
+	  This driver is for the Memory Controller module in MediaTek SoCs.
+	  It mainly helps enable/disable the IOMMU and controls the clock
+	  for each local arbiter.
+
 source "drivers/memory/tegra/Kconfig"
 
 endif
diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
index b670441..f854e40 100644
--- a/drivers/memory/Makefile
+++ b/drivers/memory/Makefile
@@ -14,5 +14,6 @@ obj-$(CONFIG_FSL_IFC)		+= fsl_ifc.o
 obj-$(CONFIG_MVEBU_DEVBUS)	+= mvebu-devbus.o
 obj-$(CONFIG_TEGRA20_MC)	+= tegra20-mc.o
 obj-$(CONFIG_JZ4780_NEMC)	+= jz4780-nemc.o
+obj-$(CONFIG_MTK_SMI)		+= mtk-smi.o
 
 obj-$(CONFIG_TEGRA_MC)		+= tegra/
diff --git a/drivers/memory/mtk-smi.c b/drivers/memory/mtk-smi.c
new file mode 100644
index 0000000..e62cceb
--- /dev/null
+++ b/drivers/memory/mtk-smi.c
@@ -0,0 +1,285 @@
+/*
+ * Copyright (c) 2014-2015 MediaTek Inc.
+ * Author: Yong Wu <yong.wu@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/io.h>
+#include <linux/platform_device.h>
+#include <linux/clk.h>
+#include <linux/err.h>
+#include <linux/slab.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_platform.h>
+#include <linux/pm_runtime.h>
+#include <soc/mediatek/smi.h>
+
+#define SMI_LARB_MMU_EN		0xf00
+#define F_SMI_MMU_EN(port)	BIT(port)
+
+struct mtk_smi_common {
+	struct clk		*clk_apb;
+	struct clk		*clk_smi;
+};
+
+struct mtk_smi_larb {
+	void __iomem		*base;
+	struct clk		*clk_apb;
+	struct clk		*clk_smi;
+	struct device		*smi;
+};
+
+struct mtk_larb_mmu {
+	u32			mmu;
+};
+
+static int mtk_smi_common_get(struct device *smidev)
+{
+	struct mtk_smi_common *smipriv = dev_get_drvdata(smidev);
+	int ret;
+
+	ret = pm_runtime_get_sync(smidev);
+	if (ret < 0)
+		return ret;
+
+	ret = clk_prepare_enable(smipriv->clk_apb);
+	if (ret) {
+		dev_err(smidev, "Failed to enable the apb clock\n");
+		goto err_put_pm;
+	}
+	ret = clk_prepare_enable(smipriv->clk_smi);
+	if (ret) {
+		dev_err(smidev, "Failed to enable the smi clock\n");
+		goto err_disable_apb;
+	}
+	return ret;
+
+err_disable_apb:
+	clk_disable_unprepare(smipriv->clk_apb);
+err_put_pm:
+	pm_runtime_put(smidev);
+	return ret;
+}
+
+static void mtk_smi_common_put(struct device *smidev)
+{
+	struct mtk_smi_common *smipriv = dev_get_drvdata(smidev);
+
+	clk_disable_unprepare(smipriv->clk_smi);
+	clk_disable_unprepare(smipriv->clk_apb);
+	pm_runtime_put(smidev);
+}
+
+int mtk_smi_larb_get(struct device *larbdev)
+{
+	struct mtk_smi_larb *larbpriv = dev_get_drvdata(larbdev);
+	struct mtk_larb_mmu *mmucfg = larbdev->archdata.iommu;
+	int ret;
+
+	ret = mtk_smi_common_get(larbpriv->smi);
+	if (ret)
+		return ret;
+
+	ret = pm_runtime_get_sync(larbdev);
+	if (ret < 0)
+		goto err_put_smicommon;
+
+	ret = clk_prepare_enable(larbpriv->clk_apb);
+	if (ret) {
+		dev_err(larbdev, "Failed to enable the apb clock\n");
+		goto err_put_pm;
+	}
+
+	ret = clk_prepare_enable(larbpriv->clk_smi);
+	if (ret) {
+		dev_err(larbdev, "Failed to enable the smi clock\n");
+		goto err_disable_apb;
+	}
+
+	/* config iommu */
+	writel_relaxed(mmucfg->mmu, larbpriv->base + SMI_LARB_MMU_EN);
+
+	return ret;
+
+err_disable_apb:
+	clk_disable_unprepare(larbpriv->clk_apb);
+err_put_pm:
+	pm_runtime_put(larbdev);
+err_put_smicommon:
+	mtk_smi_common_put(larbpriv->smi);
+	return ret;
+}
+
+void mtk_smi_larb_put(struct device *larbdev)
+{
+	struct mtk_smi_larb *larbpriv = dev_get_drvdata(larbdev);
+
+	clk_disable_unprepare(larbpriv->clk_smi);
+	clk_disable_unprepare(larbpriv->clk_apb);
+	pm_runtime_put(larbdev);
+	mtk_smi_common_put(larbpriv->smi);
+}
+
+int mtk_smi_config_port(struct device *larbdev, unsigned int larbportid,
+			bool enable)
+{
+	struct mtk_larb_mmu *mmucfg = larbdev->archdata.iommu;
+
+	if (!mmucfg) {
+		mmucfg = kzalloc(sizeof(*mmucfg), GFP_KERNEL);
+		if (!mmucfg)
+			return -ENOMEM;
+		larbdev->archdata.iommu = mmucfg;
+	}
+
+	dev_dbg(larbdev, "%s iommu port id %d\n",
+		enable ? "enable" : "disable", larbportid);
+
+	if (enable)
+		mmucfg->mmu |= F_SMI_MMU_EN(larbportid);
+	else
+		mmucfg->mmu &= ~F_SMI_MMU_EN(larbportid);
+
+	return 0;
+}
+
+static int mtk_smi_larb_probe(struct platform_device *pdev)
+{
+	struct mtk_smi_larb *larbpriv;
+	struct resource *res;
+	struct device *dev = &pdev->dev;
+	struct device_node *smi_node;
+	struct platform_device *smi_pdev;
+
+	larbpriv = devm_kzalloc(dev, sizeof(*larbpriv), GFP_KERNEL);
+	if (!larbpriv)
+		return -ENOMEM;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	larbpriv->base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(larbpriv->base))
+		return PTR_ERR(larbpriv->base);
+
+	larbpriv->clk_apb = devm_clk_get(dev, "apb");
+	if (IS_ERR(larbpriv->clk_apb))
+		return PTR_ERR(larbpriv->clk_apb);
+
+	larbpriv->clk_smi = devm_clk_get(dev, "smi");
+	if (IS_ERR(larbpriv->clk_smi))
+		return PTR_ERR(larbpriv->clk_smi);
+
+	smi_node = of_parse_phandle(dev->of_node, "mediatek,smi", 0);
+	if (!smi_node)
+		return -EINVAL;
+
+	smi_pdev = of_find_device_by_node(smi_node);
+	of_node_put(smi_node);
+	if (smi_pdev) {
+		larbpriv->smi = &smi_pdev->dev;
+	} else {
+		dev_err(dev, "Failed to get the smi_common device\n");
+		return -EINVAL;
+	}
+
+	pm_runtime_enable(dev);
+	dev_set_drvdata(dev, larbpriv);
+	return 0;
+}
+
+static int mtk_smi_larb_remove(struct platform_device *pdev)
+{
+	struct mtk_larb_mmu *mmucfg = pdev->dev.archdata.iommu;
+
+	kfree(mmucfg);
+	pm_runtime_disable(&pdev->dev);
+	return 0;
+}
+
+static const struct of_device_id mtk_smi_larb_of_ids[] = {
+	{ .compatible = "mediatek,mt8173-smi-larb",},
+	{}
+};
+
+static struct platform_driver mtk_smi_larb_driver = {
+	.probe	= mtk_smi_larb_probe,
+	.remove = mtk_smi_larb_remove,
+	.driver	= {
+		.name = "mtk-smi-larb",
+		.of_match_table = mtk_smi_larb_of_ids,
+	}
+};
+
+static int mtk_smi_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct mtk_smi_common *smipriv;
+
+	smipriv = devm_kzalloc(dev, sizeof(*smipriv), GFP_KERNEL);
+	if (!smipriv)
+		return -ENOMEM;
+
+	smipriv->clk_apb = devm_clk_get(dev, "apb");
+	if (IS_ERR(smipriv->clk_apb))
+		return PTR_ERR(smipriv->clk_apb);
+
+	smipriv->clk_smi = devm_clk_get(dev, "smi");
+	if (IS_ERR(smipriv->clk_smi))
+		return PTR_ERR(smipriv->clk_smi);
+
+	pm_runtime_enable(dev);
+	dev_set_drvdata(dev, smipriv);
+	return 0;
+}
+
+static int mtk_smi_remove(struct platform_device *pdev)
+{
+	pm_runtime_disable(&pdev->dev);
+	return 0;
+}
+
+static const struct of_device_id mtk_smi_of_ids[] = {
+	{ .compatible = "mediatek,mt8173-smi",},
+	{}
+};
+
+static struct platform_driver mtk_smi_driver = {
+	.probe	= mtk_smi_probe,
+	.remove = mtk_smi_remove,
+	.driver	= {
+		.name = "mtk-smi",
+		.of_match_table = mtk_smi_of_ids,
+	}
+};
+
+static int __init mtk_smi_init(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&mtk_smi_driver);
+	if (ret != 0) {
+		pr_err("Failed to register SMI driver\n");
+		return ret;
+	}
+
+	ret = platform_driver_register(&mtk_smi_larb_driver);
+	if (ret != 0) {
+		pr_err("Failed to register SMI-LARB driver\n");
+		goto err_unreg_smi;
+	}
+	return ret;
+
+err_unreg_smi:
+	platform_driver_unregister(&mtk_smi_driver);
+	return ret;
+}
+
+subsys_initcall(mtk_smi_init);
diff --git a/include/soc/mediatek/smi.h b/include/soc/mediatek/smi.h
new file mode 100644
index 0000000..189d575
--- /dev/null
+++ b/include/soc/mediatek/smi.h
@@ -0,0 +1,60 @@
+/*
+ * Copyright (c) 2014-2015 MediaTek Inc.
+ * Author: Yong Wu <yong.wu@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifndef MTK_IOMMU_SMI_H
+#define MTK_IOMMU_SMI_H
+
+#include <linux/device.h>
+
+#ifdef CONFIG_MTK_SMI
+
+/*
+ * Record the IOMMU info for each port in the local arbiter.
+ * It is only used by the IOMMU driver.
+ *
+ * Returns 0 on success, or a negative error code on failure.
+ */
+int mtk_smi_config_port(struct device *larbdev, unsigned int larbportid,
+			bool enable);
+/*
+ * The two functions below configure the IOMMU and enable/disable the
+ * clocks for the larb.
+ *
+ * mtk_smi_larb_get must be called before the multimedia HW starts working.
+ * mtk_smi_larb_put must be called after the HW is done.
+ * Both should be called in non-atomic context.
+ *
+ * Returns 0 on success, or a negative error code on failure.
+ */
+int mtk_smi_larb_get(struct device *larbdev);
+void mtk_smi_larb_put(struct device *larbdev);
+
+#else
+
+static inline int
+mtk_smi_config_port(struct device *larbdev, unsigned int larbportid,
+		    bool enable)
+{
+	return 0;
+}
+
+static inline int mtk_smi_larb_get(struct device *larbdev)
+{
+	return 0;
+}
+
+static inline void mtk_smi_larb_put(struct device *larbdev) { }
+
+#endif
+
+#endif
-- 
1.8.1.1.dirty


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 5/6] iommu/mediatek: Add mt8173 IOMMU driver
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: Robin Murphy, Will Deacon, Daniel Kurtz, Tomasz Figa,
	Lucas Stach, Rob Herring, Catalin Marinas, linux-mediatek,
	Sasha Hauer, srv_heupstream, devicetree, linux-kernel,
	linux-arm-kernel, iommu, pebolle, arnd, mitchelh, youhua.li,
	k.zhang, frederic.chen, Yong Wu

This patch adds support for the MediaTek M4U (Multimedia Memory
Management Unit).

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
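Note below the cut, not part of the commit message: a small decoding
sketch for the fault ID register, using only the macros added in this
patch; "regval" is assumed to have been read from base + REG_MMU_INT_ID
inside the fault handler.

static void example_report_fault_id(struct device *dev, u32 regval)
{
	unsigned int larbid = F_MMU0_INT_ID_LARB_ID(regval);
	unsigned int portid = F_MMU0_INT_ID_PORT_ID(regval);

	dev_err(dev, "m4u fault from larb %u port %u\n", larbid, portid);
}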
 drivers/iommu/Kconfig     |  13 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/mtk_iommu.c | 714 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 728 insertions(+)
 create mode 100644 drivers/iommu/mtk_iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3abd066..f0ae553e 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -386,4 +386,17 @@ config ARM_SMMU_V3
 	  Say Y here if your system includes an IOMMU device implementing
 	  the ARM SMMUv3 architecture.
 
+config MTK_IOMMU
+	bool "MTK IOMMU Support"
+	depends on ARCH_MEDIATEK || COMPILE_TEST
+	select IOMMU_API
+	select IOMMU_DMA
+	select IOMMU_IO_PGTABLE_SHORT
+	select MEMORY
+	select MTK_SMI
+	help
+	  Support for the IOMMUs on certain MediaTek SoCs.
+
+	  If unsure, say N here.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 06df3e6..f4f2f2c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_ROCKCHIP_IOMMU) += rockchip-iommu.o
 obj-$(CONFIG_TEGRA_IOMMU_GART) += tegra-gart.o
 obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
+obj-$(CONFIG_MTK_IOMMU) += mtk_iommu.o
 obj-$(CONFIG_SHMOBILE_IOMMU) += shmobile-iommu.o
 obj-$(CONFIG_SHMOBILE_IPMMU) += shmobile-ipmmu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
new file mode 100644
index 0000000..3c4f1b5
--- /dev/null
+++ b/drivers/iommu/mtk_iommu.c
@@ -0,0 +1,714 @@
+/*
+ * Copyright (c) 2014-2015 MediaTek Inc.
+ * Author: Yong Wu <yong.wu@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/io.h>
+#include <linux/interrupt.h>
+#include <linux/platform_device.h>
+#include <linux/iommu.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-iommu.h>
+#include <linux/of_iommu.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_platform.h>
+#include <linux/list.h>
+#include <linux/clk.h>
+#include <linux/iopoll.h>
+#include <asm/cacheflush.h>
+#include <soc/mediatek/smi.h>
+
+#include "io-pgtable.h"
+
+#define REG_MMU_PT_BASE_ADDR			0x000
+
+#define REG_MMU_INVALIDATE			0x020
+#define F_ALL_INVLD				0x2
+#define F_MMU_INV_RANGE				0x1
+
+#define REG_MMU_INVLD_START_A			0x024
+#define REG_MMU_INVLD_END_A			0x028
+
+#define REG_MMU_INV_SEL				0x038
+#define F_INVLD_EN0				BIT(0)
+#define F_INVLD_EN1				BIT(1)
+
+#define REG_MMU_STANDARD_AXI_MODE		0x048
+#define REG_MMU_DCM_DIS				0x050
+
+#define REG_MMU_CTRL_REG			0x110
+#define F_MMU_PREFETCH_RT_REPLACE_MOD		BIT(4)
+#define F_MMU_TF_PROTECT_SEL(prot)		(((prot) & 0x3) << 5)
+#define F_COHERENCE_EN				BIT(8)
+
+#define REG_MMU_IVRP_PADDR			0x114
+#define F_MMU_IVRP_PA_SET(pa)			((pa) >> 1)
+
+#define REG_MMU_INT_CONTROL0			0x120
+#define F_L2_MULTI_HIT_EN			BIT(0)
+#define F_TABLE_WALK_FAULT_INT_EN		BIT(1)
+#define F_PREFETCH_FIFO_OVERFLOW_INT_EN		BIT(2)
+#define F_MISS_FIFO_OVERFLOW_INT_EN		BIT(3)
+#define F_PREFETCH_FIFO_ERR_INT_EN		BIT(5)
+#define F_MISS_FIFO_ERR_INT_EN			BIT(6)
+#define F_INT_L2_CLR_BIT			BIT(12)
+
+#define REG_MMU_INT_MAIN_CONTROL		0x124
+#define F_INT_TRANSLATION_FAULT			BIT(0)
+#define F_INT_MAIN_MULTI_HIT_FAULT		BIT(1)
+#define F_INT_INVALID_PA_FAULT			BIT(2)
+#define F_INT_ENTRY_REPLACEMENT_FAULT		BIT(3)
+#define F_INT_TLB_MISS_FAULT			BIT(4)
+#define F_INT_MISS_TRANSLATION_FIFO_FAULT	BIT(5)
+#define F_INT_PREFETCH_TRANSLATION_FIFO_FAULT	BIT(6)
+
+#define REG_MMU_CPE_DONE			0x12C
+
+#define REG_MMU_FAULT_ST1			0x134
+
+#define REG_MMU_FAULT_VA			0x13c
+#define F_MMU_FAULT_VA_MSK			0xfffff000
+#define F_MMU_FAULT_VA_WRITE_BIT		BIT(1)
+#define F_MMU_FAULT_VA_LAYER_BIT		BIT(0)
+
+#define REG_MMU_INVLD_PA			0x140
+#define REG_MMU_INT_ID				0x150
+#define F_MMU0_INT_ID_LARB_ID(a)		(((a) >> 7) & 0x7)
+#define F_MMU0_INT_ID_PORT_ID(a)		(((a) >> 2) & 0x1f)
+
+#define MTK_PROTECT_PA_ALIGN			128
+#define MTK_IOMMU_LARB_MAX_NR			8
+#define MTK_IOMMU_REG_NR			10
+
+struct mtk_iommu_suspend_reg {
+	u32				standard_axi_mode;
+	u32				dcm_dis;
+	u32				ctrl_reg;
+	u32				ivrp_paddr;
+	u32				int_control0;
+	u32				int_main_control;
+};
+
+struct mtk_iommu_data {
+	void __iomem			*base;
+	int				irq;
+	struct device			*dev;
+	struct device			*larbdev[MTK_IOMMU_LARB_MAX_NR];
+	struct clk			*bclk;
+	phys_addr_t			protect_base; /* protect memory base */
+	int				larb_nr;/* local arbiter number */
+	struct mtk_iommu_suspend_reg	reg;
+};
+
+struct mtk_iommu_domain {
+	struct imu_pgd_t		*pgd;
+	spinlock_t			pgtlock; /* lock for page table */
+
+	struct io_pgtable_cfg		cfg;
+	struct io_pgtable_ops		*iop;
+
+	struct mtk_iommu_data		*data;
+	struct iommu_domain		domain;
+};
+
+struct mtk_iommu_client_priv {
+	struct list_head		client;
+	unsigned int			larbid;
+	unsigned int			portid;
+};
+
+static struct iommu_ops mtk_iommu_ops;
+
+/*
+ * There is only one iommu domain, called the m4u domain, which all
+ * multimedia modules share.
+ */
+static struct mtk_iommu_domain	*m4udom;
+
+static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct mtk_iommu_domain, domain);
+}
+
+static void mtk_iommu_clear_intr(const struct mtk_iommu_data *data)
+{
+	u32 val;
+
+	val = readl_relaxed(data->base + REG_MMU_INT_CONTROL0);
+	val |= F_INT_L2_CLR_BIT;
+	writel_relaxed(val, data->base + REG_MMU_INT_CONTROL0);
+}
+
+static void mtk_iommu_tlb_flush_all(void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base;
+
+	base = domain->data->base;
+	writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL);
+	writel_relaxed(F_ALL_INVLD, base + REG_MMU_INVALIDATE);
+	mb();	/* Make sure the flush-all has completed */
+}
+
+static void mtk_iommu_tlb_add_flush(unsigned long iova, size_t size,
+				    bool leaf, void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base = domain->data->base;
+	unsigned int iova_start = iova, iova_end = iova + size - 1;
+
+	writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL);
+
+	writel_relaxed(iova_start, base + REG_MMU_INVLD_START_A);
+	writel_relaxed(iova_end, base + REG_MMU_INVLD_END_A);
+	writel_relaxed(F_MMU_INV_RANGE, base + REG_MMU_INVALIDATE);
+}
+
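+/*
+ * The range invalidate issued in mtk_iommu_tlb_add_flush above is only
+ * posted to the hardware; mtk_iommu_tlb_sync confirms completion by
+ * polling REG_MMU_CPE_DONE and falls back to a full flush on timeout.
+ */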
+static void mtk_iommu_tlb_sync(void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base = domain->data->base;
+	int ret;
+	u32 tmp;
+
+	ret = readl_poll_timeout_atomic(base + REG_MMU_CPE_DONE, tmp,
+					tmp != 0, 10, 1000000);
+	if (ret) {
+		dev_warn(domain->data->dev,
+			 "Partial TLB flush timed out, falling back to full flush\n");
+		mtk_iommu_tlb_flush_all(cookie);
+	}
+	writel_relaxed(0, base + REG_MMU_CPE_DONE);
+}
+
+static struct iommu_gather_ops mtk_iommu_gather_ops = {
+	.tlb_flush_all = mtk_iommu_tlb_flush_all,
+	.tlb_add_flush = mtk_iommu_tlb_add_flush,
+	.tlb_sync = mtk_iommu_tlb_sync,
+};
+
+static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
+{
+	struct mtk_iommu_domain *mtkdom = dev_id;
+	struct mtk_iommu_data *data = mtkdom->data;
+	u32 int_state, regval, fault_iova, fault_pa;
+	unsigned int fault_larb, fault_port;
+	bool layer, write;
+
+	int_state = readl_relaxed(data->base + REG_MMU_FAULT_ST1);
+
+	/* read error info from registers */
+	fault_iova = readl_relaxed(data->base + REG_MMU_FAULT_VA);
+	layer = fault_iova & F_MMU_FAULT_VA_LAYER_BIT;
+	write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
+	fault_iova &= F_MMU_FAULT_VA_MSK;
+	fault_pa = readl_relaxed(data->base + REG_MMU_INVLD_PA);
+	regval = readl_relaxed(data->base + REG_MMU_INT_ID);
+	fault_larb = F_MMU0_INT_ID_LARB_ID(regval);
+	fault_port = F_MMU0_INT_ID_PORT_ID(regval);
+
+	if (report_iommu_fault(&mtkdom->domain, data->dev, fault_iova,
+			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
+		dev_err_ratelimited(
+			data->dev,
+			"fault type=0x%x iova=0x%x pa=0x%x larb=%d port=%d layer=%d %s\n",
+			int_state, fault_iova, fault_pa, fault_larb, fault_port,
+			layer, write ? "write" : "read");
+	}
+
+	mtk_iommu_clear_intr(data);
+	mtk_iommu_tlb_flush_all(mtkdom);
+
+	return IRQ_HANDLED;
+}
+
+static int mtk_iommu_parse_dt(struct platform_device *pdev,
+			      struct mtk_iommu_data *data)
+{
+	struct device *dev = &pdev->dev;
+	struct device_node *ofnode;
+	struct resource *res;
+	int i;
+
+	ofnode = dev->of_node;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	data->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(data->base))
+		return PTR_ERR(data->base);
+
+	data->irq = platform_get_irq(pdev, 0);
+	if (data->irq < 0)
+		return data->irq;
+
+	data->bclk = devm_clk_get(dev, "bclk");
+	if (IS_ERR(data->bclk))
+		return PTR_ERR(data->bclk);
+
+	data->larb_nr = of_count_phandle_with_args(
+					ofnode, "mediatek,larb", NULL);
+	if (data->larb_nr < 0)
+		return data->larb_nr;
+
+	for (i = 0; i < data->larb_nr; i++) {
+		struct device_node *larbnode;
+		struct platform_device *plarbdev;
+
+		larbnode = of_parse_phandle(ofnode, "mediatek,larb", i);
+		if (!larbnode)
+			return -EINVAL;
+
+		plarbdev = of_find_device_by_node(larbnode);
+		of_node_put(larbnode);
+		if (!plarbdev)
+			return -EPROBE_DEFER;
+		data->larbdev[i] = &plarbdev->dev;
+	}
+
+	return 0;
+}
+
+static int mtk_iommu_hw_init(const struct mtk_iommu_domain *mtkdom)
+{
+	struct mtk_iommu_data *data = mtkdom->data;
+	void __iomem *base = data->base;
+	u32 regval;
+	int ret = 0;
+
+	ret = clk_prepare_enable(data->bclk);
+	if (ret) {
+		dev_err(data->dev, "Failed to enable iommu clk(%d)\n", ret);
+		return ret;
+	}
+
+	writel_relaxed(mtkdom->cfg.arm_short_cfg.ttbr[0],
+		       base + REG_MMU_PT_BASE_ADDR);
+
+	regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
+		F_MMU_TF_PROTECT_SEL(2) |
+		F_COHERENCE_EN;
+	writel_relaxed(regval, base + REG_MMU_CTRL_REG);
+
+	regval = F_L2_MULTI_HIT_EN |
+		F_TABLE_WALK_FAULT_INT_EN |
+		F_PREFETCH_FIFO_OVERFLOW_INT_EN |
+		F_MISS_FIFO_OVERFLOW_INT_EN |
+		F_PREFETCH_FIFO_ERR_INT_EN |
+		F_MISS_FIFO_ERR_INT_EN;
+	writel_relaxed(regval, base + REG_MMU_INT_CONTROL0);
+
+	regval = F_INT_TRANSLATION_FAULT |
+		F_INT_MAIN_MULTI_HIT_FAULT |
+		F_INT_INVALID_PA_FAULT |
+		F_INT_ENTRY_REPLACEMENT_FAULT |
+		F_INT_TLB_MISS_FAULT |
+		F_INT_MISS_TRANSLATION_FIFO_FAULT |
+		F_INT_PREFETCH_TRANSLATION_FIFO_FAULT;
+	writel_relaxed(regval, base + REG_MMU_INT_MAIN_CONTROL);
+
+	regval = ALIGN(data->protect_base, MTK_PROTECT_PA_ALIGN);
+	regval = F_MMU_IVRP_PA_SET(regval);
+	writel_relaxed(regval, base + REG_MMU_IVRP_PADDR);
+
+	writel_relaxed(0, base + REG_MMU_DCM_DIS);
+	writel_relaxed(0, base + REG_MMU_STANDARD_AXI_MODE);
+
+	if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0,
+			     dev_name(data->dev), (void *)mtkdom)) {
+		writel_relaxed(0, base + REG_MMU_PT_BASE_ADDR);
+		clk_disable_unprepare(data->bclk);
+		dev_err(data->dev, "Failed @ IRQ-%d Request\n", data->irq);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int mtk_iommu_config(struct mtk_iommu_domain *mtkdom,
+			    struct device *dev, bool enable)
+{
+	struct mtk_iommu_data *data = mtkdom->data;
+	struct mtk_iommu_client_priv *head, *cur, *next;
+
+	head = dev->archdata.iommu;
+	list_for_each_entry_safe(cur, next, &head->client, client) {
+		if (cur->larbid >= data->larb_nr) {
+			dev_err(data->dev, "Invalid larb:%d\n", cur->larbid);
+			return -EINVAL;
+		}
+
+		mtk_smi_config_port(data->larbdev[cur->larbid],
+				    cur->portid, enable);
+		if (!enable) {
+			list_del(&cur->client);
+			kfree(cur);
+		}
+	}
+
+	if (!enable) {
+		kfree(head);
+		dev->archdata.iommu = NULL;
+	}
+	return 0;
+}
+
+static int mtk_iommu_init_domain_context(struct mtk_iommu_domain *dom)
+{
+	int ret;
+
+	if (dom->iop)
+		return 0;
+
+	spin_lock_init(&dom->pgtlock);
+	dom->cfg.quirks = IO_PGTABLE_QUIRK_ARM_NS |
+			IO_PGTABLE_QUIRK_SHORT_SUPERSECTION |
+			IO_PGTABLE_QUIRK_SHORT_NO_XN |
+			IO_PGTABLE_QUIRK_SHORT_NO_PERMS;
+	dom->cfg.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap;
+	dom->cfg.ias = 32;
+	dom->cfg.oas = 32;
+	dom->cfg.tlb = &mtk_iommu_gather_ops;
+	dom->cfg.iommu_dev = dom->data->dev;
+
+	dom->iop = alloc_io_pgtable_ops(ARM_SHORT_DESC, &dom->cfg, dom);
+	if (!dom->iop) {
+		dev_err(dom->data->dev, "Failed to alloc io pgtable\n");
+		return -EINVAL;
+	}
+
+	/* Update our supported page sizes bitmap */
+	mtk_iommu_ops.pgsize_bitmap = dom->cfg.pgsize_bitmap;
+
+	ret = mtk_iommu_hw_init(dom);
+	if (ret)
+		free_io_pgtable_ops(dom->iop);
+
+	return ret;
+}
+
+static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
+{
+	struct mtk_iommu_domain *priv;
+
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+		return NULL;
+
+	if (m4udom) /* The m4u domain already exists */
+		return &m4udom->domain;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return NULL;
+
+	priv->domain.geometry.aperture_start = 0;
+	priv->domain.geometry.aperture_end = DMA_BIT_MASK(32);
+	priv->domain.geometry.force_aperture = true;
+
+	m4udom = priv;
+
+	return &priv->domain;
+}
+
+static void mtk_iommu_domain_free(struct iommu_domain *domain)
+{
+	kfree(to_mtk_domain(domain));
+}
+
+static int mtk_iommu_attach_device(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	struct iommu_group *group;
+	int ret;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return 0;
+	iommu_group_put(group);
+
+	ret = mtk_iommu_init_domain_context(priv);
+	if (ret)
+		return ret;
+
+	return mtk_iommu_config(priv, dev, true);
+}
+
+static void mtk_iommu_detach_device(struct iommu_domain *domain,
+				    struct device *dev)
+{
+	mtk_iommu_config(to_mtk_domain(domain), dev, false);
+}
+
+static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova,
+			 phys_addr_t paddr, size_t size, int prot)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	ret = priv->iop->map(priv->iop, iova, paddr, size, prot);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return ret;
+}
+
+static size_t mtk_iommu_unmap(struct iommu_domain *domain,
+			      unsigned long iova, size_t size)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	size_t unmapsize;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	unmapsize = priv->iop->unmap(priv->iop, iova, size);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return unmapsize;
+}
+
+static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
+					  dma_addr_t iova)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	phys_addr_t pa;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	pa = priv->iop->iova_to_phys(priv->iop, iova);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return pa;
+}
+
+static int mtk_iommu_add_device(struct device *dev)
+{
+	struct iommu_group *group;
+	int ret;
+
+	if (!dev->archdata.iommu) /* Not an iommu client device */
+		return -ENODEV;
+
+	group = iommu_group_get(dev);
+	if (!group) {
+		group = iommu_group_alloc();
+		if (IS_ERR(group)) {
+			dev_err(dev, "Failed to allocate IOMMU group\n");
+			return PTR_ERR(group);
+		}
+	}
+
+	ret = iommu_group_add_device(group, dev);
+	if (ret) {
+		dev_err(dev, "Failed to add IOMMU group\n");
+		goto err_group_put;
+	}
+
+	ret = iommu_attach_group(&m4udom->domain, group);
+	if (ret)
+		dev_err(dev, "Failed to attach IOMMU group\n");
+
+err_group_put:
+	iommu_group_put(group);
+	return ret;
+}
+
+static void mtk_iommu_remove_device(struct device *dev)
+{
+	if (!dev->archdata.iommu)
+		return;
+
+	iommu_group_remove_device(dev);
+}
+
+static int mtk_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
+	struct mtk_iommu_client_priv *head, *priv, *next;
+
+	if (args->args_count != 2) {
+		dev_err(dev, "invalid #iommu-cells(%d) property for IOMMU\n",
+			args->args_count);
+		return -EINVAL;
+	}
+
+	if (!dev->archdata.iommu) {
+		head = kzalloc(sizeof(*head), GFP_KERNEL);
+		if (!head)
+			return -ENOMEM;
+
+		dev->archdata.iommu = head;
+		INIT_LIST_HEAD(&head->client);
+	} else {
+		head = dev->archdata.iommu;
+	}
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		goto err_free_mem;
+
+	priv->larbid = args->args[0];
+	priv->portid = args->args[1];
+	list_add_tail(&priv->client, &head->client);
+
+	return 0;
+
+err_free_mem:
+	list_for_each_entry_safe(priv, next, &head->client, client)
+		kfree(priv);
+	kfree(head);
+	dev->archdata.iommu = NULL;
+	return -ENOMEM;
+}
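+
+/*
+ * Device tree clients reference the iommu with two cells, <larbid portid>,
+ * matching the parsing above. An illustrative (hypothetical) client node:
+ *
+ *	vdec@16000000 {
+ *		...
+ *		iommus = <&m4u 1 0>;
+ *	};
+ */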
+
+static struct iommu_ops mtk_iommu_ops = {
+	.domain_alloc	= mtk_iommu_domain_alloc,
+	.domain_free	= mtk_iommu_domain_free,
+	.attach_dev	= mtk_iommu_attach_device,
+	.detach_dev	= mtk_iommu_detach_device,
+	.map		= mtk_iommu_map,
+	.unmap		= mtk_iommu_unmap,
+	.map_sg		= default_iommu_map_sg,
+	.iova_to_phys	= mtk_iommu_iova_to_phys,
+	.add_device	= mtk_iommu_add_device,
+	.remove_device	= mtk_iommu_remove_device,
+	.of_xlate	= mtk_iommu_of_xlate,
+	.pgsize_bitmap	= SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+};
+
+static const struct of_device_id mtk_iommu_of_ids[] = {
+	{ .compatible = "mediatek,mt8173-m4u", },
+	{}
+};
+
+static int mtk_iommu_init_fn(struct device_node *np)
+{
+	struct platform_device *pdev;
+
+	pdev = of_platform_device_create(np, NULL, platform_bus_type.dev_root);
+	if (IS_ERR(pdev))
+		return PTR_ERR(pdev);
+
+	of_iommu_set_ops(np, &mtk_iommu_ops);
+
+	return 0;
+}
+
+IOMMU_OF_DECLARE(mtkm4u, "mediatek,mt8173-m4u", mtk_iommu_init_fn);
+
+static int mtk_iommu_probe(struct platform_device *pdev)
+{
+	struct mtk_iommu_data   *data;
+	struct device           *dev = &pdev->dev;
+	void __iomem	        *protect;
+	int                     ret;
+
+	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	/*
+	 * Protection memory: the HW redirects accesses here on a
+	 * translation fault. Allocate twice the alignment so the ALIGN()
+	 * in mtk_iommu_hw_init always stays within this buffer.
+	 */
+	protect = devm_kzalloc(dev, MTK_PROTECT_PA_ALIGN * 2, GFP_KERNEL);
+	if (!protect)
+		return -ENOMEM;
+	data->protect_base = virt_to_phys(protect);
+
+	ret = mtk_iommu_parse_dt(pdev, data);
+	if (ret)
+		return ret;
+
+	if (!m4udom) /* There is no iommu client */
+		return 0;
+
+	data->dev = dev;
+	m4udom->data = data;
+	dev_set_drvdata(dev, m4udom);
+
+	return 0;
+}
+
+static int mtk_iommu_remove(struct platform_device *pdev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(&pdev->dev);
+
+	if (!mtkdom)
+		return 0;
+
+	free_io_pgtable_ops(mtkdom->iop); /* Destroy domain context */
+	clk_disable_unprepare(mtkdom->data->bclk);
+	return 0;
+}
+
+static int mtk_iommu_suspend(struct device *dev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(dev);
+	struct mtk_iommu_suspend_reg *reg = &mtkdom->data->reg;
+	void __iomem *base = mtkdom->data->base;
+
+	reg->standard_axi_mode = readl_relaxed(base +
+					       REG_MMU_STANDARD_AXI_MODE);
+	reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS);
+	reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG);
+	reg->ivrp_paddr = readl_relaxed(base + REG_MMU_IVRP_PADDR);
+	reg->int_control0 = readl_relaxed(base + REG_MMU_INT_CONTROL0);
+	reg->int_main_control = readl_relaxed(base + REG_MMU_INT_MAIN_CONTROL);
+	return 0;
+}
+
+static int mtk_iommu_resume(struct device *dev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(dev);
+	struct mtk_iommu_suspend_reg *reg = &mtkdom->data->reg;
+	void __iomem *base = mtkdom->data->base;
+
+	writel_relaxed(mtkdom->cfg.arm_short_cfg.ttbr[0],
+		       base + REG_MMU_PT_BASE_ADDR);
+	writel_relaxed(reg->standard_axi_mode,
+		       base + REG_MMU_STANDARD_AXI_MODE);
+	writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS);
+	writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG);
+	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
+	writel_relaxed(reg->int_control0, base + REG_MMU_INT_CONTROL0);
+	writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL);
+	return 0;
+}
+
+static const struct dev_pm_ops mtk_iommu_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(mtk_iommu_suspend, mtk_iommu_resume)
+};
+
+static struct platform_driver mtk_iommu_driver = {
+	.probe	= mtk_iommu_probe,
+	.remove	= mtk_iommu_remove,
+	.driver	= {
+		.name = "mtk-iommu",
+		.of_match_table = mtk_iommu_of_ids,
+		.pm = &mtk_iommu_pm_ops,
+	}
+};
+
+static int __init mtk_iommu_init(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&mtk_iommu_driver);
+	if (ret) {
+		pr_err("%s: Failed to register driver\n", __func__);
+		return ret;
+	}
+
+	if (!iommu_present(&platform_bus_type))
+		bus_set_iommu(&platform_bus_type, &mtk_iommu_ops);
+
+	return 0;
+}
+
+subsys_initcall(mtk_iommu_init);
-- 
1.8.1.1.dirty


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 5/6] iommu/mediatek: Add mt8173 IOMMU driver
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: k.zhang-NuS5LvNUpcJWk0Htik3J/w,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	pebolle-IWqWACnzNjzz+pZb47iToQ,
	frederic.chen-NuS5LvNUpcJWk0Htik3J/w, arnd-r2nGTMty4D4,
	srv_heupstream-NuS5LvNUpcJWk0Htik3J/w, Catalin Marinas,
	Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tomasz Figa,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	Daniel Kurtz, Sasha Hauer,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	youhua.li-NuS5LvNUpcJWk0Htik3J/w,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Lucas Stach

This patch adds support for mediatek m4u (MultiMedia Memory Management
Unit).

Signed-off-by: Yong Wu <yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
---
 drivers/iommu/Kconfig     |  13 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/mtk_iommu.c | 714 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 728 insertions(+)
 create mode 100644 drivers/iommu/mtk_iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3abd066..f0ae553e 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -386,4 +386,17 @@ config ARM_SMMU_V3
 	  Say Y here if your system includes an IOMMU device implementing
 	  the ARM SMMUv3 architecture.
 
+config MTK_IOMMU
+	bool "MTK IOMMU Support"
+	depends on ARCH_MEDIATEK || COMPILE_TEST
+	select IOMMU_API
+	select IOMMU_DMA
+	select IOMMU_IO_PGTABLE_SHORT
+	select MEMORY
+	select MTK_SMI
+	help
+	  Support for the IOMMUs on certain Mediatek SOCs.
+
+	  If unsure, say N here.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 06df3e6..f4f2f2c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_ROCKCHIP_IOMMU) += rockchip-iommu.o
 obj-$(CONFIG_TEGRA_IOMMU_GART) += tegra-gart.o
 obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
+obj-$(CONFIG_MTK_IOMMU) += mtk_iommu.o
 obj-$(CONFIG_SHMOBILE_IOMMU) += shmobile-iommu.o
 obj-$(CONFIG_SHMOBILE_IPMMU) += shmobile-ipmmu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
new file mode 100644
index 0000000..3c4f1b5
--- /dev/null
+++ b/drivers/iommu/mtk_iommu.c
@@ -0,0 +1,714 @@
+/*
+ * Copyright (c) 2014-2015 MediaTek Inc.
+ * Author: Yong Wu <yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/io.h>
+#include <linux/interrupt.h>
+#include <linux/platform_device.h>
+#include <linux/iommu.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-iommu.h>
+#include <linux/of_iommu.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_platform.h>
+#include <linux/list.h>
+#include <linux/clk.h>
+#include <linux/iopoll.h>
+#include <asm/cacheflush.h>
+#include <soc/mediatek/smi.h>
+
+#include "io-pgtable.h"
+
+#define REG_MMU_PT_BASE_ADDR			0x000
+
+#define REG_MMU_INVALIDATE			0x020
+#define F_ALL_INVLD				0x2
+#define F_MMU_INV_RANGE				0x1
+
+#define REG_MMU_INVLD_START_A			0x024
+#define REG_MMU_INVLD_END_A			0x028
+
+#define REG_MMU_INV_SEL				0x038
+#define F_INVLD_EN0				BIT(0)
+#define F_INVLD_EN1				BIT(1)
+
+#define REG_MMU_STANDARD_AXI_MODE		0x048
+#define REG_MMU_DCM_DIS				0x050
+
+#define REG_MMU_CTRL_REG			0x110
+#define F_MMU_PREFETCH_RT_REPLACE_MOD		BIT(4)
+#define F_MMU_TF_PROTECT_SEL(prot)		(((prot) & 0x3) << 5)
+#define F_COHERENCE_EN				BIT(8)
+
+#define REG_MMU_IVRP_PADDR			0x114
+#define F_MMU_IVRP_PA_SET(pa)			((pa) >> 1)
+
+#define REG_MMU_INT_CONTROL0			0x120
+#define F_L2_MULIT_HIT_EN			BIT(0)
+#define F_TABLE_WALK_FAULT_INT_EN		BIT(1)
+#define F_PREETCH_FIFO_OVERFLOW_INT_EN		BIT(2)
+#define F_MISS_FIFO_OVERFLOW_INT_EN		BIT(3)
+#define F_PREFETCH_FIFO_ERR_INT_EN		BIT(5)
+#define F_MISS_FIFO_ERR_INT_EN			BIT(6)
+#define F_INT_L2_CLR_BIT			BIT(12)
+
+#define REG_MMU_INT_MAIN_CONTROL		0x124
+#define F_INT_TRANSLATION_FAULT			BIT(0)
+#define F_INT_MAIN_MULTI_HIT_FAULT		BIT(1)
+#define F_INT_INVALID_PA_FAULT			BIT(2)
+#define F_INT_ENTRY_REPLACEMENT_FAULT		BIT(3)
+#define F_INT_TLB_MISS_FAULT			BIT(4)
+#define F_INT_MISS_TRANSATION_FIFO_FAULT	BIT(5)
+#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT	BIT(6)
+
+#define REG_MMU_CPE_DONE			0x12C
+
+#define REG_MMU_FAULT_ST1			0x134
+
+#define REG_MMU_FAULT_VA			0x13c
+#define F_MMU_FAULT_VA_MSK			0xfffff000
+#define F_MMU_FAULT_VA_WRITE_BIT		BIT(1)
+#define F_MMU_FAULT_VA_LAYER_BIT		BIT(0)
+
+#define REG_MMU_INVLD_PA			0x140
+#define REG_MMU_INT_ID				0x150
+#define F_MMU0_INT_ID_LARB_ID(a)		(((a) >> 7) & 0x7)
+#define F_MMU0_INT_ID_PORT_ID(a)		(((a) >> 2) & 0x1f)
+
+#define MTK_PROTECT_PA_ALIGN			128
+#define MTK_IOMMU_LARB_MAX_NR			8
+#define MTK_IOMMU_REG_NR			10
+
+struct mtk_iommu_suspend_reg {
+	u32				standard_axi_mode;
+	u32				dcm_dis;
+	u32				ctrl_reg;
+	u32				ivrp_paddr;
+	u32				int_control0;
+	u32				int_main_control;
+};
+
+struct mtk_iommu_data {
+	void __iomem			*base;
+	int				irq;
+	struct device			*dev;
+	struct device			*larbdev[MTK_IOMMU_LARB_MAX_NR];
+	struct clk			*bclk;
+	phys_addr_t			protect_base; /* protect memory base */
+	int				larb_nr;/* local arbiter number */
+	struct mtk_iommu_suspend_reg	reg;
+};
+
+struct mtk_iommu_domain {
+	struct imu_pgd_t		*pgd;
+	spinlock_t			pgtlock; /* lock for page table */
+
+	struct io_pgtable_cfg		cfg;
+	struct io_pgtable_ops		*iop;
+
+	struct mtk_iommu_data		*data;
+	struct iommu_domain		domain;
+};
+
+struct mtk_iommu_client_priv {
+	struct list_head		client;
+	unsigned int			larbid;
+	unsigned int			portid;
+};
+
+static struct iommu_ops mtk_iommu_ops;
+
+/*
+ * There is only one iommu domain called the m4u domain that
+ * all Multimedia modules share.
+ */
+static struct mtk_iommu_domain	*m4udom;
+
+static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct mtk_iommu_domain, domain);
+}
+
+static void mtk_iommu_clear_intr(const struct mtk_iommu_data *data)
+{
+	u32 val;
+
+	val = readl_relaxed(data->base + REG_MMU_INT_CONTROL0);
+	val |= F_INT_L2_CLR_BIT;
+	writel_relaxed(val, data->base + REG_MMU_INT_CONTROL0);
+}
+
+static void mtk_iommu_tlb_flush_all(void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base;
+
+	base = domain->data->base;
+	writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL);
+	writel_relaxed(F_ALL_INVLD, base + REG_MMU_INVALIDATE);
+	mb();/* Make sure flush all done */
+}
+
+static void mtk_iommu_tlb_add_flush(unsigned long iova, size_t size,
+				    bool leaf, void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base = domain->data->base;
+	unsigned int iova_start = iova, iova_end = iova + size - 1;
+
+	writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL);
+
+	writel_relaxed(iova_start, base + REG_MMU_INVLD_START_A);
+	writel_relaxed(iova_end, base + REG_MMU_INVLD_END_A);
+	writel_relaxed(F_MMU_INV_RANGE, base + REG_MMU_INVALIDATE);
+}
+
+static void mtk_iommu_tlb_sync(void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base = domain->data->base;
+	int ret;
+	u32 tmp;
+
+	ret = readl_poll_timeout_atomic(base + REG_MMU_CPE_DONE, tmp,
+					tmp != 0, 10, 1000000);
+	if (ret) {
+		dev_warn(domain->data->dev,
+			 "Partial TLB flush timed out, falling back to full flush\n");
+		mtk_iommu_tlb_flush_all(cookie);
+	}
+	writel_relaxed(0, base + REG_MMU_CPE_DONE);
+}
+
+static struct iommu_gather_ops mtk_iommu_gather_ops = {
+	.tlb_flush_all = mtk_iommu_tlb_flush_all,
+	.tlb_add_flush = mtk_iommu_tlb_add_flush,
+	.tlb_sync = mtk_iommu_tlb_sync,
+};
+
+static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
+{
+	struct mtk_iommu_domain *mtkdom = dev_id;
+	struct mtk_iommu_data *data = mtkdom->data;
+	u32 int_state, regval, fault_iova, fault_pa;
+	unsigned int fault_larb, fault_port;
+	bool layer, write;
+
+	int_state = readl_relaxed(data->base + REG_MMU_FAULT_ST1);
+
+	/* read error info from registers */
+	fault_iova = readl_relaxed(data->base + REG_MMU_FAULT_VA);
+	layer = fault_iova & F_MMU_FAULT_VA_LAYER_BIT;
+	write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
+	fault_iova &= F_MMU_FAULT_VA_MSK;
+	fault_pa = readl_relaxed(data->base + REG_MMU_INVLD_PA);
+	regval = readl_relaxed(data->base + REG_MMU_INT_ID);
+	fault_larb = F_MMU0_INT_ID_LARB_ID(regval);
+	fault_port = F_MMU0_INT_ID_PORT_ID(regval);
+
+	if (report_iommu_fault(&mtkdom->domain, data->dev, fault_iova,
+			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
+		dev_err_ratelimited(
+			data->dev,
+			"fault type=0x%x iova=0x%x pa=0x%x larb=%d port=%d layer=%d %s\n",
+			int_state, fault_iova, fault_pa, fault_larb, fault_port,
+			layer, write ? "write" : "read");
+	}
+
+	mtk_iommu_clear_intr(data);
+	mtk_iommu_tlb_flush_all(mtkdom);
+
+	return IRQ_HANDLED;
+}
+
+static int mtk_iommu_parse_dt(struct platform_device *pdev,
+			      struct mtk_iommu_data *data)
+{
+	struct device *dev = &pdev->dev;
+	struct device_node *ofnode;
+	struct resource *res;
+	int i;
+
+	ofnode = dev->of_node;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	data->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(data->base))
+		return PTR_ERR(data->base);
+
+	data->irq = platform_get_irq(pdev, 0);
+	if (data->irq < 0)
+		return data->irq;
+
+	data->bclk = devm_clk_get(dev, "bclk");
+	if (IS_ERR(data->bclk))
+		return PTR_ERR(data->bclk);
+
+	data->larb_nr = of_count_phandle_with_args(
+					ofnode, "mediatek,larb", NULL);
+	if (data->larb_nr < 0)
+		return data->larb_nr;
+
+	for (i = 0; i < data->larb_nr; i++) {
+		struct device_node *larbnode;
+		struct platform_device *plarbdev;
+
+		larbnode = of_parse_phandle(ofnode, "mediatek,larb", i);
+		if (!larbnode)
+			return -EINVAL;
+
+		plarbdev = of_find_device_by_node(larbnode);
+		of_node_put(larbnode);
+		if (!plarbdev)
+			return -EPROBE_DEFER;
+		data->larbdev[i] = &plarbdev->dev;
+	}
+
+	return 0;
+}
+
+static int mtk_iommu_hw_init(const struct mtk_iommu_domain *mtkdom)
+{
+	struct mtk_iommu_data *data = mtkdom->data;
+	void __iomem *base = data->base;
+	u32 regval;
+	int ret = 0;
+
+	ret = clk_prepare_enable(data->bclk);
+	if (ret) {
+		dev_err(data->dev, "Failed to enable iommu clk(%d)\n", ret);
+		return ret;
+	}
+
+	writel_relaxed(mtkdom->cfg.arm_short_cfg.ttbr[0],
+		       base + REG_MMU_PT_BASE_ADDR);
+
+	regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
+		F_MMU_TF_PROTECT_SEL(2) |
+		F_COHERENCE_EN;
+	writel_relaxed(regval, base + REG_MMU_CTRL_REG);
+
+	regval = F_L2_MULIT_HIT_EN |
+		F_TABLE_WALK_FAULT_INT_EN |
+		F_PREETCH_FIFO_OVERFLOW_INT_EN |
+		F_MISS_FIFO_OVERFLOW_INT_EN |
+		F_PREFETCH_FIFO_ERR_INT_EN |
+		F_MISS_FIFO_ERR_INT_EN;
+	writel_relaxed(regval, base + REG_MMU_INT_CONTROL0);
+
+	regval = F_INT_TRANSLATION_FAULT |
+		F_INT_MAIN_MULTI_HIT_FAULT |
+		F_INT_INVALID_PA_FAULT |
+		F_INT_ENTRY_REPLACEMENT_FAULT |
+		F_INT_TLB_MISS_FAULT |
+		F_INT_MISS_TRANSATION_FIFO_FAULT |
+		F_INT_PRETETCH_TRANSATION_FIFO_FAULT;
+	writel_relaxed(regval, base + REG_MMU_INT_MAIN_CONTROL);
+
+	regval = ALIGN(data->protect_base, MTK_PROTECT_PA_ALIGN);
+	regval = F_MMU_IVRP_PA_SET(regval);
+	writel_relaxed(regval, base + REG_MMU_IVRP_PADDR);
+
+	writel_relaxed(0, base + REG_MMU_DCM_DIS);
+	writel_relaxed(0, base + REG_MMU_STANDARD_AXI_MODE);
+
+	if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0,
+			     dev_name(data->dev), (void *)mtkdom)) {
+		writel_relaxed(0, base + REG_MMU_PT_BASE_ADDR);
+		clk_disable_unprepare(data->bclk);
+		dev_err(data->dev, "Failed @ IRQ-%d Request\n", data->irq);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int mtk_iommu_config(struct mtk_iommu_domain *mtkdom,
+			    struct device *dev, bool enable)
+{
+	struct mtk_iommu_data *data = mtkdom->data;
+	struct mtk_iommu_client_priv *head, *cur, *next;
+
+	head = dev->archdata.iommu;
+	list_for_each_entry_safe(cur, next, &head->client, client) {
+		if (cur->larbid >= data->larb_nr) {
+			dev_err(data->dev, "Invalid larb:%d\n", cur->larbid);
+			return -EINVAL;
+		}
+
+		mtk_smi_config_port(data->larbdev[cur->larbid],
+				    cur->portid, enable);
+		if (!enable) {
+			list_del(&cur->client);
+			kfree(cur);
+		}
+	}
+
+	if (!enable) {
+		kfree(head);
+		dev->archdata.iommu = NULL;
+	}
+	return 0;
+}
+
+static int mtk_iommu_init_domain_context(struct mtk_iommu_domain *dom)
+{
+	int ret;
+
+	if (dom->iop)
+		return 0;
+
+	spin_lock_init(&dom->pgtlock);
+	dom->cfg.quirks = IO_PGTABLE_QUIRK_ARM_NS |
+			IO_PGTABLE_QUIRK_SHORT_SUPERSECTION |
+			IO_PGTABLE_QUIRK_SHORT_NO_XN |
+			IO_PGTABLE_QUIRK_SHORT_NO_PERMS;
+	dom->cfg.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
+	dom->cfg.ias = 32;
+	dom->cfg.oas = 32;
+	dom->cfg.tlb = &mtk_iommu_gather_ops;
+	dom->cfg.iommu_dev = dom->data->dev;
+
+	dom->iop = alloc_io_pgtable_ops(ARM_SHORT_DESC, &dom->cfg, dom);
+	if (!dom->iop) {
+		dev_err(dom->data->dev, "Failed to alloc io pgtable\n");
+		return -EINVAL;
+	}
+
+	/* Update our support page sizes bitmap */
+	mtk_iommu_ops.pgsize_bitmap = dom->cfg.pgsize_bitmap;
+
+	ret = mtk_iommu_hw_init(dom);
+	if (ret)
+		free_io_pgtable_ops(dom->iop);
+
+	return ret;
+}
+
+static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
+{
+	struct mtk_iommu_domain *priv;
+
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+		return NULL;
+
+	if (m4udom)/* The m4u domain exist. */
+		return &m4udom->domain;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return NULL;
+
+	priv->domain.geometry.aperture_start = 0;
+	priv->domain.geometry.aperture_end = DMA_BIT_MASK(32);
+	priv->domain.geometry.force_aperture = true;
+
+	m4udom = priv;
+
+	return &priv->domain;
+}
+
+static void mtk_iommu_domain_free(struct iommu_domain *domain)
+{
+	kfree(to_mtk_domain(domain));
+}
+
+static int mtk_iommu_attach_device(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	struct iommu_group *group;
+	int ret;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return 0;
+	iommu_group_put(group);
+
+	ret = mtk_iommu_init_domain_context(priv);
+	if (ret)
+		return ret;
+
+	return mtk_iommu_config(priv, dev, true);
+}
+
+static void mtk_iommu_detach_device(struct iommu_domain *domain,
+				    struct device *dev)
+{
+	mtk_iommu_config(to_mtk_domain(domain), dev, false);
+}
+
+static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova,
+			 phys_addr_t paddr, size_t size, int prot)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	ret = priv->iop->map(priv->iop, iova, paddr, size, prot);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return ret;
+}
+
+static size_t mtk_iommu_unmap(struct iommu_domain *domain,
+			      unsigned long iova, size_t size)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	size_t unmapsize;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	unmapsize = priv->iop->unmap(priv->iop, iova, size);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return unmapsize;
+}
+
+static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
+					  dma_addr_t iova)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	phys_addr_t pa;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	pa = priv->iop->iova_to_phys(priv->iop, iova);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return pa;
+}
+
+static int mtk_iommu_add_device(struct device *dev)
+{
+	struct iommu_group *group;
+	int ret;
+
+	if (!dev->archdata.iommu) /* Not a iommu client device */
+		return -ENODEV;
+
+	group = iommu_group_get(dev);
+	if (!group) {
+		group = iommu_group_alloc();
+		if (IS_ERR(group)) {
+			dev_err(dev, "Failed to allocate IOMMU group\n");
+			return PTR_ERR(group);
+		}
+	}
+
+	ret = iommu_group_add_device(group, dev);
+	if (ret) {
+		dev_err(dev, "Failed to add IOMMU group\n");
+		goto err_group_put;
+	}
+
+	ret = iommu_attach_group(&m4udom->domain, group);
+	if (ret)
+		dev_err(dev, "Failed to attach IOMMU group\n");
+
+err_group_put:
+	iommu_group_put(group);
+	return ret;
+}
+
+static void mtk_iommu_remove_device(struct device *dev)
+{
+	if (!dev->archdata.iommu)
+		return;
+
+	iommu_group_remove_device(dev);
+}
+
+static int mtk_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
+	struct mtk_iommu_client_priv *head, *priv, *next;
+
+	if (args->args_count != 2) {
+		dev_err(dev, "invalid #iommu-cells(%d) property for IOMMU\n",
+			args->args_count);
+		return -EINVAL;
+	}
+
+	if (!dev->archdata.iommu) {
+		head = kzalloc(sizeof(*head), GFP_KERNEL);
+		if (!head)
+			return -ENOMEM;
+
+		dev->archdata.iommu = head;
+		INIT_LIST_HEAD(&head->client);
+	} else {
+		head = dev->archdata.iommu;
+	}
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		goto err_free_mem;
+
+	priv->larbid = args->args[0];
+	priv->portid = args->args[1];
+	list_add_tail(&priv->client, &head->client);
+
+	return 0;
+
+err_free_mem:
+	list_for_each_entry_safe(priv, next, &head->client, client)
+		kfree(priv);
+	kfree(head);
+	dev->archdata.iommu = NULL;
+	return -ENOMEM;
+}
+
+static struct iommu_ops mtk_iommu_ops = {
+	.domain_alloc	= mtk_iommu_domain_alloc,
+	.domain_free	= mtk_iommu_domain_free,
+	.attach_dev	= mtk_iommu_attach_device,
+	.detach_dev	= mtk_iommu_detach_device,
+	.map		= mtk_iommu_map,
+	.unmap		= mtk_iommu_unmap,
+	.map_sg		= default_iommu_map_sg,
+	.iova_to_phys	= mtk_iommu_iova_to_phys,
+	.add_device	= mtk_iommu_add_device,
+	.remove_device	= mtk_iommu_remove_device,
+	.of_xlate	= mtk_iommu_of_xlate,
+	.pgsize_bitmap	= SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+};
+
+static const struct of_device_id mtk_iommu_of_ids[] = {
+	{ .compatible = "mediatek,mt8173-m4u", },
+	{}
+};
+
+static int mtk_iommu_init_fn(struct device_node *np)
+{
+	struct platform_device *pdev;
+
+	pdev = of_platform_device_create(np, NULL, platform_bus_type.dev_root);
+	if (IS_ERR(pdev))
+		return PTR_ERR(pdev);
+
+	of_iommu_set_ops(np, &mtk_iommu_ops);
+
+	return 0;
+}
+
+IOMMU_OF_DECLARE(mtkm4u, "mediatek,mt8173-m4u", mtk_iommu_init_fn);
+
+static int mtk_iommu_probe(struct platform_device *pdev)
+{
+	struct mtk_iommu_data   *data;
+	struct device           *dev = &pdev->dev;
+	void __iomem	        *protect;
+	int                     ret;
+
+	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	/* Protect memory. HW will access here while translation fault.*/
+	protect = devm_kzalloc(dev, MTK_PROTECT_PA_ALIGN * 2, GFP_KERNEL);
+	if (!protect)
+		return -ENOMEM;
+	data->protect_base = virt_to_phys(protect);
+
+	ret = mtk_iommu_parse_dt(pdev, data);
+	if (ret)
+		return ret;
+
+	if (!m4udom)/* There is no iommu client */
+		return 0;
+
+	data->dev = dev;
+	m4udom->data = data;
+	dev_set_drvdata(dev, m4udom);
+
+	return 0;
+}
+
+static int mtk_iommu_remove(struct platform_device *pdev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(&pdev->dev);
+
+	if (!mtkdom)
+		return 0;
+
+	free_io_pgtable_ops(mtkdom->iop); /* Destroy domain context */
+	clk_disable_unprepare(mtkdom->data->bclk);
+	return 0;
+}
+
+static int mtk_iommu_suspend(struct device *dev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(dev);
+	struct mtk_iommu_suspend_reg *reg = &mtkdom->data->reg;
+	void __iomem *base = mtkdom->data->base;
+
+	reg->standard_axi_mode = readl_relaxed(base +
+					       REG_MMU_STANDARD_AXI_MODE);
+	reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS);
+	reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG);
+	reg->ivrp_paddr = readl_relaxed(base + REG_MMU_IVRP_PADDR);
+	reg->int_control0 = readl_relaxed(base + REG_MMU_INT_CONTROL0);
+	reg->int_main_control = readl_relaxed(base + REG_MMU_INT_MAIN_CONTROL);
+	return 0;
+}
+
+static int mtk_iommu_resume(struct device *dev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(dev);
+	struct mtk_iommu_suspend_reg *reg = &mtkdom->data->reg;
+	void __iomem *base = mtkdom->data->base;
+
+	writel_relaxed(mtkdom->cfg.arm_short_cfg.ttbr[0],
+		       base + REG_MMU_PT_BASE_ADDR);
+	writel_relaxed(reg->standard_axi_mode,
+		       base + REG_MMU_STANDARD_AXI_MODE);
+	writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS);
+	writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG);
+	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
+	writel_relaxed(reg->int_control0, base + REG_MMU_INT_CONTROL0);
+	writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL);
+	return 0;
+}
+
+const struct dev_pm_ops mtk_iommu_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(mtk_iommu_suspend, mtk_iommu_resume)
+};
+
+static struct platform_driver mtk_iommu_driver = {
+	.probe	= mtk_iommu_probe,
+	.remove	= mtk_iommu_remove,
+	.driver	= {
+		.name = "mtk-iommu",
+		.of_match_table = mtk_iommu_of_ids,
+		.pm = &mtk_iommu_pm_ops,
+	}
+};
+
+static int __init mtk_iommu_init(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&mtk_iommu_driver);
+	if (ret) {
+		pr_err("%s: Failed to register driver\n", __func__);
+		return ret;
+	}
+
+	if (!iommu_present(&platform_bus_type))
+		bus_set_iommu(&platform_bus_type, &mtk_iommu_ops);
+
+	return 0;
+}
+
+subsys_initcall(mtk_iommu_init);
-- 
1.8.1.1.dirty

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 5/6] iommu/mediatek: Add mt8173 IOMMU driver
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for mediatek m4u (MultiMedia Memory Management
Unit).

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 drivers/iommu/Kconfig     |  13 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/mtk_iommu.c | 714 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 728 insertions(+)
 create mode 100644 drivers/iommu/mtk_iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 3abd066..f0ae553e 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -386,4 +386,17 @@ config ARM_SMMU_V3
 	  Say Y here if your system includes an IOMMU device implementing
 	  the ARM SMMUv3 architecture.
 
+config MTK_IOMMU
+	bool "MTK IOMMU Support"
+	depends on ARCH_MEDIATEK || COMPILE_TEST
+	select IOMMU_API
+	select IOMMU_DMA
+	select IOMMU_IO_PGTABLE_SHORT
+	select MEMORY
+	select MTK_SMI
+	help
+	  Support for the IOMMUs on certain Mediatek SOCs.
+
+	  If unsure, say N here.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 06df3e6..f4f2f2c 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -21,6 +21,7 @@ obj-$(CONFIG_ROCKCHIP_IOMMU) += rockchip-iommu.o
 obj-$(CONFIG_TEGRA_IOMMU_GART) += tegra-gart.o
 obj-$(CONFIG_TEGRA_IOMMU_SMMU) += tegra-smmu.o
 obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
+obj-$(CONFIG_MTK_IOMMU) += mtk_iommu.o
 obj-$(CONFIG_SHMOBILE_IOMMU) += shmobile-iommu.o
 obj-$(CONFIG_SHMOBILE_IPMMU) += shmobile-ipmmu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
new file mode 100644
index 0000000..3c4f1b5
--- /dev/null
+++ b/drivers/iommu/mtk_iommu.c
@@ -0,0 +1,714 @@
+/*
+ * Copyright (c) 2014-2015 MediaTek Inc.
+ * Author: Yong Wu <yong.wu@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/io.h>
+#include <linux/interrupt.h>
+#include <linux/platform_device.h>
+#include <linux/iommu.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-iommu.h>
+#include <linux/of_iommu.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+#include <linux/of_platform.h>
+#include <linux/list.h>
+#include <linux/clk.h>
+#include <linux/iopoll.h>
+#include <asm/cacheflush.h>
+#include <soc/mediatek/smi.h>
+
+#include "io-pgtable.h"
+
+#define REG_MMU_PT_BASE_ADDR			0x000
+
+#define REG_MMU_INVALIDATE			0x020
+#define F_ALL_INVLD				0x2
+#define F_MMU_INV_RANGE				0x1
+
+#define REG_MMU_INVLD_START_A			0x024
+#define REG_MMU_INVLD_END_A			0x028
+
+#define REG_MMU_INV_SEL				0x038
+#define F_INVLD_EN0				BIT(0)
+#define F_INVLD_EN1				BIT(1)
+
+#define REG_MMU_STANDARD_AXI_MODE		0x048
+#define REG_MMU_DCM_DIS				0x050
+
+#define REG_MMU_CTRL_REG			0x110
+#define F_MMU_PREFETCH_RT_REPLACE_MOD		BIT(4)
+#define F_MMU_TF_PROTECT_SEL(prot)		(((prot) & 0x3) << 5)
+#define F_COHERENCE_EN				BIT(8)
+
+#define REG_MMU_IVRP_PADDR			0x114
+#define F_MMU_IVRP_PA_SET(pa)			((pa) >> 1)
+
+#define REG_MMU_INT_CONTROL0			0x120
+#define F_L2_MULIT_HIT_EN			BIT(0)
+#define F_TABLE_WALK_FAULT_INT_EN		BIT(1)
+#define F_PREETCH_FIFO_OVERFLOW_INT_EN		BIT(2)
+#define F_MISS_FIFO_OVERFLOW_INT_EN		BIT(3)
+#define F_PREFETCH_FIFO_ERR_INT_EN		BIT(5)
+#define F_MISS_FIFO_ERR_INT_EN			BIT(6)
+#define F_INT_L2_CLR_BIT			BIT(12)
+
+#define REG_MMU_INT_MAIN_CONTROL		0x124
+#define F_INT_TRANSLATION_FAULT			BIT(0)
+#define F_INT_MAIN_MULTI_HIT_FAULT		BIT(1)
+#define F_INT_INVALID_PA_FAULT			BIT(2)
+#define F_INT_ENTRY_REPLACEMENT_FAULT		BIT(3)
+#define F_INT_TLB_MISS_FAULT			BIT(4)
+#define F_INT_MISS_TRANSATION_FIFO_FAULT	BIT(5)
+#define F_INT_PRETETCH_TRANSATION_FIFO_FAULT	BIT(6)
+
+#define REG_MMU_CPE_DONE			0x12C
+
+#define REG_MMU_FAULT_ST1			0x134
+
+#define REG_MMU_FAULT_VA			0x13c
+#define F_MMU_FAULT_VA_MSK			0xfffff000
+#define F_MMU_FAULT_VA_WRITE_BIT		BIT(1)
+#define F_MMU_FAULT_VA_LAYER_BIT		BIT(0)
+
+#define REG_MMU_INVLD_PA			0x140
+#define REG_MMU_INT_ID				0x150
+#define F_MMU0_INT_ID_LARB_ID(a)		(((a) >> 7) & 0x7)
+#define F_MMU0_INT_ID_PORT_ID(a)		(((a) >> 2) & 0x1f)
+
+#define MTK_PROTECT_PA_ALIGN			128
+#define MTK_IOMMU_LARB_MAX_NR			8
+#define MTK_IOMMU_REG_NR			10
+
+struct mtk_iommu_suspend_reg {
+	u32				standard_axi_mode;
+	u32				dcm_dis;
+	u32				ctrl_reg;
+	u32				ivrp_paddr;
+	u32				int_control0;
+	u32				int_main_control;
+};
+
+struct mtk_iommu_data {
+	void __iomem			*base;
+	int				irq;
+	struct device			*dev;
+	struct device			*larbdev[MTK_IOMMU_LARB_MAX_NR];
+	struct clk			*bclk;
+	phys_addr_t			protect_base; /* protect memory base */
+	int				larb_nr;/* local arbiter number */
+	struct mtk_iommu_suspend_reg	reg;
+};
+
+struct mtk_iommu_domain {
+	struct imu_pgd_t		*pgd;
+	spinlock_t			pgtlock; /* lock for page table */
+
+	struct io_pgtable_cfg		cfg;
+	struct io_pgtable_ops		*iop;
+
+	struct mtk_iommu_data		*data;
+	struct iommu_domain		domain;
+};
+
+struct mtk_iommu_client_priv {
+	struct list_head		client;
+	unsigned int			larbid;
+	unsigned int			portid;
+};
+
+static struct iommu_ops mtk_iommu_ops;
+
+/*
+ * There is only one iommu domain called the m4u domain that
+ * all Multimedia modules share.
+ */
+static struct mtk_iommu_domain	*m4udom;
+
+static struct mtk_iommu_domain *to_mtk_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct mtk_iommu_domain, domain);
+}
+
+static void mtk_iommu_clear_intr(const struct mtk_iommu_data *data)
+{
+	u32 val;
+
+	val = readl_relaxed(data->base + REG_MMU_INT_CONTROL0);
+	val |= F_INT_L2_CLR_BIT;
+	writel_relaxed(val, data->base + REG_MMU_INT_CONTROL0);
+}
+
+static void mtk_iommu_tlb_flush_all(void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base;
+
+	base = domain->data->base;
+	writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL);
+	writel_relaxed(F_ALL_INVLD, base + REG_MMU_INVALIDATE);
+	mb();/* Make sure flush all done */
+}
+
+static void mtk_iommu_tlb_add_flush(unsigned long iova, size_t size,
+				    bool leaf, void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base = domain->data->base;
+	unsigned int iova_start = iova, iova_end = iova + size - 1;
+
+	writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0, base + REG_MMU_INV_SEL);
+
+	writel_relaxed(iova_start, base + REG_MMU_INVLD_START_A);
+	writel_relaxed(iova_end, base + REG_MMU_INVLD_END_A);
+	writel_relaxed(F_MMU_INV_RANGE, base + REG_MMU_INVALIDATE);
+}
+
+static void mtk_iommu_tlb_sync(void *cookie)
+{
+	struct mtk_iommu_domain *domain = cookie;
+	void __iomem *base = domain->data->base;
+	int ret;
+	u32 tmp;
+
+	ret = readl_poll_timeout_atomic(base + REG_MMU_CPE_DONE, tmp,
+					tmp != 0, 10, 1000000);
+	if (ret) {
+		dev_warn(domain->data->dev,
+			 "Partial TLB flush timed out, falling back to full flush\n");
+		mtk_iommu_tlb_flush_all(cookie);
+	}
+	writel_relaxed(0, base + REG_MMU_CPE_DONE);
+}
+
+static struct iommu_gather_ops mtk_iommu_gather_ops = {
+	.tlb_flush_all = mtk_iommu_tlb_flush_all,
+	.tlb_add_flush = mtk_iommu_tlb_add_flush,
+	.tlb_sync = mtk_iommu_tlb_sync,
+};
+
+static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
+{
+	struct mtk_iommu_domain *mtkdom = dev_id;
+	struct mtk_iommu_data *data = mtkdom->data;
+	u32 int_state, regval, fault_iova, fault_pa;
+	unsigned int fault_larb, fault_port;
+	bool layer, write;
+
+	int_state = readl_relaxed(data->base + REG_MMU_FAULT_ST1);
+
+	/* read error info from registers */
+	fault_iova = readl_relaxed(data->base + REG_MMU_FAULT_VA);
+	layer = fault_iova & F_MMU_FAULT_VA_LAYER_BIT;
+	write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
+	fault_iova &= F_MMU_FAULT_VA_MSK;
+	fault_pa = readl_relaxed(data->base + REG_MMU_INVLD_PA);
+	regval = readl_relaxed(data->base + REG_MMU_INT_ID);
+	fault_larb = F_MMU0_INT_ID_LARB_ID(regval);
+	fault_port = F_MMU0_INT_ID_PORT_ID(regval);
+
+	if (report_iommu_fault(&mtkdom->domain, data->dev, fault_iova,
+			       write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
+		dev_err_ratelimited(
+			data->dev,
+			"fault type=0x%x iova=0x%x pa=0x%x larb=%d port=%d layer=%d %s\n",
+			int_state, fault_iova, fault_pa, fault_larb, fault_port,
+			layer, write ? "write" : "read");
+	}
+
+	mtk_iommu_clear_intr(data);
+	mtk_iommu_tlb_flush_all(mtkdom);
+
+	return IRQ_HANDLED;
+}
+
+static int mtk_iommu_parse_dt(struct platform_device *pdev,
+			      struct mtk_iommu_data *data)
+{
+	struct device *dev = &pdev->dev;
+	struct device_node *ofnode;
+	struct resource *res;
+	int i;
+
+	ofnode = dev->of_node;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	data->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(data->base))
+		return PTR_ERR(data->base);
+
+	data->irq = platform_get_irq(pdev, 0);
+	if (data->irq < 0)
+		return data->irq;
+
+	data->bclk = devm_clk_get(dev, "bclk");
+	if (IS_ERR(data->bclk))
+		return PTR_ERR(data->bclk);
+
+	data->larb_nr = of_count_phandle_with_args(
+					ofnode, "mediatek,larb", NULL);
+	if (data->larb_nr < 0)
+		return data->larb_nr;
+
+	for (i = 0; i < data->larb_nr; i++) {
+		struct device_node *larbnode;
+		struct platform_device *plarbdev;
+
+		larbnode = of_parse_phandle(ofnode, "mediatek,larb", i);
+		if (!larbnode)
+			return -EINVAL;
+
+		plarbdev = of_find_device_by_node(larbnode);
+		of_node_put(larbnode);
+		if (!plarbdev)
+			return -EPROBE_DEFER;
+		data->larbdev[i] = &plarbdev->dev;
+	}
+
+	return 0;
+}
+
+static int mtk_iommu_hw_init(const struct mtk_iommu_domain *mtkdom)
+{
+	struct mtk_iommu_data *data = mtkdom->data;
+	void __iomem *base = data->base;
+	u32 regval;
+	int ret = 0;
+
+	ret = clk_prepare_enable(data->bclk);
+	if (ret) {
+		dev_err(data->dev, "Failed to enable iommu clk(%d)\n", ret);
+		return ret;
+	}
+
+	writel_relaxed(mtkdom->cfg.arm_short_cfg.ttbr[0],
+		       base + REG_MMU_PT_BASE_ADDR);
+
+	regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
+		F_MMU_TF_PROTECT_SEL(2) |
+		F_COHERENCE_EN;
+	writel_relaxed(regval, base + REG_MMU_CTRL_REG);
+
+	regval = F_L2_MULIT_HIT_EN |
+		F_TABLE_WALK_FAULT_INT_EN |
+		F_PREETCH_FIFO_OVERFLOW_INT_EN |
+		F_MISS_FIFO_OVERFLOW_INT_EN |
+		F_PREFETCH_FIFO_ERR_INT_EN |
+		F_MISS_FIFO_ERR_INT_EN;
+	writel_relaxed(regval, base + REG_MMU_INT_CONTROL0);
+
+	regval = F_INT_TRANSLATION_FAULT |
+		F_INT_MAIN_MULTI_HIT_FAULT |
+		F_INT_INVALID_PA_FAULT |
+		F_INT_ENTRY_REPLACEMENT_FAULT |
+		F_INT_TLB_MISS_FAULT |
+		F_INT_MISS_TRANSATION_FIFO_FAULT |
+		F_INT_PRETETCH_TRANSATION_FIFO_FAULT;
+	writel_relaxed(regval, base + REG_MMU_INT_MAIN_CONTROL);
+
+	regval = ALIGN(data->protect_base, MTK_PROTECT_PA_ALIGN);
+	regval = F_MMU_IVRP_PA_SET(regval);
+	writel_relaxed(regval, base + REG_MMU_IVRP_PADDR);
+
+	writel_relaxed(0, base + REG_MMU_DCM_DIS);
+	writel_relaxed(0, base + REG_MMU_STANDARD_AXI_MODE);
+
+	if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0,
+			     dev_name(data->dev), (void *)mtkdom)) {
+		writel_relaxed(0, base + REG_MMU_PT_BASE_ADDR);
+		clk_disable_unprepare(data->bclk);
+		dev_err(data->dev, "Failed @ IRQ-%d Request\n", data->irq);
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int mtk_iommu_config(struct mtk_iommu_domain *mtkdom,
+			    struct device *dev, bool enable)
+{
+	struct mtk_iommu_data *data = mtkdom->data;
+	struct mtk_iommu_client_priv *head, *cur, *next;
+
+	head = dev->archdata.iommu;
+	list_for_each_entry_safe(cur, next, &head->client, client) {
+		if (cur->larbid >= data->larb_nr) {
+			dev_err(data->dev, "Invalid larb:%d\n", cur->larbid);
+			return -EINVAL;
+		}
+
+		mtk_smi_config_port(data->larbdev[cur->larbid],
+				    cur->portid, enable);
+		if (!enable) {
+			list_del(&cur->client);
+			kfree(cur);
+		}
+	}
+
+	if (!enable) {
+		kfree(head);
+		dev->archdata.iommu = NULL;
+	}
+	return 0;
+}
+
+static int mtk_iommu_init_domain_context(struct mtk_iommu_domain *dom)
+{
+	int ret;
+
+	if (dom->iop)
+		return 0;
+
+	spin_lock_init(&dom->pgtlock);
+	dom->cfg.quirks = IO_PGTABLE_QUIRK_ARM_NS |
+			IO_PGTABLE_QUIRK_SHORT_SUPERSECTION |
+			IO_PGTABLE_QUIRK_SHORT_NO_XN |
+			IO_PGTABLE_QUIRK_SHORT_NO_PERMS;
+	dom->cfg.pgsize_bitmap = mtk_iommu_ops.pgsize_bitmap,
+	dom->cfg.ias = 32;
+	dom->cfg.oas = 32;
+	dom->cfg.tlb = &mtk_iommu_gather_ops;
+	dom->cfg.iommu_dev = dom->data->dev;
+
+	dom->iop = alloc_io_pgtable_ops(ARM_SHORT_DESC, &dom->cfg, dom);
+	if (!dom->iop) {
+		dev_err(dom->data->dev, "Failed to alloc io pgtable\n");
+		return -EINVAL;
+	}
+
+	/* Update our support page sizes bitmap */
+	mtk_iommu_ops.pgsize_bitmap = dom->cfg.pgsize_bitmap;
+
+	ret = mtk_iommu_hw_init(dom);
+	if (ret)
+		free_io_pgtable_ops(dom->iop);
+
+	return ret;
+}
+
+static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
+{
+	struct mtk_iommu_domain *priv;
+
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
+		return NULL;
+
+	if (m4udom)/* The m4u domain exist. */
+		return &m4udom->domain;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return NULL;
+
+	priv->domain.geometry.aperture_start = 0;
+	priv->domain.geometry.aperture_end = DMA_BIT_MASK(32);
+	priv->domain.geometry.force_aperture = true;
+
+	m4udom = priv;
+
+	return &priv->domain;
+}
+
+static void mtk_iommu_domain_free(struct iommu_domain *domain)
+{
+	kfree(to_mtk_domain(domain));
+}
+
+static int mtk_iommu_attach_device(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	struct iommu_group *group;
+	int ret;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return 0;
+	iommu_group_put(group);
+
+	ret = mtk_iommu_init_domain_context(priv);
+	if (ret)
+		return ret;
+
+	return mtk_iommu_config(priv, dev, true);
+}
+
+static void mtk_iommu_detach_device(struct iommu_domain *domain,
+				    struct device *dev)
+{
+	mtk_iommu_config(to_mtk_domain(domain), dev, false);
+}
+
+static int mtk_iommu_map(struct iommu_domain *domain, unsigned long iova,
+			 phys_addr_t paddr, size_t size, int prot)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	ret = priv->iop->map(priv->iop, iova, paddr, size, prot);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return ret;
+}
+
+static size_t mtk_iommu_unmap(struct iommu_domain *domain,
+			      unsigned long iova, size_t size)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	size_t unmapsize;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	unmapsize = priv->iop->unmap(priv->iop, iova, size);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return unmapsize;
+}
+
+static phys_addr_t mtk_iommu_iova_to_phys(struct iommu_domain *domain,
+					  dma_addr_t iova)
+{
+	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
+	unsigned long flags;
+	phys_addr_t pa;
+
+	spin_lock_irqsave(&priv->pgtlock, flags);
+	pa = priv->iop->iova_to_phys(priv->iop, iova);
+	spin_unlock_irqrestore(&priv->pgtlock, flags);
+
+	return pa;
+}
+
+static int mtk_iommu_add_device(struct device *dev)
+{
+	struct iommu_group *group;
+	int ret;
+
+	if (!dev->archdata.iommu) /* Not a iommu client device */
+		return -ENODEV;
+
+	group = iommu_group_get(dev);
+	if (!group) {
+		group = iommu_group_alloc();
+		if (IS_ERR(group)) {
+			dev_err(dev, "Failed to allocate IOMMU group\n");
+			return PTR_ERR(group);
+		}
+	}
+
+	ret = iommu_group_add_device(group, dev);
+	if (ret) {
+		dev_err(dev, "Failed to add IOMMU group\n");
+		goto err_group_put;
+	}
+
+	ret = iommu_attach_group(&m4udom->domain, group);
+	if (ret)
+		dev_err(dev, "Failed to attach IOMMU group\n");
+
+err_group_put:
+	iommu_group_put(group);
+	return ret;
+}
+
+static void mtk_iommu_remove_device(struct device *dev)
+{
+	if (!dev->archdata.iommu)
+		return;
+
+	iommu_group_remove_device(dev);
+}
+
+static int mtk_iommu_of_xlate(struct device *dev, struct of_phandle_args *args)
+{
+	struct mtk_iommu_client_priv *head, *priv, *next;
+
+	if (args->args_count != 2) {
+		dev_err(dev, "invalid #iommu-cells(%d) property for IOMMU\n",
+			args->args_count);
+		return -EINVAL;
+	}
+
+	if (!dev->archdata.iommu) {
+		head = kzalloc(sizeof(*head), GFP_KERNEL);
+		if (!head)
+			return -ENOMEM;
+
+		dev->archdata.iommu = head;
+		INIT_LIST_HEAD(&head->client);
+	} else {
+		head = dev->archdata.iommu;
+	}
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		goto err_free_mem;
+
+	priv->larbid = args->args[0];
+	priv->portid = args->args[1];
+	list_add_tail(&priv->client, &head->client);
+
+	return 0;
+
+err_free_mem:
+	list_for_each_entry_safe(priv, next, &head->client, client)
+		kfree(priv);
+	kfree(head);
+	dev->archdata.iommu = NULL;
+	return -ENOMEM;
+}
+
+static struct iommu_ops mtk_iommu_ops = {
+	.domain_alloc	= mtk_iommu_domain_alloc,
+	.domain_free	= mtk_iommu_domain_free,
+	.attach_dev	= mtk_iommu_attach_device,
+	.detach_dev	= mtk_iommu_detach_device,
+	.map		= mtk_iommu_map,
+	.unmap		= mtk_iommu_unmap,
+	.map_sg		= default_iommu_map_sg,
+	.iova_to_phys	= mtk_iommu_iova_to_phys,
+	.add_device	= mtk_iommu_add_device,
+	.remove_device	= mtk_iommu_remove_device,
+	.of_xlate	= mtk_iommu_of_xlate,
+	.pgsize_bitmap	= SZ_4K | SZ_64K | SZ_1M | SZ_16M,
+};
+
+static const struct of_device_id mtk_iommu_of_ids[] = {
+	{ .compatible = "mediatek,mt8173-m4u", },
+	{}
+};
+
+static int mtk_iommu_init_fn(struct device_node *np)
+{
+	struct platform_device *pdev;
+
+	pdev = of_platform_device_create(np, NULL, platform_bus_type.dev_root);
+	if (IS_ERR(pdev))
+		return PTR_ERR(pdev);
+
+	of_iommu_set_ops(np, &mtk_iommu_ops);
+
+	return 0;
+}
+
+IOMMU_OF_DECLARE(mtkm4u, "mediatek,mt8173-m4u", mtk_iommu_init_fn);
+
+static int mtk_iommu_probe(struct platform_device *pdev)
+{
+	struct mtk_iommu_data   *data;
+	struct device           *dev = &pdev->dev;
+	void __iomem	        *protect;
+	int                     ret;
+
+	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	/* Protect memory. HW will access here on a translation fault. */
+	protect = devm_kzalloc(dev, MTK_PROTECT_PA_ALIGN * 2, GFP_KERNEL);
+	if (!protect)
+		return -ENOMEM;
+	data->protect_base = virt_to_phys(protect);
+
+	ret = mtk_iommu_parse_dt(pdev, data);
+	if (ret)
+		return ret;
+
+	if (!m4udom) /* There is no iommu client */
+		return 0;
+
+	data->dev = dev;
+	m4udom->data = data;
+	dev_set_drvdata(dev, m4udom);
+
+	return 0;
+}
+
+static int mtk_iommu_remove(struct platform_device *pdev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(&pdev->dev);
+
+	if (!mtkdom)
+		return 0;
+
+	free_io_pgtable_ops(mtkdom->iop); /* Destroy domain context */
+	clk_disable_unprepare(mtkdom->data->bclk);
+	return 0;
+}
+
+static int mtk_iommu_suspend(struct device *dev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(dev);
+	struct mtk_iommu_suspend_reg *reg = &mtkdom->data->reg;
+	void __iomem *base = mtkdom->data->base;
+
+	reg->standard_axi_mode = readl_relaxed(base +
+					       REG_MMU_STANDARD_AXI_MODE);
+	reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS);
+	reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG);
+	reg->ivrp_paddr = readl_relaxed(base + REG_MMU_IVRP_PADDR);
+	reg->int_control0 = readl_relaxed(base + REG_MMU_INT_CONTROL0);
+	reg->int_main_control = readl_relaxed(base + REG_MMU_INT_MAIN_CONTROL);
+	return 0;
+}
+
+static int mtk_iommu_resume(struct device *dev)
+{
+	struct mtk_iommu_domain *mtkdom = dev_get_drvdata(dev);
+	struct mtk_iommu_suspend_reg *reg = &mtkdom->data->reg;
+	void __iomem *base = mtkdom->data->base;
+
+	writel_relaxed(mtkdom->cfg.arm_short_cfg.ttbr[0],
+		       base + REG_MMU_PT_BASE_ADDR);
+	writel_relaxed(reg->standard_axi_mode,
+		       base + REG_MMU_STANDARD_AXI_MODE);
+	writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS);
+	writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG);
+	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
+	writel_relaxed(reg->int_control0, base + REG_MMU_INT_CONTROL0);
+	writel_relaxed(reg->int_main_control, base + REG_MMU_INT_MAIN_CONTROL);
+	return 0;
+}
+
+const struct dev_pm_ops mtk_iommu_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(mtk_iommu_suspend, mtk_iommu_resume)
+};
+
+static struct platform_driver mtk_iommu_driver = {
+	.probe	= mtk_iommu_probe,
+	.remove	= mtk_iommu_remove,
+	.driver	= {
+		.name = "mtk-iommu",
+		.of_match_table = mtk_iommu_of_ids,
+		.pm = &mtk_iommu_pm_ops,
+	}
+};
+
+static int __init mtk_iommu_init(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&mtk_iommu_driver);
+	if (ret) {
+		pr_err("%s: Failed to register driver\n", __func__);
+		return ret;
+	}
+
+	if (!iommu_present(&platform_bus_type))
+		bus_set_iommu(&platform_bus_type, &mtk_iommu_ops);
+
+	return 0;
+}
+
+subsys_initcall(mtk_iommu_init);
-- 
1.8.1.1.dirty

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [PATCH v4 6/6] dts: mt8173: Add iommu/smi nodes for mt8173
@ 2015-08-03 10:21   ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-03 10:21 UTC (permalink / raw)
  To: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: Robin Murphy, Will Deacon, Daniel Kurtz, Tomasz Figa,
	Lucas Stach, Rob Herring, Catalin Marinas, linux-mediatek,
	Sasha Hauer, srv_heupstream, devicetree, linux-kernel,
	linux-arm-kernel, iommu, pebolle, arnd, mitchelh, youhua.li,
	k.zhang, frederic.chen, Yong Wu

This patch adds the iommu/larb nodes for mt8173.

Signed-off-by: Yong Wu <yong.wu@mediatek.com>
---
 arch/arm64/boot/dts/mediatek/mt8173.dtsi | 81 ++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8173.dtsi b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
index 6c3f047..a92956d 100644
--- a/arch/arm64/boot/dts/mediatek/mt8173.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8173.dtsi
@@ -14,6 +14,7 @@
 #include <dt-bindings/clock/mt8173-clk.h>
 #include <dt-bindings/interrupt-controller/irq.h>
 #include <dt-bindings/interrupt-controller/arm-gic.h>
+#include <dt-bindings/memory/mt8173-larb-port.h>
 #include <dt-bindings/power/mt8173-power.h>
 #include <dt-bindings/reset-controller/mt8173-resets.h>
 #include "mt8173-pinfunc.h"
@@ -265,6 +266,17 @@
 			reg = <0 0x10200620 0 0x20>;
 		};
 
+		iommu: iommu@10205000 {
+			compatible = "mediatek,mt8173-m4u";
+			reg = <0 0x10205000 0 0x1000>;
+			interrupts = <GIC_SPI 139 IRQ_TYPE_LEVEL_LOW>;
+			clocks = <&infracfg CLK_INFRA_M4U>;
+			clock-names = "bclk";
+			mediatek,larb = <&larb0 &larb1 &larb2
+					 &larb3 &larb4 &larb5>;
+			#iommu-cells = <2>;
+		};
+
 		apmixedsys: clock-controller@10209000 {
 			compatible = "mediatek,mt8173-apmixedsys";
 			reg = <0 0x10209000 0 0x1000>;
@@ -501,29 +513,98 @@
 			#clock-cells = <1>;
 		};
 
+		larb0: larb@14021000 {
+			compatible = "mediatek,mt8173-smi-larb";
+			reg = <0 0x14021000 0 0x1000>;
+			mediatek,smi = <&smi_common>;
+			power-domains = <&scpsys MT8173_POWER_DOMAIN_MM>;
+			clocks = <&mmsys CLK_MM_SMI_LARB0>,
+				 <&mmsys CLK_MM_SMI_LARB0>;
+			clock-names = "apb", "smi";
+		};
+
+		smi_common: smi@14022000 {
+			compatible = "mediatek,mt8173-smi";
+			reg = <0 0x14022000 0 0x1000>;
+			power-domains = <&scpsys MT8173_POWER_DOMAIN_MM>;
+			clocks = <&mmsys CLK_MM_SMI_COMMON>,
+				 <&mmsys CLK_MM_SMI_COMMON>;
+			clock-names = "apb", "smi";
+		};
+
+		larb4: larb@14027000 {
+			compatible = "mediatek,mt8173-smi-larb";
+			reg = <0 0x14027000 0 0x1000>;
+			mediatek,smi = <&smi_common>;
+			power-domains = <&scpsys MT8173_POWER_DOMAIN_MM>;
+			clocks = <&mmsys CLK_MM_SMI_LARB4>,
+				 <&mmsys CLK_MM_SMI_LARB4>;
+			clock-names = "apb", "smi";
+		};
+
 		imgsys: clock-controller@15000000 {
 			compatible = "mediatek,mt8173-imgsys", "syscon";
 			reg = <0 0x15000000 0 0x1000>;
 			#clock-cells = <1>;
 		};
 
+		larb2: larb@15001000 {
+			compatible = "mediatek,mt8173-smi-larb";
+			reg = <0 0x15001000 0 0x1000>;
+			mediatek,smi = <&smi_common>;
+			power-domains = <&scpsys MT8173_POWER_DOMAIN_ISP>;
+			clocks = <&imgsys CLK_IMG_LARB2_SMI>,
+				 <&imgsys CLK_IMG_LARB2_SMI>;
+			clock-names = "apb", "smi";
+		};
+
 		vdecsys: clock-controller@16000000 {
 			compatible = "mediatek,mt8173-vdecsys", "syscon";
 			reg = <0 0x16000000 0 0x1000>;
 			#clock-cells = <1>;
 		};
 
+		larb1: larb@16010000 {
+			compatible = "mediatek,mt8173-smi-larb";
+			reg = <0 0x16010000 0 0x1000>;
+			mediatek,smi = <&smi_common>;
+			power-domains = <&scpsys MT8173_POWER_DOMAIN_VDEC>;
+			clocks = <&vdecsys CLK_VDEC_CKEN>,
+				 <&vdecsys CLK_VDEC_LARB_CKEN>;
+			clock-names = "apb", "smi";
+		};
+
 		vencsys: clock-controller@18000000 {
 			compatible = "mediatek,mt8173-vencsys", "syscon";
 			reg = <0 0x18000000 0 0x1000>;
 			#clock-cells = <1>;
 		};
 
+		larb3: larb@18001000 {
+			compatible = "mediatek,mt8173-smi-larb";
+			reg = <0 0x18001000 0 0x1000>;
+			mediatek,smi = <&smi_common>;
+			power-domains = <&scpsys MT8173_POWER_DOMAIN_VENC>;
+			clocks = <&vencsys CLK_VENC_CKE1>,
+				 <&vencsys CLK_VENC_CKE0>;
+			clock-names = "apb", "smi";
+		};
+
 		vencltsys: clock-controller@19000000 {
 			compatible = "mediatek,mt8173-vencltsys", "syscon";
 			reg = <0 0x19000000 0 0x1000>;
 			#clock-cells = <1>;
 		};
+
+		larb5: larb@19001000 {
+			compatible = "mediatek,mt8173-smi-larb";
+			reg = <0 0x19001000 0 0x1000>;
+			mediatek,smi = <&smi_common>;
+			power-domains = <&scpsys MT8173_POWER_DOMAIN_VENC_LT>;
+			clocks = <&vencltsys CLK_VENCLT_CKE1>,
+				 <&vencltsys CLK_VENCLT_CKE0>;
+			clock-names = "apb", "smi";
+		};
 	};
 };
 
-- 
1.8.1.1.dirty


^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 4/6] memory: mediatek: Add SMI driver
@ 2015-08-11 14:56     ` Joerg Roedel
  0 siblings, 0 replies; 60+ messages in thread
From: Joerg Roedel @ 2015-08-11 14:56 UTC (permalink / raw)
  To: Yong Wu
  Cc: Thierry Reding, Mark Rutland, Matthias Brugger, Robin Murphy,
	Will Deacon, Daniel Kurtz, Tomasz Figa, Lucas Stach, Rob Herring,
	Catalin Marinas, linux-mediatek, Sasha Hauer, srv_heupstream,
	devicetree, linux-kernel, linux-arm-kernel, iommu, pebolle, arnd,
	mitchelh, youhua.li, k.zhang, frederic.chen

On Mon, Aug 03, 2015 at 06:21:17PM +0800, Yong Wu wrote:
> +static int mtk_smi_common_get(struct device *smidev)
> +{
> +	struct mtk_smi_common *smipriv = dev_get_drvdata(smidev);
> +	int ret;
> +
> +	ret = pm_runtime_get_sync(smidev);
> +	if (ret < 0)
> +		return ret;
> +
> +	ret = clk_prepare_enable(smipriv->clk_apb);
> +	if (ret) {
> +		dev_err(smidev, "Failed to enable the apb clock\n");
> +		goto err_put_pm;
> +	}
> +	ret = clk_prepare_enable(smipriv->clk_smi);
> +	if (ret) {
> +		dev_err(smidev, "Failed to enable the smi clock\n");
> +		goto err_disable_apb;
> +	}
> +	return ret;
> +
> +err_disable_apb:
> +	clk_disable_unprepare(smipriv->clk_apb);
> +err_put_pm:
> +	pm_runtime_put(smidev);
> +	return ret;
> +}

[...]

> +int mtk_smi_larb_get(struct device *larbdev)
> +{
> +	struct mtk_smi_larb *larbpriv = dev_get_drvdata(larbdev);
> +	struct mtk_larb_mmu *mmucfg = larbdev->archdata.iommu;
> +	int ret;
> +
> +	ret = mtk_smi_common_get(larbpriv->smi);
> +	if (ret)
> +		return ret;
> +
> +	ret = pm_runtime_get_sync(larbdev);
> +	if (ret < 0)
> +		goto err_put_smicommon;
> +
> +	ret = clk_prepare_enable(larbpriv->clk_apb);
> +	if (ret) {
> +		dev_err(larbdev, "Failed to enable the apb clock\n");
> +		goto err_put_pm;
> +	}
> +
> +	ret = clk_prepare_enable(larbpriv->clk_smi);
> +	if (ret) {
> +		dev_err(larbdev, "Failed to enable the smi clock\n");
> +		goto err_disable_apb;
> +	}

The clock enablement looks similar to the function above, maybe move it
to a helper function?
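
For illustration, such a helper could look like this (a minimal sketch
with a hypothetical name, not code from the posted patches):

static int mtk_smi_enable_clocks(struct device *dev, struct clk *clk_apb,
				 struct clk *clk_smi)
{
	int ret;

	/* Same order as the posted code: apb clock first, then smi. */
	ret = clk_prepare_enable(clk_apb);
	if (ret) {
		dev_err(dev, "Failed to enable the apb clock\n");
		return ret;
	}

	ret = clk_prepare_enable(clk_smi);
	if (ret) {
		dev_err(dev, "Failed to enable the smi clock\n");
		clk_disable_unprepare(clk_apb);
		return ret;
	}

	return 0;
}

Both mtk_smi_common_get() and mtk_smi_larb_get() could then call it with
their own clock pointers and keep only the pm_runtime handling inline.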


	Joerg


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 5/6] iommu/mediatek: Add mt8173 IOMMU driver
@ 2015-08-11 15:39     ` Joerg Roedel
  0 siblings, 0 replies; 60+ messages in thread
From: Joerg Roedel @ 2015-08-11 15:39 UTC (permalink / raw)
  To: Yong Wu
  Cc: Thierry Reding, Mark Rutland, Matthias Brugger, Robin Murphy,
	Will Deacon, Daniel Kurtz, Tomasz Figa, Lucas Stach, Rob Herring,
	Catalin Marinas, linux-mediatek, Sasha Hauer, srv_heupstream,
	devicetree, linux-kernel, linux-arm-kernel, iommu, pebolle, arnd,
	mitchelh, youhua.li, k.zhang, frederic.chen

On Mon, Aug 03, 2015 at 06:21:18PM +0800, Yong Wu wrote:
> +/*
> + * There is only one iommu domain called the m4u domain that
> + * all Multimedia modules share.
> + */
> +static struct mtk_iommu_domain	*m4udom;

What is the reason you only implement one domain? Can't the IOMMU
isolate different devices from each other?


> +static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
> +{
> +	struct mtk_iommu_domain *priv;
> +
> +	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
> +		return NULL;
> +
> > +	if (m4udom) /* The m4u domain exists. */
> +		return &m4udom->domain;

This is not going to work. If you always return the same domain the
iommu core might re-initialize domain state (and overwrite changed
state). At the moment this is only the domain-type which will change
every time this function is called, but there might be more.
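
For reference, the core's __iommu_domain_alloc() looks roughly like this
(paraphrased, so check the exact tree you are based on):

static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
						 unsigned type)
{
	struct iommu_domain *domain;

	if (bus == NULL || bus->iommu_ops == NULL)
		return NULL;

	domain = bus->iommu_ops->domain_alloc(type);
	if (!domain)
		return NULL;

	domain->ops  = bus->iommu_ops;
	domain->type = type;	/* rewritten on every allocation */

	return domain;
}

so a recycled domain gets its type, and potentially more state in the
future, overwritten on every call.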

> +static int mtk_iommu_add_device(struct device *dev)
> +{
> +	struct iommu_group *group;
> +	int ret;
> +
> > +	if (!dev->archdata.iommu) /* Not an iommu client device */
> +		return -ENODEV;
> +
> +	group = iommu_group_get(dev);
> +	if (!group) {
> +		group = iommu_group_alloc();
> +		if (IS_ERR(group)) {
> +			dev_err(dev, "Failed to allocate IOMMU group\n");
> +			return PTR_ERR(group);
> +		}
> +	}
> +
> +	ret = iommu_group_add_device(group, dev);
> +	if (ret) {
> +		dev_err(dev, "Failed to add IOMMU group\n");
> +		goto err_group_put;
> +	}
> +
> +	ret = iommu_attach_group(&m4udom->domain, group);
> +	if (ret)
> +		dev_err(dev, "Failed to attach IOMMU group\n");
> +
> +err_group_put:
> +	iommu_group_put(group);
> +	return ret;
> +}

Putting every device into its own group indicates that the IOMMU can
isolate between single devices on the bus, which makes it even more
questionable that you only allow one domain for the whole driver.


	Joerg


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 5/6] iommu/mediatek: Add mt8173 IOMMU driver
@ 2015-08-12 12:28       ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-12 12:28 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thierry Reding, Mark Rutland, Matthias Brugger, Robin Murphy,
	Will Deacon, Daniel Kurtz, Tomasz Figa, Lucas Stach, Rob Herring,
	Catalin Marinas, linux-mediatek, Sasha Hauer, srv_heupstream,
	devicetree, linux-kernel, linux-arm-kernel, iommu, pebolle, arnd,
	mitchelh, youhua.li, k.zhang, frederic.chen

On Tue, 2015-08-11 at 17:39 +0200, Joerg Roedel wrote:
> On Mon, Aug 03, 2015 at 06:21:18PM +0800, Yong Wu wrote:
> > +/*
> > + * There is only one iommu domain called the m4u domain that
> > + * all Multimedia modules share.
> > + */
> > +static struct mtk_iommu_domain	*m4udom;
> 
> What is the reason you only implement one domain? Can't the IOMMU
> isolate different devices from each other?
Hi Joerg,

  Thanks for your review.
  From your comment, it seems your concern is that we use only one
domain.

  Our HW is called m4u (MultiMedia Memory Management Unit). It helps all
the multimedia hardware (display, video decode, video encode, camera,
mdp, etc.) access the DRAM, and the m4u has only one pagetable.
(Actually there are two pagetables in the m4u: a normal pagetable and a
security pagetable. We don't implement the security one currently, so
only one pagetable is in use here.)
That is to say, all the multimedia devices are in the m4u domain and
share the m4u's pagetable.

So we have only one domain here.

> > +static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
> > +{
> > +	struct mtk_iommu_domain *priv;
> > +
> > +	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
> > +		return NULL;
> > +
> > +	if (m4udom) /* The m4u domain exists. */
> > +		return &m4udom->domain;
> 
> This is not going to work. If you always return the same domain the
> iommu core might re-initialize domain state (and overwrite changed
> state). At the moment this is only the domain-type which will change
> every time this function is called, but there might be more.

In our v3[1], I didn't return the same domain; instead I had to add a
trick in attach_device like below:
//============
static int mtk_iommu_attach_device(struct iommu_domain *domain,
                                  struct device *dev)
{
        struct device *imudev = xxxx;
        /*
         * Reserve one iommu domain as the m4u domain which
         * all Multimedia modules share and free the others.
         */
        if (!imudev->archdata.iommu)
               imudev->archdata.iommu = priv;
        else if (imudev->archdata.iommu != priv)
               iommu_domain_free(domain);
        xxxx
}
//==============
In this version, I use a global variable to record the m4u domain, so I
can delete that code and keep it simple.

So what should I do in our case?
If we can't return the same domain here, we have to add some code like
the above in attach_device.

[1]:http://lists.linuxfoundation.org/pipermail/iommu/2015-July/013631.html
 
> 
> > +static int mtk_iommu_add_device(struct device *dev)
> > +{
> > +	struct iommu_group *group;
> > +	int ret;
> > +
> > +	if (!dev->archdata.iommu) /* Not an iommu client device */
> > +		return -ENODEV;
> > +
> > +	group = iommu_group_get(dev);
> > +	if (!group) {
> > +		group = iommu_group_alloc();
> > +		if (IS_ERR(group)) {
> > +			dev_err(dev, "Failed to allocate IOMMU group\n");
> > +			return PTR_ERR(group);
> > +		}
> > +	}
> > +
> > +	ret = iommu_group_add_device(group, dev);
> > +	if (ret) {
> > +		dev_err(dev, "Failed to add IOMMU group\n");
> > +		goto err_group_put;
> > +	}
> > +
> > +	ret = iommu_attach_group(&m4udom->domain, group);
> > +	if (ret)
> > +		dev_err(dev, "Failed to attach IOMMU group\n");
> > +
> > +err_group_put:
> > +	iommu_group_put(group);
> > +	return ret;
> > +}
> 
> Putting every device into its own group indicates that the IOMMU can
> isolate between single devices on the bus, which makes it even more
> questionable that you only allow one domain for the whole driver.
> 
> 
> 	Joerg
> 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 4/6] memory: mediatek: Add SMI driver
@ 2015-08-12 12:39       ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-08-12 12:39 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thierry Reding, Mark Rutland, Matthias Brugger, Robin Murphy,
	Will Deacon, Daniel Kurtz, Tomasz Figa, Lucas Stach, Rob Herring,
	Catalin Marinas, linux-mediatek, Sasha Hauer, srv_heupstream,
	devicetree, linux-kernel, linux-arm-kernel, iommu, pebolle, arnd,
	mitchelh, youhua.li, k.zhang, frederic.chen

On Tue, 2015-08-11 at 16:56 +0200, Joerg Roedel wrote:
> On Mon, Aug 03, 2015 at 06:21:17PM +0800, Yong Wu wrote:
> > +static int mtk_smi_common_get(struct device *smidev)
> > +{
> > +	struct mtk_smi_common *smipriv = dev_get_drvdata(smidev);
> > +	int ret;
> > +
> > +	ret = pm_runtime_get_sync(smidev);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	ret = clk_prepare_enable(smipriv->clk_apb);
> > +	if (ret) {
> > +		dev_err(smidev, "Failed to enable the apb clock\n");
> > +		goto err_put_pm;
> > +	}
> > +	ret = clk_prepare_enable(smipriv->clk_smi);
> > +	if (ret) {
> > +		dev_err(smidev, "Failed to enable the smi clock\n");
> > +		goto err_disable_apb;
> > +	}
> > +	return ret;
> > +
> > +err_disable_apb:
> > +	clk_disable_unprepare(smipriv->clk_apb);
> > +err_put_pm:
> > +	pm_runtime_put(smidev);
> > +	return ret;
> > +}
> 
> [...]
> 
> > +int mtk_smi_larb_get(struct device *larbdev)
> > +{
> > +	struct mtk_smi_larb *larbpriv = dev_get_drvdata(larbdev);
> > +	struct mtk_larb_mmu *mmucfg = larbdev->archdata.iommu;
> > +	int ret;
> > +
> > +	ret = mtk_smi_common_get(larbpriv->smi);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = pm_runtime_get_sync(larbdev);
> > +	if (ret < 0)
> > +		goto err_put_smicommon;
> > +
> > +	ret = clk_prepare_enable(larbpriv->clk_apb);
> > +	if (ret) {
> > +		dev_err(larbdev, "Failed to enable the apb clock\n");
> > +		goto err_put_pm;
> > +	}
> > +
> > +	ret = clk_prepare_enable(larbpriv->clk_smi);
> > +	if (ret) {
> > +		dev_err(larbdev, "Failed to enable the smi clock\n");
> > +		goto err_disable_apb;
> > +	}
> 
> The clock enablement looks similar to the function above, maybe move it
> to a helper function?

Thanks. I will improve it in the next version.

> 
> 
> 	Joerg
> 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 5/6] iommu/mediatek: Add mt8173 IOMMU driver
@ 2015-09-11 15:33     ` Robin Murphy
  0 siblings, 0 replies; 60+ messages in thread
From: Robin Murphy @ 2015-09-11 15:33 UTC (permalink / raw)
  To: Yong Wu, Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger
  Cc: Will Deacon, Daniel Kurtz, Tomasz Figa, Lucas Stach, Rob Herring,
	Catalin Marinas, linux-mediatek, Sasha Hauer, srv_heupstream,
	devicetree, linux-kernel, linux-arm-kernel, iommu, pebolle, arnd,
	mitchelh, youhua.li, k.zhang, frederic.chen

On 03/08/15 11:21, Yong Wu wrote:
> This patch adds support for mediatek m4u (MultiMedia Memory Management
> Unit).
>
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
[...]
> +/*
> + * There is only one iommu domain called the m4u domain that
> + * all Multimedia modules share.
> + */
> +static struct mtk_iommu_domain *m4udom;

It's a shame this can't be part of the m4u device's mtk_iommu_data, but 
since the way iommu_domain_alloc works makes that impossible, I think we 
have little choice but to use the global and hope you guys never build 
a system with two of these things in ;)

[...]
> +static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
> +{
> +       struct mtk_iommu_domain *priv;
> +
> +       if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
> +               return NULL;
> +
> +       if (m4udom) /* The m4u domain exists. */
> +               return &m4udom->domain;
> +
> +       priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> +       if (!priv)
> +               return NULL;
> +
> +       priv->domain.geometry.aperture_start = 0;
> +       priv->domain.geometry.aperture_end = DMA_BIT_MASK(32);
> +       priv->domain.geometry.force_aperture = true;

My intention is that in the IOMMU_DOMAIN_DMA case you'd call 
iommu_get_dma_cookie(&priv->domain) here as well, that way I can get rid 
of some of the dodgy workarounds in arch_setup_dma_ops which try to 
cover all possible cases (and which I'm now not 100% confident in). I'm 
just about to start trying to fix that up (expect a repost of my series 
in a week or two once -rc1 has landed).
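
A sketch of what I mean (provisional until that repost;
iommu_put_dma_cookie() would then pair with it in domain_free):

	if (type == IOMMU_DOMAIN_DMA &&
	    iommu_get_dma_cookie(&priv->domain)) {
		kfree(priv);
		return NULL;
	}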

> +
> +       m4udom = priv;
> +
> +       return &priv->domain;
> +}
[...]
> +static int mtk_iommu_add_device(struct device *dev)
> +{
> +       struct iommu_group *group;
> +       int ret;
> +
> +       if (!dev->archdata.iommu) /* Not an iommu client device */
> +               return -ENODEV;
> +
> +       group = iommu_group_get(dev);
> +       if (!group) {
> +               group = iommu_group_alloc();
> +               if (IS_ERR(group)) {
> +                       dev_err(dev, "Failed to allocate IOMMU group\n");
> +                       return PTR_ERR(group);
> +               }
> +       }
> +
 > +       ret = iommu_group_add_device(group, dev);
> +       if (ret) {
> +               dev_err(dev, "Failed to add IOMMU group\n");
> +               goto err_group_put;
> +       }

I know the rest of the code means that you can't hit it in practice, but 
if you ever did have two client devices in the same group then the 
iommu_group_get() could legitimately succeed for the second device, and 
you'd blow up creating a duplicate sysfs entry by adding the device to 
its own group again. Probably not what you want.

> +
> +       ret = iommu_attach_group(&m4udom->domain, group);
> +       if (ret)
> +               dev_err(dev, "Failed to attach IOMMU group\n");

Similarly here, if two devices did share a group then the group could 
legitimately already be attached to a domain here (by the first device), 
so attaching it again would be wrong. I think it would be nicer to check 
with iommu_get_domain_for_dev() first to see if you need to do anything 
at all (a valid domain from that implies a valid group).
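
Something along these lines at the top of mtk_iommu_add_device(), just
as an untested sketch:

	/* Group already set up and attached by a sibling device? */
	if (iommu_get_domain_for_dev(dev))
		return 0;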

> +
> +err_group_put:
> +       iommu_group_put(group);
> +       return ret;
> +}
[...]
> +static int mtk_iommu_probe(struct platform_device *pdev)
> +{
> +       struct mtk_iommu_data   *data;
> +       struct device           *dev = &pdev->dev;
> +       void __iomem            *protect;
> +       int                     ret;
> +
> +       data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> +       if (!data)
> +               return -ENOMEM;
> +
> +       /* Protect memory. HW will access here on a translation fault. */
> +       protect = devm_kzalloc(dev, MTK_PROTECT_PA_ALIGN * 2, GFP_KERNEL);
> +       if (!protect)
> +               return -ENOMEM;
> +       data->protect_base = virt_to_phys(protect);
> +
> +       ret = mtk_iommu_parse_dt(pdev, data);
> +       if (ret)
> +               return ret;
> +
> +       if (!m4udom) /* There is no iommu client */
> +               return 0;

I don't quite follow this: m4udom is apparently only created by someone 
calling domain_alloc() - how can you guarantee that happens before this 
driver is probed? - but if they then go and try to attach the device to 
their new domain, it's going to end up in mtk_hw_init() poking the 
hardware of the m4u device that can't have even probed yet.

I can only imagine it currently works by sheer chance due to the 
horrible arch_setup_dma_ops delayed attachment workaround, so even if I 
can't remove that completely when I look at it next week I'm liable to 
change it in a way that breaks this badly ;)

Robin.

> +
> +       data->dev = dev;
> +       m4udom->data = data;
> +       dev_set_drvdata(dev, m4udom);
> +
> +       return 0;
> +}


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 5/6] iommu/mediatek: Add mt8173 IOMMU driver
@ 2015-09-15  5:53       ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-09-15  5:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger,
	Will Deacon, Daniel Kurtz, Tomasz Figa, Lucas Stach, Rob Herring,
	Catalin Marinas, linux-mediatek, Sasha Hauer, srv_heupstream,
	devicetree, linux-kernel, linux-arm-kernel, iommu, pebolle, arnd,
	mitchelh, youhua.li, k.zhang, frederic.chen

On Fri, 2015-09-11 at 16:33 +0100, Robin Murphy wrote:
> On 03/08/15 11:21, Yong Wu wrote:
> > This patch adds support for mediatek m4u (MultiMedia Memory Management
> > Unit).
> >
> > Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> > ---
> [...]
> > +/*
> > + * There is only one iommu domain called the m4u domain that
> > + * all Multimedia modules share.
> > + */
> > +static struct mtk_iommu_domain *m4udom;
> 
> It's a shame this can't be part of the m4u device's mtk_iommu_data, but 
> since the way iommu_domain_alloc works makes that impossible, I think we 
> have little choice but to use the global and hope you guys never build 
> a system with two of these things in ;)

Hi Robin,
   Thanks very much for your review. This global variable troubles me a
lot; please also help check my questions below.

> 
> [...]
> > +static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
> > +{
> > +       struct mtk_iommu_domain *priv;
> > +
> > +       if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
> > +               return NULL;
> > +
> > +       if (m4udom) /* The m4u domain exists. */
> > +               return &m4udom->domain;

     From Joerg's comment[0]: "This is not going to work. If you always
return the same domain the iommu core might re-initialize domain state
(and overwrite changed state)."

     It seems that I have to delete this check, allocate an iommu domain
every time, and add some workaround code in mtk_iommu_attach_device like
our v3[1] (reserve the first one as the m4u domain and delete the others);
a rough sketch follows the links below. Then the code may not be very
concise, but it would keep working in the future. Is that OK?

[0]:http://lists.linuxfoundation.org/pipermail/iommu/2015-August/014057.html
[1]:http://lists.linuxfoundation.org/pipermail/iommu/2015-July/013631.html
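
The rough sketch I have in mind (hypothetical, untested):

//==================
static struct iommu_domain *mtk_iommu_domain_alloc(unsigned type)
{
	struct mtk_iommu_domain *priv;

	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
		return NULL;

	/* Always hand out a fresh domain so the core may re-init it */
	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return NULL;

	priv->domain.geometry.aperture_start = 0;
	priv->domain.geometry.aperture_end = DMA_BIT_MASK(32);
	priv->domain.geometry.force_aperture = true;

	/* Reserve the first domain as the m4u domain */
	if (!m4udom)
		m4udom = priv;

	return &priv->domain;
}
//==================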

> > +
> > +       priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> > +       if (!priv)
> > +               return NULL;
> > +
> > +       priv->domain.geometry.aperture_start = 0;
> > +       priv->domain.geometry.aperture_end = DMA_BIT_MASK(32);
> > +       priv->domain.geometry.force_aperture = true;
> 
> My intention is that in the IOMMU_DOMAIN_DMA case you'd call 
> iommu_get_dma_cookie(&priv->domain) here as well, that way I can get rid 
> of some of the dodgy workarounds in arch_setup_dma_ops which try to 
> cover all possible cases (and which I'm now not 100% confident in). I'm 
> just about to start trying to fix that up (expect a repost of my series 
> in a week or two once -rc1 has landed).

  I will add it like this:
  if (type == IOMMU_DOMAIN_DMA && iommu_get_dma_cookie(&priv->domain)) {
       kfree(priv);
       return NULL;
  }

> 
> > +
> > +       m4udom = priv;
> > +
> > +       return &priv->domain;
> > +}
> [...]
> > +static int mtk_iommu_add_device(struct device *dev)
> > +{
> > +       struct iommu_group *group;
> > +       int ret;
> > +
> > +       if (!dev->archdata.iommu) /* Not an iommu client device */
> > +               return -ENODEV;
> > +
> > +       group = iommu_group_get(dev);
> > +       if (!group) {
> > +               group = iommu_group_alloc();
> > +               if (IS_ERR(group)) {
> > +                       dev_err(dev, "Failed to allocate IOMMU group\n");
> > +                       return PTR_ERR(group);
> > +               }
> > +       }
> > +
> > +       ret = iommu_group_add_device(group, dev);
> > +       if (ret) {
> > +               dev_err(dev, "Failed to add IOMMU group\n");
> > +               goto err_group_put;
> > +       }
> 
> I know the rest of the code means that you can't hit it in practice, but 
> if you ever did have two client devices in the same group then the 
> iommu_group_get() could legitimately succeed for the second device, then 
> you'd blow up creating a duplicate sysfs entry by adding the device to 
> its own group again. Probably not what you want.

   The "dev" is different each time this function is entered; that is to
say, every client device gets its own iommu group. Is that right?

> 
> > +
> > +       ret = iommu_attach_group(&m4udom->domain, group);
> > +       if (ret)
> > +               dev_err(dev, "Failed to attach IOMMU group\n");
> 
> Similarly here, if two devices did share a group then the group could 
> legitimately already be attached to a domain here (by the first device), 
> so attaching it again would be wrong. I think it would be nicer to check 
> with iommu_get_domain_for_dev() first to see if you need to do anything 
> at all (a valid domain from that implies a valid group).

   Here all the devices have their own iommu group; I only attach the
same iommu domain to all of them because of our m4u HW. All the clients
are in the m4u HW's domain and there is only one pagetable here.

> 
> > +
> > +err_group_put:
> > +       iommu_group_put(group);
> > +       return ret;
> > +}
> [...]
> > +static int mtk_iommu_probe(struct platform_device *pdev)
> > +{
> > +       struct mtk_iommu_data   *data;
> > +       struct device           *dev = &pdev->dev;
> > +       void __iomem            *protect;
> > +       int                     ret;
> > +
> > +       data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
> > +       if (!data)
> > +               return -ENOMEM;
> > +
> > +       /* Protect memory. HW will access here on a translation fault. */
> > +       protect = devm_kzalloc(dev, MTK_PROTECT_PA_ALIGN * 2, GFP_KERNEL);
> > +       if (!protect)
> > +               return -ENOMEM;
> > +       data->protect_base = virt_to_phys(protect);
> > +
> > +       ret = mtk_iommu_parse_dt(pdev, data);
> > +       if (ret)
> > +               return ret;
> > +
> > +       if (!m4udom) /* There is no iommu client */
> > +               return 0;
> 
> I don't quite follow this: m4udom is apparently only created by someone 
> calling domain_alloc() - how can you guarantee that happens before this 
> driver is probed? - but if they then go and try to attach the device to 
> their new domain, it's going to end up in mtk_hw_init() poking the 
> hardware of the m4u device that can't have even probed yet.

    I think the probe will always run earlier than mtk_hw_init.
    In mtk_iommu_attach_device below, I add an iommu_group_get check to
guarantee the sequence.
//==================
static int mtk_iommu_attach_device(struct iommu_domain *domain,
				   struct device *dev)
{
	struct mtk_iommu_domain *priv = to_mtk_domain(domain);
	struct iommu_group *group;
	int ret;

	group = iommu_group_get(dev);
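	/* No group yet means mtk_iommu_add_device has not run for this
	 * device, i.e. the m4u itself has not finished probing yet. */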
	if (!group)
		return 0;
	iommu_group_put(group);

	ret = mtk_iommu_init_domain_context(priv);
	if (ret)
		return ret;

	return mtk_iommu_config(priv, dev, true);
}
//======================
   After the probe is done, it will enter bus_set_iommu->
mtk_iommu_add_device, which will create the iommu group for the device,
and then enter iommu_attach_group->mtk_iommu_attach_device again.
Is this OK here?

About "how can you guarantee that happens before this
driver is probed?"
-> Sorry, I can't guarantee this. domain_alloc is called from
arch_setup_dma_ops in the DMA core; I will change this to depend on the
next version of the DMA series.

> 
> I can only imagine it currently works by sheer chance due to the 
> horrible arch_setup_dma_ops delayed attachment workaround, so even if I 
> can't remove that completely when I look at it next week I'm liable to 
> change it in a way that breaks this badly ;)
> 
> Robin.
> 
> > +
> > +       data->dev = dev;
> > +       m4udom->data = data;
> > +       dev_set_drvdata(dev, m4udom);
> > +
> > +       return 0;
> > +}
> 



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-09-16 15:58     ` Will Deacon
  0 siblings, 0 replies; 60+ messages in thread
From: Will Deacon @ 2015-09-16 15:58 UTC (permalink / raw)
  To: Yong Wu
  Cc: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger,
	Robin Murphy, Daniel Kurtz, Tomasz Figa, Lucas Stach,
	Rob Herring, Catalin Marinas, linux-mediatek, Sasha Hauer,
	srv_heupstream, devicetree, linux-kernel, linux-arm-kernel,
	iommu, pebolle, arnd, mitchelh, youhua.li, k.zhang,
	frederic.chen

On Mon, Aug 03, 2015 at 11:21:16AM +0100, Yong Wu wrote:
> This patch is for ARM Short Descriptor Format.
> 
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>  drivers/iommu/Kconfig                |  18 +
>  drivers/iommu/Makefile               |   1 +
>  drivers/iommu/io-pgtable-arm-short.c | 813 +++++++++++++++++++++++++++++++++++
>  drivers/iommu/io-pgtable-arm.c       |   3 -
>  drivers/iommu/io-pgtable.c           |   4 +
>  drivers/iommu/io-pgtable.h           |  14 +
>  6 files changed, 850 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/iommu/io-pgtable-arm-short.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f1fb1d3..3abd066 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -39,6 +39,24 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
> 
>           If unsure, say N here.
> 
> +config IOMMU_IO_PGTABLE_SHORT
> +       bool "ARMv7/v8 Short Descriptor Format"
> +       select IOMMU_IO_PGTABLE
> +       depends on ARM || ARM64 || COMPILE_TEST
> +       help
> +         Enable support for the ARM Short-descriptor pagetable format.
> +         This allocator supports 2 levels translation tables which supports

Some minor rewording here:

"...2 levels of translation tables, which enables a 32-bit memory map based
 on..."

> +         a memory map based on memory sections or pages.
> +
> +config IOMMU_IO_PGTABLE_SHORT_SELFTEST
> +       bool "Short Descriptor selftests"
> +       depends on IOMMU_IO_PGTABLE_SHORT
> +       help
> +         Enable self-tests for Short-descriptor page table allocator.
> +         This performs a series of page-table consistency checks during boot.
> +
> +         If unsure, say N here.
> +
>  endmenu
> 
>  config IOMMU_IOVA

[...]

> +#define ARM_SHORT_PGDIR_SHIFT                  20
> +#define ARM_SHORT_PAGE_SHIFT                   12
> +#define ARM_SHORT_PTRS_PER_PTE                 \
> +       (1 << (ARM_SHORT_PGDIR_SHIFT - ARM_SHORT_PAGE_SHIFT))
> +#define ARM_SHORT_BYTES_PER_PTE                        \
> +       (ARM_SHORT_PTRS_PER_PTE * sizeof(arm_short_iopte))
> +
> +/* level 1 pagetable */
> +#define ARM_SHORT_PGD_TYPE_PGTABLE             BIT(0)
> +#define ARM_SHORT_PGD_TYPE_SECTION             BIT(1)
> +#define ARM_SHORT_PGD_B                                BIT(2)
> +#define ARM_SHORT_PGD_C                                BIT(3)
> +#define ARM_SHORT_PGD_PGTABLE_NS               BIT(3)
> +#define ARM_SHORT_PGD_SECTION_XN               BIT(4)
> +#define ARM_SHORT_PGD_IMPLE                    BIT(9)
> +#define ARM_SHORT_PGD_RD_WR                    (3 << 10)
> +#define ARM_SHORT_PGD_RDONLY                   BIT(15)
> +#define ARM_SHORT_PGD_S                                BIT(16)
> +#define ARM_SHORT_PGD_nG                       BIT(17)
> +#define ARM_SHORT_PGD_SUPERSECTION             BIT(18)
> +#define ARM_SHORT_PGD_SECTION_NS               BIT(19)
> +
> +#define ARM_SHORT_PGD_TYPE_SUPERSECTION                \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
> +#define ARM_SHORT_PGD_SECTION_TYPE_MSK         \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
> +#define ARM_SHORT_PGD_PGTABLE_TYPE_MSK         \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_TYPE_PGTABLE)
> +#define ARM_SHORT_PGD_TYPE_IS_PGTABLE(pgd)     \
> +       (((pgd) & ARM_SHORT_PGD_PGTABLE_TYPE_MSK) == ARM_SHORT_PGD_TYPE_PGTABLE)
> +#define ARM_SHORT_PGD_TYPE_IS_SECTION(pgd)     \
> +       (((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == ARM_SHORT_PGD_TYPE_SECTION)
> +#define ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(pgd)        \
> +       (((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == \
> +       ARM_SHORT_PGD_TYPE_SUPERSECTION)
> +#define ARM_SHORT_PGD_PGTABLE_MSK              0xfffffc00

You could use (~(ARM_SHORT_BYTES_PER_PTE - 1)), I think.

> +#define ARM_SHORT_PGD_SECTION_MSK              (~(SZ_1M - 1))
> +#define ARM_SHORT_PGD_SUPERSECTION_MSK         (~(SZ_16M - 1))
> +
> +/* level 2 pagetable */
> +#define ARM_SHORT_PTE_TYPE_LARGE               BIT(0)
> +#define ARM_SHORT_PTE_SMALL_XN                 BIT(0)
> +#define ARM_SHORT_PTE_TYPE_SMALL               BIT(1)
> +#define ARM_SHORT_PTE_B                                BIT(2)
> +#define ARM_SHORT_PTE_C                                BIT(3)
> +#define ARM_SHORT_PTE_RD_WR                    (3 << 4)
> +#define ARM_SHORT_PTE_RDONLY                   BIT(9)
> +#define ARM_SHORT_PTE_S                                BIT(10)
> +#define ARM_SHORT_PTE_nG                       BIT(11)
> +#define ARM_SHORT_PTE_LARGE_XN                 BIT(15)
> +#define ARM_SHORT_PTE_LARGE_MSK                        (~(SZ_64K - 1))
> +#define ARM_SHORT_PTE_SMALL_MSK                        (~(SZ_4K - 1))
> +#define ARM_SHORT_PTE_TYPE_MSK                 \
> +       (ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL)
> +#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte)   \
> +       (((pte) & ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL)

Maybe a comment here, because it's confusing that you don't AND with the
type mask, due to XN.
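
For example (my rough wording):

/*
 * Deliberately not ANDing with ARM_SHORT_PTE_TYPE_MSK: bit 0 of a
 * small-page entry is XN rather than part of the type field, so bit 1
 * alone identifies a small page.
 */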

> +#define ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(pte)   \
> +       (((pte) & ARM_SHORT_PTE_TYPE_MSK) == ARM_SHORT_PTE_TYPE_LARGE)
> +
> +#define ARM_SHORT_PGD_IDX(a)                   ((a) >> ARM_SHORT_PGDIR_SHIFT)
> +#define ARM_SHORT_PTE_IDX(a)                   \
> +       (((a) >> ARM_SHORT_PAGE_SHIFT) & (ARM_SHORT_PTRS_PER_PTE - 1))
> +
> +#define ARM_SHORT_GET_PGTABLE_VA(pgd)          \
> +       (phys_to_virt((unsigned long)pgd & ARM_SHORT_PGD_PGTABLE_MSK))
> +
> +#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)      \
> +       (((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)

AFAICT, the only user of this also does an '& ~ARM_SHORT_PTE_SMALL_MSK'.
Wouldn't it be better to define ARM_SHORT_PTE_GET_PROT, which just returns
the AP bits? That said, what are you going to do about XN? I know you
don't support it in your hardware, but this code should still do
the right thing.
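
i.e. something like this untested sketch, which also makes it obvious
that XN (bit 15 for large, bit 0 for small) gets dropped:

#define ARM_SHORT_PTE_GET_PROT(pte)	\
	((pte) & ~ARM_SHORT_PTE_SMALL_MSK & ~ARM_SHORT_PTE_TYPE_MSK)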

> +static int
> +__arm_short_set_pte(arm_short_iopte *ptep, arm_short_iopte pte,
> +                   unsigned int ptenr, struct io_pgtable_cfg *cfg)
> +{
> +       struct device *dev = cfg->iommu_dev;
> +       int i;
> +
> +       for (i = 0; i < ptenr; i++) {
> +               if (ptep[i] && pte) {
> +                       /* Someone else may have allocated for this pte */
> +                       WARN_ON(!selftest_running);
> +                       goto err_exist_pte;
> +               }
> +               ptep[i] = pte;
> +       }
> +
> +       if (selftest_running)
> +               return 0;
> +
> +       dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
> +                                  sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
> +       return 0;
> +
> +err_exist_pte:
> +       while (i--)
> +               ptep[i] = 0;

What about a dma_sync for the failure case?
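
Something like (sketch):

err_exist_pte:
	while (i--)
		ptep[i] = 0;
	/* make the rolled-back entries visible to the walker as well */
	dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
				   sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
	return -EEXIST;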

> +       return -EEXIST;
> +}
> +
> +static void *
> +__arm_short_alloc_pgtable(size_t size, gfp_t gfp, bool pgd,
> +                         struct io_pgtable_cfg *cfg)
> +{
> +       struct arm_short_io_pgtable *data;
> +       struct device *dev = cfg->iommu_dev;
> +       dma_addr_t dma;
> +       void *va;
> +
> +       if (pgd) {/* lvl1 pagetable */
> +               va = alloc_pages_exact(size, gfp);
> +       } else {  /* lvl2 pagetable */
> +               data = io_pgtable_cfg_to_data(cfg);
> +               va = kmem_cache_zalloc(data->pgtable_cached, gfp);
> +       }
> +
> +       if (!va)
> +               return NULL;
> +
> +       if (selftest_running)
> +               return va;
> +
> +       dma = dma_map_single(dev, va, size, DMA_TO_DEVICE);
> +       if (dma_mapping_error(dev, dma))
> +               goto out_free;
> +
> +       if (dma != __arm_short_dma_addr(dev, va))
> +               goto out_unmap;
> +
> +       if (!pgd) {
> +               kmemleak_ignore(va);
> +               dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, va),
> +                                          size, DMA_TO_DEVICE);

Why do you need to do this as well as the dma_map_single above?

> +       }
> +
> +       return va;
> +
> +out_unmap:
> +       dev_err_ratelimited(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
> +       dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
> +out_free:
> +       if (pgd)
> +               free_pages_exact(va, size);
> +       else
> +               kmem_cache_free(data->pgtable_cached, va);
> +       return NULL;
> +}
> +
> +static void
> +__arm_short_free_pgtable(void *va, size_t size, bool pgd,
> +                        struct io_pgtable_cfg *cfg)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_cfg_to_data(cfg);
> +       struct device *dev = cfg->iommu_dev;
> +
> +       if (!selftest_running)
> +               dma_unmap_single(dev, __arm_short_dma_addr(dev, va),
> +                                size, DMA_TO_DEVICE);
> +
> +       if (pgd)
> +               free_pages_exact(va, size);
> +       else
> +               kmem_cache_free(data->pgtable_cached, va);
> +}
> +
> +static arm_short_iopte
> +__arm_short_pte_prot(struct arm_short_io_pgtable *data, int prot, bool large)
> +{
> +       arm_short_iopte pteprot;
> +       int quirk = data->iop.cfg.quirks;
> +
> +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG;
> +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> +                               ARM_SHORT_PTE_TYPE_SMALL;
> +       if (prot & IOMMU_CACHE)
> +               pteprot |=  ARM_SHORT_PTE_B | ARM_SHORT_PTE_C;
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC)) {
> +                       pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> +                               ARM_SHORT_PTE_SMALL_XN;

Weird indentation, man. Also, see my later comment about combining NO_XN
with NO_PERMS (the latter subsumes the former).

> +       }
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
> +               pteprot |= ARM_SHORT_PTE_RD_WR;
> +               if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> +                       pteprot |= ARM_SHORT_PTE_RDONLY;
> +       }
> +       return pteprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pgd_prot(struct arm_short_io_pgtable *data, int prot, bool super)
> +{
> +       arm_short_iopte pgdprot;
> +       int quirk = data->iop.cfg.quirks;
> +
> +       pgdprot = ARM_SHORT_PGD_S | ARM_SHORT_PGD_nG;
> +       pgdprot |= super ? ARM_SHORT_PGD_TYPE_SUPERSECTION :
> +                               ARM_SHORT_PGD_TYPE_SECTION;
> +       if (prot & IOMMU_CACHE)
> +               pgdprot |= ARM_SHORT_PGD_C | ARM_SHORT_PGD_B;
> +       if (quirk & IO_PGTABLE_QUIRK_ARM_NS)
> +               pgdprot |= ARM_SHORT_PGD_SECTION_NS;
> +
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC))
> +                       pgdprot |= ARM_SHORT_PGD_SECTION_XN;
> +
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {

Same comments here.

> +               pgdprot |= ARM_SHORT_PGD_RD_WR;
> +               if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> +                       pgdprot |= ARM_SHORT_PGD_RDONLY;
> +       }
> +       return pgdprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pte_prot_split(struct arm_short_io_pgtable *data,
> +                          arm_short_iopte pgdprot,
> +                          arm_short_iopte pteprot_large,
> +                          bool large)
> +{
> +       arm_short_iopte pteprot = 0;
> +
> +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG | ARM_SHORT_PTE_RD_WR;
> +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> +                               ARM_SHORT_PTE_TYPE_SMALL;
> +
> +       /* Large page to small page pte prot. Only a large page may be split */
> +       if (!pgdprot && !large) {

It's slightly complicated having these two variables controlling the
behaviour of the split. In reality, we're either splitting a section or
a large page, so there are three valid combinations.

It might be simpler to operate on IOMMU_{READ,WRITE,NOEXEC,CACHE} as
much as possible, and then have some simple functions to encode/decode
these into section/large/small page prot bits. We could then just pass
the IOMMU_* prot around along with the map size. What do you think?
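
Roughly, for the decode half (untested sketch, made-up name):

static int __arm_short_section_decode_prot(arm_short_iopte pgd)
{
	int prot = IOMMU_READ;

	if (!(pgd & ARM_SHORT_PGD_RDONLY))
		prot |= IOMMU_WRITE;
	if (pgd & (ARM_SHORT_PGD_C | ARM_SHORT_PGD_B))
		prot |= IOMMU_CACHE;
	if (pgd & ARM_SHORT_PGD_SECTION_XN)
		prot |= IOMMU_NOEXEC;

	return prot;
}

Then _arm_short_map and the split path could re-encode that prot at
whatever size they need.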

> +               pteprot |= pteprot_large & ~ARM_SHORT_PTE_SMALL_MSK;
> +               if (pteprot_large & ARM_SHORT_PTE_LARGE_XN)
> +                       pteprot |= ARM_SHORT_PTE_SMALL_XN;
> +       }
> +
> +       /* section to pte prot */
> +       if (pgdprot & ARM_SHORT_PGD_C)
> +               pteprot |= ARM_SHORT_PTE_C;
> +       if (pgdprot & ARM_SHORT_PGD_B)
> +               pteprot |= ARM_SHORT_PTE_B;
> +       if (pgdprot & ARM_SHORT_PGD_nG)
> +               pteprot |= ARM_SHORT_PTE_nG;
> +       if (pgdprot & ARM_SHORT_PGD_SECTION_XN)
> +               pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> +                               ARM_SHORT_PTE_SMALL_XN;
> +       if (pgdprot & ARM_SHORT_PGD_RD_WR)
> +               pteprot |= ARM_SHORT_PTE_RD_WR;
> +       if (pgdprot & ARM_SHORT_PGD_RDONLY)
> +               pteprot |= ARM_SHORT_PTE_RDONLY;
> +
> +       return pteprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pgtable_prot(struct arm_short_io_pgtable *data)
> +{
> +       arm_short_iopte pgdprot = 0;
> +
> +       pgdprot = ARM_SHORT_PGD_TYPE_PGTABLE;
> +       if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
> +               pgdprot |= ARM_SHORT_PGD_PGTABLE_NS;
> +       return pgdprot;
> +}
> +
> +static int
> +_arm_short_map(struct arm_short_io_pgtable *data,
> +              unsigned int iova, phys_addr_t paddr,
> +              arm_short_iopte pgdprot, arm_short_iopte pteprot,
> +              bool large)
> +{
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       arm_short_iopte *pgd = data->pgd, *pte;
> +       void *pte_new = NULL;
> +       int ret;
> +
> +       pgd += ARM_SHORT_PGD_IDX(iova);
> +
> +       if (!pteprot) { /* section or supersection */
> +               pte = pgd;
> +               pteprot = pgdprot;
> +       } else {        /* page or largepage */
> +               if (!(*pgd)) {
> +                       pte_new = __arm_short_alloc_pgtable(
> +                                       ARM_SHORT_BYTES_PER_PTE,
> +                                       GFP_ATOMIC, false, cfg);
> +                       if (unlikely(!pte_new))
> +                               return -ENOMEM;
> +
> +                       pgdprot |= virt_to_phys(pte_new);
> +                       __arm_short_set_pte(pgd, pgdprot, 1, cfg);
> +               }
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +       }
> +
> +       pteprot |= (arm_short_iopte)paddr;
> +       ret = __arm_short_set_pte(pte, pteprot, large ? 16 : 1, cfg);
> +       if (ret && pte_new)
> +               __arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
> +                                        false, cfg);

Don't you need to kill the pgd entry before freeing this? Please see my
previous comments about safely freeing page tables:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358268.html

(at the end of the post)
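
i.e. the cleanup wants to look more like this sketch (mirroring the
sequence used in arm_short_split_blk_unmap below):

	__arm_short_set_pte(pgd, 0, 1, cfg);	/* 1. unhook the table  */
	cfg->tlb->tlb_add_flush(iova, SZ_1M, true, data->iop.cookie);
	cfg->tlb->tlb_sync(data->iop.cookie);	/* 2. no walks in flight */
	__arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
				 false, cfg);	/* 3. now safe to free   */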

> +       return ret;
> +}
> +
> +static int arm_short_map(struct io_pgtable_ops *ops, unsigned long iova,
> +                        phys_addr_t paddr, size_t size, int prot)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       arm_short_iopte pgdprot = 0, pteprot = 0;
> +       bool large;
> +
> +       /* If no access, then nothing to do */
> +       if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
> +               return 0;
> +
> +       if (WARN_ON((iova | paddr) & (size - 1)))
> +               return -EINVAL;
> +
> +       switch (size) {
> +       case SZ_4K:
> +       case SZ_64K:
> +               large = (size == SZ_64K) ? true : false;
> +               pteprot = __arm_short_pte_prot(data, prot, large);
> +               pgdprot = __arm_short_pgtable_prot(data);
> +               break;
> +
> +       case SZ_1M:
> +       case SZ_16M:
> +               large = (size == SZ_16M) ? true : false;
> +               pgdprot = __arm_short_pgd_prot(data, prot, large);
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       return _arm_short_map(data, iova, paddr, pgdprot, pteprot, large);
> +}
> +
> +static phys_addr_t arm_short_iova_to_phys(struct io_pgtable_ops *ops,
> +                                         unsigned long iova)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       arm_short_iopte *pte, *pgd = data->pgd;
> +       phys_addr_t pa = 0;
> +
> +       pgd += ARM_SHORT_PGD_IDX(iova);
> +
> +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +
> +               if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte)) {
> +                       pa = (*pte) & ARM_SHORT_PTE_LARGE_MSK;
> +                       pa |= iova & ~ARM_SHORT_PTE_LARGE_MSK;
> +               } else if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte)) {
> +                       pa = (*pte) & ARM_SHORT_PTE_SMALL_MSK;
> +                       pa |= iova & ~ARM_SHORT_PTE_SMALL_MSK;
> +               }
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> +               pa = (*pgd) & ARM_SHORT_PGD_SECTION_MSK;
> +               pa |= iova & ~ARM_SHORT_PGD_SECTION_MSK;
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> +               pa = (*pgd) & ARM_SHORT_PGD_SUPERSECTION_MSK;
> +               pa |= iova & ~ARM_SHORT_PGD_SUPERSECTION_MSK;
> +       }
> +
> +       return pa;
> +}
> +
> +static bool _arm_short_whether_free_pgtable(arm_short_iopte *pgd)
> +{

_arm_short_pgtable_empty might be a better name.

> +       arm_short_iopte *pte;
> +       int i;
> +
> +       pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> +       for (i = 0; i < ARM_SHORT_PTRS_PER_PTE; i++) {
> +               if (pte[i] != 0)
> +                       return false;
> +       }
> +
> +       return true;
> +}
> +
> +static int
> +arm_short_split_blk_unmap(struct io_pgtable_ops *ops, unsigned int iova,
> +                         phys_addr_t paddr, size_t size,
> +                         arm_short_iopte pgdprotup, arm_short_iopte pteprotup,
> +                         size_t blk_size)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       unsigned long *pgbitmap = &cfg->pgsize_bitmap;
> +       unsigned int blk_base, blk_start, blk_end, i;
> +       arm_short_iopte pgdprot, pteprot;
> +       phys_addr_t blk_paddr;
> +       size_t mapsize = 0, nextmapsize;
> +       int ret;
> +
> +       /* find the nearest mapsize */
> +       for (i = find_first_bit(pgbitmap, BITS_PER_LONG);
> +            i < BITS_PER_LONG && ((1 << i) < blk_size) &&
> +            IS_ALIGNED(size, 1 << i);
> +            i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1))
> +               mapsize = 1 << i;
> +
> +       if (WARN_ON(!mapsize))
> +               return 0; /* Bytes unmapped */
> +       nextmapsize = 1 << i;
> +
> +       blk_base = iova & ~(blk_size - 1);
> +       blk_start = blk_base;
> +       blk_end = blk_start + blk_size;
> +       blk_paddr = paddr;
> +
> +       for (; blk_start < blk_end;
> +            blk_start += mapsize, blk_paddr += mapsize) {
> +               /* Unmap! */
> +               if (blk_start == iova)
> +                       continue;
> +
> +               /* Try to map at the next size up */
> +               if (blk_base != blk_start &&
> +                   IS_ALIGNED(blk_start | blk_paddr, nextmapsize) &&
> +                   mapsize != nextmapsize) {
> +                       mapsize = nextmapsize;
> +                       i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1);
> +                       if (i < BITS_PER_LONG)
> +                               nextmapsize = 1 << i;
> +               }
> +
> +               if (mapsize == SZ_1M) {

How do we get here with a mapsize of 1M?

> +                       pgdprot = pgdprotup;
> +                       pgdprot |= __arm_short_pgd_prot(data, 0, false);
> +                       pteprot = 0;
> +               } else { /* small or large page */
> +                       pgdprot = (blk_size == SZ_64K) ? 0 : pgdprotup;
> +                       pteprot = __arm_short_pte_prot_split(
> +                                       data, pgdprot, pteprotup,
> +                                       mapsize == SZ_64K);
> +                       pgdprot = __arm_short_pgtable_prot(data);
> +               }
> +
> +               ret = _arm_short_map(data, blk_start, blk_paddr, pgdprot,
> +                                    pteprot, mapsize == SZ_64K);
> +               if (ret < 0) {
> +                       /* Free the table we allocated */
> +                       arm_short_iopte *pgd = data->pgd, *pte;
> +
> +                       pgd += ARM_SHORT_PGD_IDX(blk_base);
> +                       if (*pgd) {
> +                               pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> +                               __arm_short_set_pte(pgd, 0, 1, cfg);
> +                               tlb->tlb_add_flush(blk_base, blk_size, true,
> +                                                  data->iop.cookie);
> +                               tlb->tlb_sync(data->iop.cookie);
> +                               __arm_short_free_pgtable(
> +                                       pte, ARM_SHORT_BYTES_PER_PTE,
> +                                       false, cfg);

This looks wrong. _arm_short_map already cleans up if it returns non-zero.

> +                       }
> +                       return 0;/* Bytes unmapped */
> +               }
> +       }
> +
> +       tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
> +       tlb->tlb_sync(data->iop.cookie);

Why are you syncing here? You can postpone this to the caller, if it turns
out the unmap was a success.
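
Something like the below would do, I think (completely untested sketch,
just to illustrate) -- the split path only queues the invalidation:

	tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
	return size;

and arm_short_unmap() then syncs only once it knows the split worked:

	if (blk_size > size) { /* Split the block */
		int ret;

		ret = arm_short_split_blk_unmap(
				ops, iova, paddr, size,
				ARM_SHORT_PGD_GET_PROT(curpgd),
				ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
				blk_size);
		if (ret)
			cfg->tlb->tlb_sync(cookie);
		return ret;
	}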

> +       return size;
> +}
> +
> +static int arm_short_unmap(struct io_pgtable_ops *ops,
> +                          unsigned long iova,
> +                          size_t size)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       arm_short_iopte *pgd, *pte = NULL;
> +       arm_short_iopte curpgd, curpte = 0;
> +       phys_addr_t paddr;
> +       unsigned int iova_base, blk_size = 0;
> +       void *cookie = data->iop.cookie;
> +       bool pgtablefree = false;
> +
> +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
> +
> +       /* Get block size */
> +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +
> +               if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
> +                       blk_size = SZ_4K;
> +               else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
> +                       blk_size = SZ_64K;
> +               else
> +                       WARN_ON(1);
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> +               blk_size = SZ_1M;
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> +               blk_size = SZ_16M;
> +       } else {
> +               WARN_ON(1);

Maybe return 0 or something instead of falling through with blk_size == 0?
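
e.g. (keeping the "bytes unmapped" convention used elsewhere here):

	} else {
		WARN_ON(1);
		return 0;	/* Bytes unmapped */
	}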

> +       }
> +
> +       iova_base = iova & ~(blk_size - 1);
> +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova_base);
> +       paddr = arm_short_iova_to_phys(ops, iova_base);
> +       curpgd = *pgd;
> +
> +       if (blk_size == SZ_4K || blk_size == SZ_64K) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova_base);
> +               curpte = *pte;
> +               __arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);
> +
> +               pgtablefree = _arm_short_whether_free_pgtable(pgd);
> +               if (pgtablefree)
> +                       __arm_short_set_pte(pgd, 0, 1, cfg);
> +       } else if (blk_size == SZ_1M || blk_size == SZ_16M) {
> +               __arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
> +       }
> +
> +       cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);
> +       cfg->tlb->tlb_sync(cookie);
> +
> +       if (pgtablefree)/* Free pgtable after tlb-flush */
> +               __arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
> +                                        ARM_SHORT_BYTES_PER_PTE, false, cfg);

Curious, but why do you care about freeing this on unmap? It will get
freed when the page table itself is freed anyway (via the ->free callback).

> +
> +       if (blk_size > size) { /* Split the block */
> +               return arm_short_split_blk_unmap(
> +                               ops, iova, paddr, size,
> +                               ARM_SHORT_PGD_GET_PROT(curpgd),
> +                               ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
> +                               blk_size);
> +       } else if (blk_size < size) {
> +               /* Unmap the block while remap partial again after split */
> +               return blk_size +
> +                       arm_short_unmap(ops, iova + blk_size, size - blk_size);
> +       }
> +
> +       return size;
> +}
> +
> +static struct io_pgtable *
> +arm_short_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> +{
> +       struct arm_short_io_pgtable *data;
> +
> +       if (cfg->ias > 32 || cfg->oas > 32)
> +               return NULL;
> +
> +       cfg->pgsize_bitmap &=
> +               (cfg->quirks & IO_PGTABLE_QUIRK_SHORT_SUPERSECTION) ?
> +               (SZ_4K | SZ_64K | SZ_1M | SZ_16M) : (SZ_4K | SZ_64K | SZ_1M);
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data)
> +               return NULL;
> +
> +       data->pgd_size = SZ_16K;
> +       data->pgd = __arm_short_alloc_pgtable(
> +                                       data->pgd_size,
> +                                       GFP_KERNEL | __GFP_ZERO | __GFP_DMA,
> +                                       true, cfg);
> +       if (!data->pgd)
> +               goto out_free_data;
> +       wmb();/* Ensure the empty pgd is visible before any actual TTBR write */
> +
> +       data->pgtable_cached = kmem_cache_create(
> +                                       "io-pgtable-arm-short",
> +                                        ARM_SHORT_BYTES_PER_PTE,
> +                                        ARM_SHORT_BYTES_PER_PTE,
> +                                        0, NULL);
> +       if (!data->pgtable_cached)
> +               goto out_free_pgd;
> +
> +       /* TTBRs */
> +       cfg->arm_short_cfg.ttbr[0] = virt_to_phys(data->pgd);
> +       cfg->arm_short_cfg.ttbr[1] = 0;
> +       cfg->arm_short_cfg.tcr = 0;
> +       cfg->arm_short_cfg.nmrr = 0;
> +       cfg->arm_short_cfg.prrr = 0;
> +
> +       data->iop.ops = (struct io_pgtable_ops) {
> +               .map            = arm_short_map,
> +               .unmap          = arm_short_unmap,
> +               .iova_to_phys   = arm_short_iova_to_phys,
> +       };
> +
> +       return &data->iop;
> +
> +out_free_pgd:
> +       __arm_short_free_pgtable(data->pgd, data->pgd_size, true, cfg);
> +out_free_data:
> +       kfree(data);
> +       return NULL;
> +}
> +
> +static void arm_short_free_pgtable(struct io_pgtable *iop)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_to_data(iop);
> +
> +       kmem_cache_destroy(data->pgtable_cached);
> +       __arm_short_free_pgtable(data->pgd, data->pgd_size,
> +                                true, &data->iop.cfg);
> +       kfree(data);
> +}
> +
> +struct io_pgtable_init_fns io_pgtable_arm_short_init_fns = {
> +       .alloc  = arm_short_alloc_pgtable,
> +       .free   = arm_short_free_pgtable,
> +};
> +

[...]

> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index 6436fe2..14a9b3a 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -28,6 +28,7 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
> +extern struct io_pgtable_init_fns io_pgtable_arm_short_init_fns;
> 
>  static const struct io_pgtable_init_fns *
>  io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
> @@ -38,6 +39,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
>         [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
>         [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
>  #endif
> +#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT
> +       [ARM_SHORT_DESC] = &io_pgtable_arm_short_init_fns,
> +#endif
>  };
> 
>  struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 68c63d9..0f45e60 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -9,6 +9,7 @@ enum io_pgtable_fmt {
>         ARM_32_LPAE_S2,
>         ARM_64_LPAE_S1,
>         ARM_64_LPAE_S2,
> +       ARM_SHORT_DESC,
>         IO_PGTABLE_NUM_FMTS,
>  };
> 
> @@ -45,6 +46,9 @@ struct iommu_gather_ops {
>   */
>  struct io_pgtable_cfg {
>         #define IO_PGTABLE_QUIRK_ARM_NS (1 << 0)        /* Set NS bit in PTEs */
> +       #define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION     BIT(1)
> +       #define IO_PGTABLE_QUIRK_SHORT_NO_XN            BIT(2) /* No XN bit */
> +       #define IO_PGTABLE_QUIRK_SHORT_NO_PERMS         BIT(3) /* No AP bit */

Why have two quirks for this? I suggested including NO_XN in NO_PERMS:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/361160.html
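
i.e. one quirk that suppresses both the AP and the XN bits, roughly like
this (untested, and the exact quirk name is only for illustration):

	/* io-pgtable.h: a single quirk covering both AP and XN */
	#define IO_PGTABLE_QUIRK_SHORT_NO_PERMS	BIT(2)	/* No AP/XN bits */

	/* __arm_short_pte_prot(): */
	if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
		pteprot |= ARM_SHORT_PTE_RD_WR;
		if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
			pteprot |= ARM_SHORT_PTE_RDONLY;
		if (prot & IOMMU_NOEXEC)
			pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
					   ARM_SHORT_PTE_SMALL_XN;
	}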

>         int                             quirks;
>         unsigned long                   pgsize_bitmap;
>         unsigned int                    ias;
> @@ -64,6 +68,13 @@ struct io_pgtable_cfg {
>                         u64     vttbr;
>                         u64     vtcr;
>                 } arm_lpae_s2_cfg;
> +
> +               struct {
> +                       u32     ttbr[2];
> +                       u32     tcr;
> +                       u32     nmrr;
> +                       u32     prrr;
> +               } arm_short_cfg;

We don't return an SCTLR value here, so a comment somewhere saying that
the Access flag is not supported would be helpful (so that drivers can
ensure they configure things for the AP[2:0] permission model).
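
e.g. something along the lines of:

		/*
		 * No SCTLR value is returned, so the Access flag is not
		 * supported; drivers must configure things for the
		 * AP[2:0] permission model.
		 */
		struct {
			u32	ttbr[2];
			u32	tcr;
			u32	nmrr;
			u32	prrr;
		} arm_short_cfg;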

Will

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-09-16 15:58     ` Will Deacon
  0 siblings, 0 replies; 60+ messages in thread
From: Will Deacon @ 2015-09-16 15:58 UTC (permalink / raw)
  To: Yong Wu
  Cc: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger,
	Robin Murphy, Daniel Kurtz, Tomasz Figa, Lucas Stach,
	Rob Herring, Catalin Marinas,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Sasha Hauer,
	srv_heupstream-NuS5LvNUpcJWk0Htik3J/w,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pebolle-IWqWACnzNjzz+pZb47iToQ

On Mon, Aug 03, 2015 at 11:21:16AM +0100, Yong Wu wrote:
> This patch is for ARM Short Descriptor Format.
> 
> Signed-off-by: Yong Wu <yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/iommu/Kconfig                |  18 +
>  drivers/iommu/Makefile               |   1 +
>  drivers/iommu/io-pgtable-arm-short.c | 813 +++++++++++++++++++++++++++++++++++
>  drivers/iommu/io-pgtable-arm.c       |   3 -
>  drivers/iommu/io-pgtable.c           |   4 +
>  drivers/iommu/io-pgtable.h           |  14 +
>  6 files changed, 850 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/iommu/io-pgtable-arm-short.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f1fb1d3..3abd066 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -39,6 +39,24 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
> 
>           If unsure, say N here.
> 
> +config IOMMU_IO_PGTABLE_SHORT
> +       bool "ARMv7/v8 Short Descriptor Format"
> +       select IOMMU_IO_PGTABLE
> +       depends on ARM || ARM64 || COMPILE_TEST
> +       help
> +         Enable support for the ARM Short-descriptor pagetable format.
> +         This allocator supports 2 levels translation tables which supports

Some minor rewording here:

"...2 levels of translation tables, which enables a 32-bit memory map based
 on..."

> +         a memory map based on memory sections or pages.
> +
> +config IOMMU_IO_PGTABLE_SHORT_SELFTEST
> +       bool "Short Descriptor selftests"
> +       depends on IOMMU_IO_PGTABLE_SHORT
> +       help
> +         Enable self-tests for Short-descriptor page table allocator.
> +         This performs a series of page-table consistency checks during boot.
> +
> +         If unsure, say N here.
> +
>  endmenu
> 
>  config IOMMU_IOVA

[...]

> +#define ARM_SHORT_PGDIR_SHIFT                  20
> +#define ARM_SHORT_PAGE_SHIFT                   12
> +#define ARM_SHORT_PTRS_PER_PTE                 \
> +       (1 << (ARM_SHORT_PGDIR_SHIFT - ARM_SHORT_PAGE_SHIFT))
> +#define ARM_SHORT_BYTES_PER_PTE                        \
> +       (ARM_SHORT_PTRS_PER_PTE * sizeof(arm_short_iopte))
> +
> +/* level 1 pagetable */
> +#define ARM_SHORT_PGD_TYPE_PGTABLE             BIT(0)
> +#define ARM_SHORT_PGD_TYPE_SECTION             BIT(1)
> +#define ARM_SHORT_PGD_B                                BIT(2)
> +#define ARM_SHORT_PGD_C                                BIT(3)
> +#define ARM_SHORT_PGD_PGTABLE_NS               BIT(3)
> +#define ARM_SHORT_PGD_SECTION_XN               BIT(4)
> +#define ARM_SHORT_PGD_IMPLE                    BIT(9)
> +#define ARM_SHORT_PGD_RD_WR                    (3 << 10)
> +#define ARM_SHORT_PGD_RDONLY                   BIT(15)
> +#define ARM_SHORT_PGD_S                                BIT(16)
> +#define ARM_SHORT_PGD_nG                       BIT(17)
> +#define ARM_SHORT_PGD_SUPERSECTION             BIT(18)
> +#define ARM_SHORT_PGD_SECTION_NS               BIT(19)
> +
> +#define ARM_SHORT_PGD_TYPE_SUPERSECTION                \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
> +#define ARM_SHORT_PGD_SECTION_TYPE_MSK         \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
> +#define ARM_SHORT_PGD_PGTABLE_TYPE_MSK         \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_TYPE_PGTABLE)
> +#define ARM_SHORT_PGD_TYPE_IS_PGTABLE(pgd)     \
> +       (((pgd) & ARM_SHORT_PGD_PGTABLE_TYPE_MSK) == ARM_SHORT_PGD_TYPE_PGTABLE)
> +#define ARM_SHORT_PGD_TYPE_IS_SECTION(pgd)     \
> +       (((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == ARM_SHORT_PGD_TYPE_SECTION)
> +#define ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(pgd)        \
> +       (((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == \
> +       ARM_SHORT_PGD_TYPE_SUPERSECTION)
> +#define ARM_SHORT_PGD_PGTABLE_MSK              0xfffffc00

You could use (~(ARM_SHORT_BYTES_PER_PTE - 1)), I think.

> +#define ARM_SHORT_PGD_SECTION_MSK              (~(SZ_1M - 1))
> +#define ARM_SHORT_PGD_SUPERSECTION_MSK         (~(SZ_16M - 1))
> +
> +/* level 2 pagetable */
> +#define ARM_SHORT_PTE_TYPE_LARGE               BIT(0)
> +#define ARM_SHORT_PTE_SMALL_XN                 BIT(0)
> +#define ARM_SHORT_PTE_TYPE_SMALL               BIT(1)
> +#define ARM_SHORT_PTE_B                                BIT(2)
> +#define ARM_SHORT_PTE_C                                BIT(3)
> +#define ARM_SHORT_PTE_RD_WR                    (3 << 4)
> +#define ARM_SHORT_PTE_RDONLY                   BIT(9)
> +#define ARM_SHORT_PTE_S                                BIT(10)
> +#define ARM_SHORT_PTE_nG                       BIT(11)
> +#define ARM_SHORT_PTE_LARGE_XN                 BIT(15)
> +#define ARM_SHORT_PTE_LARGE_MSK                        (~(SZ_64K - 1))
> +#define ARM_SHORT_PTE_SMALL_MSK                        (~(SZ_4K - 1))
> +#define ARM_SHORT_PTE_TYPE_MSK                 \
> +       (ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL)
> +#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte)   \
> +       (((pte) & ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL)

Maybe a comment here, because it's confusing that you don't and with the
mask due to XN.

> +#define ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(pte)   \
> +       (((pte) & ARM_SHORT_PTE_TYPE_MSK) == ARM_SHORT_PTE_TYPE_LARGE)
> +
> +#define ARM_SHORT_PGD_IDX(a)                   ((a) >> ARM_SHORT_PGDIR_SHIFT)
> +#define ARM_SHORT_PTE_IDX(a)                   \
> +       (((a) >> ARM_SHORT_PAGE_SHIFT) & (ARM_SHORT_PTRS_PER_PTE - 1))
> +
> +#define ARM_SHORT_GET_PGTABLE_VA(pgd)          \
> +       (phys_to_virt((unsigned long)pgd & ARM_SHORT_PGD_PGTABLE_MSK))
> +
> +#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)      \
> +       (((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)

AFAICT, the only user of this also does an '& ~ARM_SHORT_PTE_SMALL_MSK'.
Wouldn't it be better to define ARM_SHORT_PTE_GET_PROT, which just returns
the AP bits? That said, what are you going to do about XN? I know you
don't support it in your hardware, but this could code should still do
the right thing.

> +static int
> +__arm_short_set_pte(arm_short_iopte *ptep, arm_short_iopte pte,
> +                   unsigned int ptenr, struct io_pgtable_cfg *cfg)
> +{
> +       struct device *dev = cfg->iommu_dev;
> +       int i;
> +
> +       for (i = 0; i < ptenr; i++) {
> +               if (ptep[i] && pte) {
> +                       /* Someone else may have allocated for this pte */
> +                       WARN_ON(!selftest_running);
> +                       goto err_exist_pte;
> +               }
> +               ptep[i] = pte;
> +       }
> +
> +       if (selftest_running)
> +               return 0;
> +
> +       dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
> +                                  sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
> +       return 0;
> +
> +err_exist_pte:
> +       while (i--)
> +               ptep[i] = 0;

What about a dma_sync for the failure case?

> +       return -EEXIST;
> +}
> +
> +static void *
> +__arm_short_alloc_pgtable(size_t size, gfp_t gfp, bool pgd,
> +                         struct io_pgtable_cfg *cfg)
> +{
> +       struct arm_short_io_pgtable *data;
> +       struct device *dev = cfg->iommu_dev;
> +       dma_addr_t dma;
> +       void *va;
> +
> +       if (pgd) {/* lvl1 pagetable */
> +               va = alloc_pages_exact(size, gfp);
> +       } else {  /* lvl2 pagetable */
> +               data = io_pgtable_cfg_to_data(cfg);
> +               va = kmem_cache_zalloc(data->pgtable_cached, gfp);
> +       }
> +
> +       if (!va)
> +               return NULL;
> +
> +       if (selftest_running)
> +               return va;
> +
> +       dma = dma_map_single(dev, va, size, DMA_TO_DEVICE);
> +       if (dma_mapping_error(dev, dma))
> +               goto out_free;
> +
> +       if (dma != __arm_short_dma_addr(dev, va))
> +               goto out_unmap;
> +
> +       if (!pgd) {
> +               kmemleak_ignore(va);
> +               dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, va),
> +                                          size, DMA_TO_DEVICE);

Why do you need to do this as well as the dma_map_single above?

> +       }
> +
> +       return va;
> +
> +out_unmap:
> +       dev_err_ratelimited(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
> +       dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
> +out_free:
> +       if (pgd)
> +               free_pages_exact(va, size);
> +       else
> +               kmem_cache_free(data->pgtable_cached, va);
> +       return NULL;
> +}
> +
> +static void
> +__arm_short_free_pgtable(void *va, size_t size, bool pgd,
> +                        struct io_pgtable_cfg *cfg)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_cfg_to_data(cfg);
> +       struct device *dev = cfg->iommu_dev;
> +
> +       if (!selftest_running)
> +               dma_unmap_single(dev, __arm_short_dma_addr(dev, va),
> +                                size, DMA_TO_DEVICE);
> +
> +       if (pgd)
> +               free_pages_exact(va, size);
> +       else
> +               kmem_cache_free(data->pgtable_cached, va);
> +}
> +
> +static arm_short_iopte
> +__arm_short_pte_prot(struct arm_short_io_pgtable *data, int prot, bool large)
> +{
> +       arm_short_iopte pteprot;
> +       int quirk = data->iop.cfg.quirks;
> +
> +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG;
> +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> +                               ARM_SHORT_PTE_TYPE_SMALL;
> +       if (prot & IOMMU_CACHE)
> +               pteprot |=  ARM_SHORT_PTE_B | ARM_SHORT_PTE_C;
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC)) {
> +                       pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> +                               ARM_SHORT_PTE_SMALL_XN;

Weird indentation, man. Also, see my later comment about combining NO_XN
with NO_PERMS (the latter subsumes the first)

> +       }
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
> +               pteprot |= ARM_SHORT_PTE_RD_WR;
> +               if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> +                       pteprot |= ARM_SHORT_PTE_RDONLY;
> +       }
> +       return pteprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pgd_prot(struct arm_short_io_pgtable *data, int prot, bool super)
> +{
> +       arm_short_iopte pgdprot;
> +       int quirk = data->iop.cfg.quirks;
> +
> +       pgdprot = ARM_SHORT_PGD_S | ARM_SHORT_PGD_nG;
> +       pgdprot |= super ? ARM_SHORT_PGD_TYPE_SUPERSECTION :
> +                               ARM_SHORT_PGD_TYPE_SECTION;
> +       if (prot & IOMMU_CACHE)
> +               pgdprot |= ARM_SHORT_PGD_C | ARM_SHORT_PGD_B;
> +       if (quirk & IO_PGTABLE_QUIRK_ARM_NS)
> +               pgdprot |= ARM_SHORT_PGD_SECTION_NS;
> +
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC))
> +                       pgdprot |= ARM_SHORT_PGD_SECTION_XN;
> +
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {

Same comments here.

> +               pgdprot |= ARM_SHORT_PGD_RD_WR;
> +               if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> +                       pgdprot |= ARM_SHORT_PGD_RDONLY;
> +       }
> +       return pgdprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pte_prot_split(struct arm_short_io_pgtable *data,
> +                          arm_short_iopte pgdprot,
> +                          arm_short_iopte pteprot_large,
> +                          bool large)
> +{
> +       arm_short_iopte pteprot = 0;
> +
> +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG | ARM_SHORT_PTE_RD_WR;
> +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> +                               ARM_SHORT_PTE_TYPE_SMALL;
> +
> +       /* large page to small page pte prot. Only large page may split */
> +       if (!pgdprot && !large) {

It's slightly complicated having these two variables controlling the
behaviour of the split. In reality, we're either splitting a section or
a large page, so there are three valid combinations.

It might be simpler to operate on IOMMU_{READ,WRITE,NOEXEC,CACHE} as
much as possible, and then have some simple functions to encode/decode
these into section/large/small page prot bits. We could then just pass
the IOMMU_* prot around along with the map size. What do you think?

> +               pteprot |= pteprot_large & ~ARM_SHORT_PTE_SMALL_MSK;
> +               if (pteprot_large & ARM_SHORT_PTE_LARGE_XN)
> +                       pteprot |= ARM_SHORT_PTE_SMALL_XN;
> +       }
> +
> +       /* section to pte prot */
> +       if (pgdprot & ARM_SHORT_PGD_C)
> +               pteprot |= ARM_SHORT_PTE_C;
> +       if (pgdprot & ARM_SHORT_PGD_B)
> +               pteprot |= ARM_SHORT_PTE_B;
> +       if (pgdprot & ARM_SHORT_PGD_nG)
> +               pteprot |= ARM_SHORT_PTE_nG;
> +       if (pgdprot & ARM_SHORT_PGD_SECTION_XN)
> +               pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> +                               ARM_SHORT_PTE_SMALL_XN;
> +       if (pgdprot & ARM_SHORT_PGD_RD_WR)
> +               pteprot |= ARM_SHORT_PTE_RD_WR;
> +       if (pgdprot & ARM_SHORT_PGD_RDONLY)
> +               pteprot |= ARM_SHORT_PTE_RDONLY;
> +
> +       return pteprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pgtable_prot(struct arm_short_io_pgtable *data)
> +{
> +       arm_short_iopte pgdprot = 0;
> +
> +       pgdprot = ARM_SHORT_PGD_TYPE_PGTABLE;
> +       if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
> +               pgdprot |= ARM_SHORT_PGD_PGTABLE_NS;
> +       return pgdprot;
> +}
> +
> +static int
> +_arm_short_map(struct arm_short_io_pgtable *data,
> +              unsigned int iova, phys_addr_t paddr,
> +              arm_short_iopte pgdprot, arm_short_iopte pteprot,
> +              bool large)
> +{
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       arm_short_iopte *pgd = data->pgd, *pte;
> +       void *pte_new = NULL;
> +       int ret;
> +
> +       pgd += ARM_SHORT_PGD_IDX(iova);
> +
> +       if (!pteprot) { /* section or supersection */
> +               pte = pgd;
> +               pteprot = pgdprot;
> +       } else {        /* page or largepage */
> +               if (!(*pgd)) {
> +                       pte_new = __arm_short_alloc_pgtable(
> +                                       ARM_SHORT_BYTES_PER_PTE,
> +                                       GFP_ATOMIC, false, cfg);
> +                       if (unlikely(!pte_new))
> +                               return -ENOMEM;
> +
> +                       pgdprot |= virt_to_phys(pte_new);
> +                       __arm_short_set_pte(pgd, pgdprot, 1, cfg);
> +               }
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +       }
> +
> +       pteprot |= (arm_short_iopte)paddr;
> +       ret = __arm_short_set_pte(pte, pteprot, large ? 16 : 1, cfg);
> +       if (ret && pte_new)
> +               __arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
> +                                        false, cfg);

Don't you need to kill the pgd entry before freeing this? Please see my
previous comments about safely freeing page tables:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358268.html

(at the end of the post)

> +       return ret;
> +}
> +
> +static int arm_short_map(struct io_pgtable_ops *ops, unsigned long iova,
> +                        phys_addr_t paddr, size_t size, int prot)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       arm_short_iopte pgdprot = 0, pteprot = 0;
> +       bool large;
> +
> +       /* If no access, then nothing to do */
> +       if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
> +               return 0;
> +
> +       if (WARN_ON((iova | paddr) & (size - 1)))
> +               return -EINVAL;
> +
> +       switch (size) {
> +       case SZ_4K:
> +       case SZ_64K:
> +               large = (size == SZ_64K) ? true : false;
> +               pteprot = __arm_short_pte_prot(data, prot, large);
> +               pgdprot = __arm_short_pgtable_prot(data);
> +               break;
> +
> +       case SZ_1M:
> +       case SZ_16M:
> +               large = (size == SZ_16M) ? true : false;
> +               pgdprot = __arm_short_pgd_prot(data, prot, large);
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       return _arm_short_map(data, iova, paddr, pgdprot, pteprot, large);
> +}
> +
> +static phys_addr_t arm_short_iova_to_phys(struct io_pgtable_ops *ops,
> +                                         unsigned long iova)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       arm_short_iopte *pte, *pgd = data->pgd;
> +       phys_addr_t pa = 0;
> +
> +       pgd += ARM_SHORT_PGD_IDX(iova);
> +
> +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +
> +               if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte)) {
> +                       pa = (*pte) & ARM_SHORT_PTE_LARGE_MSK;
> +                       pa |= iova & ~ARM_SHORT_PTE_LARGE_MSK;
> +               } else if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte)) {
> +                       pa = (*pte) & ARM_SHORT_PTE_SMALL_MSK;
> +                       pa |= iova & ~ARM_SHORT_PTE_SMALL_MSK;
> +               }
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> +               pa = (*pgd) & ARM_SHORT_PGD_SECTION_MSK;
> +               pa |= iova & ~ARM_SHORT_PGD_SECTION_MSK;
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> +               pa = (*pgd) & ARM_SHORT_PGD_SUPERSECTION_MSK;
> +               pa |= iova & ~ARM_SHORT_PGD_SUPERSECTION_MSK;
> +       }
> +
> +       return pa;
> +}
> +
> +static bool _arm_short_whether_free_pgtable(arm_short_iopte *pgd)
> +{

_arm_short_pgtable_empty might be a better name.

> +       arm_short_iopte *pte;
> +       int i;
> +
> +       pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> +       for (i = 0; i < ARM_SHORT_PTRS_PER_PTE; i++) {
> +               if (pte[i] != 0)
> +                       return false;
> +       }
> +
> +       return true;
> +}
> +
> +static int
> +arm_short_split_blk_unmap(struct io_pgtable_ops *ops, unsigned int iova,
> +                         phys_addr_t paddr, size_t size,
> +                         arm_short_iopte pgdprotup, arm_short_iopte pteprotup,
> +                         size_t blk_size)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       unsigned long *pgbitmap = &cfg->pgsize_bitmap;
> +       unsigned int blk_base, blk_start, blk_end, i;
> +       arm_short_iopte pgdprot, pteprot;
> +       phys_addr_t blk_paddr;
> +       size_t mapsize = 0, nextmapsize;
> +       int ret;
> +
> +       /* find the nearest mapsize */
> +       for (i = find_first_bit(pgbitmap, BITS_PER_LONG);
> +            i < BITS_PER_LONG && ((1 << i) < blk_size) &&
> +            IS_ALIGNED(size, 1 << i);
> +            i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1))
> +               mapsize = 1 << i;
> +
> +       if (WARN_ON(!mapsize))
> +               return 0; /* Bytes unmapped */
> +       nextmapsize = 1 << i;
> +
> +       blk_base = iova & ~(blk_size - 1);
> +       blk_start = blk_base;
> +       blk_end = blk_start + blk_size;
> +       blk_paddr = paddr;
> +
> +       for (; blk_start < blk_end;
> +            blk_start += mapsize, blk_paddr += mapsize) {
> +               /* Unmap! */
> +               if (blk_start == iova)
> +                       continue;
> +
> +               /* Try to upper map */
> +               if (blk_base != blk_start &&
> +                   IS_ALIGNED(blk_start | blk_paddr, nextmapsize) &&
> +                   mapsize != nextmapsize) {
> +                       mapsize = nextmapsize;
> +                       i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1);
> +                       if (i < BITS_PER_LONG)
> +                               nextmapsize = 1 << i;
> +               }
> +
> +               if (mapsize == SZ_1M) {

How do we get here with a mapsize of 1M?

> +                       pgdprot = pgdprotup;
> +                       pgdprot |= __arm_short_pgd_prot(data, 0, false);
> +                       pteprot = 0;
> +               } else { /* small or large page */
> +                       pgdprot = (blk_size == SZ_64K) ? 0 : pgdprotup;
> +                       pteprot = __arm_short_pte_prot_split(
> +                                       data, pgdprot, pteprotup,
> +                                       mapsize == SZ_64K);
> +                       pgdprot = __arm_short_pgtable_prot(data);
> +               }
> +
> +               ret = _arm_short_map(data, blk_start, blk_paddr, pgdprot,
> +                                    pteprot, mapsize == SZ_64K);
> +               if (ret < 0) {
> +                       /* Free the table we allocated */
> +                       arm_short_iopte *pgd = data->pgd, *pte;
> +
> +                       pgd += ARM_SHORT_PGD_IDX(blk_base);
> +                       if (*pgd) {
> +                               pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> +                               __arm_short_set_pte(pgd, 0, 1, cfg);
> +                               tlb->tlb_add_flush(blk_base, blk_size, true,
> +                                                  data->iop.cookie);
> +                               tlb->tlb_sync(data->iop.cookie);
> +                               __arm_short_free_pgtable(
> +                                       pte, ARM_SHORT_BYTES_PER_PTE,
> +                                       false, cfg);

This looks wrong. _arm_short_map cleans up if it returns non-zero already.

> +                       }
> +                       return 0;/* Bytes unmapped */
> +               }
> +       }
> +
> +       tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
> +       tlb->tlb_sync(data->iop.cookie);

Why are you syncing here? You can postpone this to the caller, if it turns
out the unmap was a success.

> +       return size;
> +}
> +
> +static int arm_short_unmap(struct io_pgtable_ops *ops,
> +                          unsigned long iova,
> +                          size_t size)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       arm_short_iopte *pgd, *pte = NULL;
> +       arm_short_iopte curpgd, curpte = 0;
> +       phys_addr_t paddr;
> +       unsigned int iova_base, blk_size = 0;
> +       void *cookie = data->iop.cookie;
> +       bool pgtablefree = false;
> +
> +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
> +
> +       /* Get block size */
> +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +
> +               if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
> +                       blk_size = SZ_4K;
> +               else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
> +                       blk_size = SZ_64K;
> +               else
> +                       WARN_ON(1);
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> +               blk_size = SZ_1M;
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> +               blk_size = SZ_16M;
> +       } else {
> +               WARN_ON(1);

Maybe return 0 or something instead of falling through with blk_size == 0?

> +       }
> +
> +       iova_base = iova & ~(blk_size - 1);
> +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova_base);
> +       paddr = arm_short_iova_to_phys(ops, iova_base);
> +       curpgd = *pgd;
> +
> +       if (blk_size == SZ_4K || blk_size == SZ_64K) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova_base);
> +               curpte = *pte;
> +               __arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);
> +
> +               pgtablefree = _arm_short_whether_free_pgtable(pgd);
> +               if (pgtablefree)
> +                       __arm_short_set_pte(pgd, 0, 1, cfg);
> +       } else if (blk_size == SZ_1M || blk_size == SZ_16M) {
> +               __arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
> +       }
> +
> +       cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);
> +       cfg->tlb->tlb_sync(cookie);
> +
> +       if (pgtablefree)/* Free pgtable after tlb-flush */
> +               __arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
> +                                        ARM_SHORT_BYTES_PER_PTE, false, cfg);

Curious, but why do you care about freeing this on unmap? It will get
freed when the page table itself is freed anyway (via the ->free callback).

> +
> +       if (blk_size > size) { /* Split the block */
> +               return arm_short_split_blk_unmap(
> +                               ops, iova, paddr, size,
> +                               ARM_SHORT_PGD_GET_PROT(curpgd),
> +                               ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
> +                               blk_size);
> +       } else if (blk_size < size) {
> +               /* Unmap the block while remap partial again after split */
> +               return blk_size +
> +                       arm_short_unmap(ops, iova + blk_size, size - blk_size);
> +       }
> +
> +       return size;
> +}
> +
> +static struct io_pgtable *
> +arm_short_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> +{
> +       struct arm_short_io_pgtable *data;
> +
> +       if (cfg->ias > 32 || cfg->oas > 32)
> +               return NULL;
> +
> +       cfg->pgsize_bitmap &=
> +               (cfg->quirks & IO_PGTABLE_QUIRK_SHORT_SUPERSECTION) ?
> +               (SZ_4K | SZ_64K | SZ_1M | SZ_16M) : (SZ_4K | SZ_64K | SZ_1M);
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data)
> +               return NULL;
> +
> +       data->pgd_size = SZ_16K;
> +       data->pgd = __arm_short_alloc_pgtable(
> +                                       data->pgd_size,
> +                                       GFP_KERNEL | __GFP_ZERO | __GFP_DMA,
> +                                       true, cfg);
> +       if (!data->pgd)
> +               goto out_free_data;
> +       wmb();/* Ensure the empty pgd is visible before any actual TTBR write */
> +
> +       data->pgtable_cached = kmem_cache_create(
> +                                       "io-pgtable-arm-short",
> +                                        ARM_SHORT_BYTES_PER_PTE,
> +                                        ARM_SHORT_BYTES_PER_PTE,
> +                                        0, NULL);
> +       if (!data->pgtable_cached)
> +               goto out_free_pgd;
> +
> +       /* TTBRs */
> +       cfg->arm_short_cfg.ttbr[0] = virt_to_phys(data->pgd);
> +       cfg->arm_short_cfg.ttbr[1] = 0;
> +       cfg->arm_short_cfg.tcr = 0;
> +       cfg->arm_short_cfg.nmrr = 0;
> +       cfg->arm_short_cfg.prrr = 0;
> +
> +       data->iop.ops = (struct io_pgtable_ops) {
> +               .map            = arm_short_map,
> +               .unmap          = arm_short_unmap,
> +               .iova_to_phys   = arm_short_iova_to_phys,
> +       };
> +
> +       return &data->iop;
> +
> +out_free_pgd:
> +       __arm_short_free_pgtable(data->pgd, data->pgd_size, true, cfg);
> +out_free_data:
> +       kfree(data);
> +       return NULL;
> +}
> +
> +static void arm_short_free_pgtable(struct io_pgtable *iop)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_to_data(iop);
> +
> +       kmem_cache_destroy(data->pgtable_cached);
> +       __arm_short_free_pgtable(data->pgd, data->pgd_size,
> +                                true, &data->iop.cfg);
> +       kfree(data);
> +}
> +
> +struct io_pgtable_init_fns io_pgtable_arm_short_init_fns = {
> +       .alloc  = arm_short_alloc_pgtable,
> +       .free   = arm_short_free_pgtable,
> +};
> +

[...]

> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index 6436fe2..14a9b3a 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -28,6 +28,7 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
> +extern struct io_pgtable_init_fns io_pgtable_arm_short_init_fns;
> 
>  static const struct io_pgtable_init_fns *
>  io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
> @@ -38,6 +39,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
>         [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
>         [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
>  #endif
> +#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT
> +       [ARM_SHORT_DESC] = &io_pgtable_arm_short_init_fns,
> +#endif
>  };
> 
>  struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 68c63d9..0f45e60 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -9,6 +9,7 @@ enum io_pgtable_fmt {
>         ARM_32_LPAE_S2,
>         ARM_64_LPAE_S1,
>         ARM_64_LPAE_S2,
> +       ARM_SHORT_DESC,
>         IO_PGTABLE_NUM_FMTS,
>  };
> 
> @@ -45,6 +46,9 @@ struct iommu_gather_ops {
>   */
>  struct io_pgtable_cfg {
>         #define IO_PGTABLE_QUIRK_ARM_NS (1 << 0)        /* Set NS bit in PTEs */
> +       #define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION     BIT(1)
> +       #define IO_PGTABLE_QUIRK_SHORT_NO_XN            BIT(2) /* No XN bit */
> +       #define IO_PGTABLE_QUIRK_SHORT_NO_PERMS         BIT(3) /* No AP bit */

Why have two quirks for this? I suggested included NO_XN in NO_PERMS:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/361160.html

>         int                             quirks;
>         unsigned long                   pgsize_bitmap;
>         unsigned int                    ias;
> @@ -64,6 +68,13 @@ struct io_pgtable_cfg {
>                         u64     vttbr;
>                         u64     vtcr;
>                 } arm_lpae_s2_cfg;
> +
> +               struct {
> +                       u32     ttbr[2];
> +                       u32     tcr;
> +                       u32     nmrr;
> +                       u32     prrr;
> +               } arm_short_cfg;

We don't return an SCTLR value here, so a comment somewhere saying that
access flag is not supported would be helpful (so that drivers can ensure
that they configure things for the AP[2:0] permission model).

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-09-16 15:58     ` Will Deacon
  0 siblings, 0 replies; 60+ messages in thread
From: Will Deacon @ 2015-09-16 15:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Aug 03, 2015 at 11:21:16AM +0100, Yong Wu wrote:
> This patch is for ARM Short Descriptor Format.
> 
> Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> ---
>  drivers/iommu/Kconfig                |  18 +
>  drivers/iommu/Makefile               |   1 +
>  drivers/iommu/io-pgtable-arm-short.c | 813 +++++++++++++++++++++++++++++++++++
>  drivers/iommu/io-pgtable-arm.c       |   3 -
>  drivers/iommu/io-pgtable.c           |   4 +
>  drivers/iommu/io-pgtable.h           |  14 +
>  6 files changed, 850 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/iommu/io-pgtable-arm-short.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f1fb1d3..3abd066 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -39,6 +39,24 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
> 
>           If unsure, say N here.
> 
> +config IOMMU_IO_PGTABLE_SHORT
> +       bool "ARMv7/v8 Short Descriptor Format"
> +       select IOMMU_IO_PGTABLE
> +       depends on ARM || ARM64 || COMPILE_TEST
> +       help
> +         Enable support for the ARM Short-descriptor pagetable format.
> +         This allocator supports 2 levels translation tables which supports

Some minor rewording here:

"...2 levels of translation tables, which enables a 32-bit memory map based
 on..."

> +         a memory map based on memory sections or pages.
> +
> +config IOMMU_IO_PGTABLE_SHORT_SELFTEST
> +       bool "Short Descriptor selftests"
> +       depends on IOMMU_IO_PGTABLE_SHORT
> +       help
> +         Enable self-tests for Short-descriptor page table allocator.
> +         This performs a series of page-table consistency checks during boot.
> +
> +         If unsure, say N here.
> +
>  endmenu
> 
>  config IOMMU_IOVA

[...]

> +#define ARM_SHORT_PGDIR_SHIFT                  20
> +#define ARM_SHORT_PAGE_SHIFT                   12
> +#define ARM_SHORT_PTRS_PER_PTE                 \
> +       (1 << (ARM_SHORT_PGDIR_SHIFT - ARM_SHORT_PAGE_SHIFT))
> +#define ARM_SHORT_BYTES_PER_PTE                        \
> +       (ARM_SHORT_PTRS_PER_PTE * sizeof(arm_short_iopte))
> +
> +/* level 1 pagetable */
> +#define ARM_SHORT_PGD_TYPE_PGTABLE             BIT(0)
> +#define ARM_SHORT_PGD_TYPE_SECTION             BIT(1)
> +#define ARM_SHORT_PGD_B                                BIT(2)
> +#define ARM_SHORT_PGD_C                                BIT(3)
> +#define ARM_SHORT_PGD_PGTABLE_NS               BIT(3)
> +#define ARM_SHORT_PGD_SECTION_XN               BIT(4)
> +#define ARM_SHORT_PGD_IMPLE                    BIT(9)
> +#define ARM_SHORT_PGD_RD_WR                    (3 << 10)
> +#define ARM_SHORT_PGD_RDONLY                   BIT(15)
> +#define ARM_SHORT_PGD_S                                BIT(16)
> +#define ARM_SHORT_PGD_nG                       BIT(17)
> +#define ARM_SHORT_PGD_SUPERSECTION             BIT(18)
> +#define ARM_SHORT_PGD_SECTION_NS               BIT(19)
> +
> +#define ARM_SHORT_PGD_TYPE_SUPERSECTION                \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
> +#define ARM_SHORT_PGD_SECTION_TYPE_MSK         \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_SUPERSECTION)
> +#define ARM_SHORT_PGD_PGTABLE_TYPE_MSK         \
> +       (ARM_SHORT_PGD_TYPE_SECTION | ARM_SHORT_PGD_TYPE_PGTABLE)
> +#define ARM_SHORT_PGD_TYPE_IS_PGTABLE(pgd)     \
> +       (((pgd) & ARM_SHORT_PGD_PGTABLE_TYPE_MSK) == ARM_SHORT_PGD_TYPE_PGTABLE)
> +#define ARM_SHORT_PGD_TYPE_IS_SECTION(pgd)     \
> +       (((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == ARM_SHORT_PGD_TYPE_SECTION)
> +#define ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(pgd)        \
> +       (((pgd) & ARM_SHORT_PGD_SECTION_TYPE_MSK) == \
> +       ARM_SHORT_PGD_TYPE_SUPERSECTION)
> +#define ARM_SHORT_PGD_PGTABLE_MSK              0xfffffc00

You could use (~(ARM_SHORT_BYTES_PER_PTE - 1)), I think.

> +#define ARM_SHORT_PGD_SECTION_MSK              (~(SZ_1M - 1))
> +#define ARM_SHORT_PGD_SUPERSECTION_MSK         (~(SZ_16M - 1))
> +
> +/* level 2 pagetable */
> +#define ARM_SHORT_PTE_TYPE_LARGE               BIT(0)
> +#define ARM_SHORT_PTE_SMALL_XN                 BIT(0)
> +#define ARM_SHORT_PTE_TYPE_SMALL               BIT(1)
> +#define ARM_SHORT_PTE_B                                BIT(2)
> +#define ARM_SHORT_PTE_C                                BIT(3)
> +#define ARM_SHORT_PTE_RD_WR                    (3 << 4)
> +#define ARM_SHORT_PTE_RDONLY                   BIT(9)
> +#define ARM_SHORT_PTE_S                                BIT(10)
> +#define ARM_SHORT_PTE_nG                       BIT(11)
> +#define ARM_SHORT_PTE_LARGE_XN                 BIT(15)
> +#define ARM_SHORT_PTE_LARGE_MSK                        (~(SZ_64K - 1))
> +#define ARM_SHORT_PTE_SMALL_MSK                        (~(SZ_4K - 1))
> +#define ARM_SHORT_PTE_TYPE_MSK                 \
> +       (ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL)
> +#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte)   \
> +       (((pte) & ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL)

Maybe a comment here, because it's confusing that you don't and with the
mask due to XN.

> +#define ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(pte)   \
> +       (((pte) & ARM_SHORT_PTE_TYPE_MSK) == ARM_SHORT_PTE_TYPE_LARGE)
> +
> +#define ARM_SHORT_PGD_IDX(a)                   ((a) >> ARM_SHORT_PGDIR_SHIFT)
> +#define ARM_SHORT_PTE_IDX(a)                   \
> +       (((a) >> ARM_SHORT_PAGE_SHIFT) & (ARM_SHORT_PTRS_PER_PTE - 1))
> +
> +#define ARM_SHORT_GET_PGTABLE_VA(pgd)          \
> +       (phys_to_virt((unsigned long)pgd & ARM_SHORT_PGD_PGTABLE_MSK))
> +
> +#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)      \
> +       (((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)

AFAICT, the only user of this also does an '& ~ARM_SHORT_PTE_SMALL_MSK'.
Wouldn't it be better to define ARM_SHORT_PTE_GET_PROT, which just returns
the AP bits? That said, what are you going to do about XN? I know you
don't support it in your hardware, but this could code should still do
the right thing.

> +static int
> +__arm_short_set_pte(arm_short_iopte *ptep, arm_short_iopte pte,
> +                   unsigned int ptenr, struct io_pgtable_cfg *cfg)
> +{
> +       struct device *dev = cfg->iommu_dev;
> +       int i;
> +
> +       for (i = 0; i < ptenr; i++) {
> +               if (ptep[i] && pte) {
> +                       /* Someone else may have allocated for this pte */
> +                       WARN_ON(!selftest_running);
> +                       goto err_exist_pte;
> +               }
> +               ptep[i] = pte;
> +       }
> +
> +       if (selftest_running)
> +               return 0;
> +
> +       dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
> +                                  sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
> +       return 0;
> +
> +err_exist_pte:
> +       while (i--)
> +               ptep[i] = 0;

What about a dma_sync for the failure case?

> +       return -EEXIST;
> +}
> +
> +static void *
> +__arm_short_alloc_pgtable(size_t size, gfp_t gfp, bool pgd,
> +                         struct io_pgtable_cfg *cfg)
> +{
> +       struct arm_short_io_pgtable *data;
> +       struct device *dev = cfg->iommu_dev;
> +       dma_addr_t dma;
> +       void *va;
> +
> +       if (pgd) {/* lvl1 pagetable */
> +               va = alloc_pages_exact(size, gfp);
> +       } else {  /* lvl2 pagetable */
> +               data = io_pgtable_cfg_to_data(cfg);
> +               va = kmem_cache_zalloc(data->pgtable_cached, gfp);
> +       }
> +
> +       if (!va)
> +               return NULL;
> +
> +       if (selftest_running)
> +               return va;
> +
> +       dma = dma_map_single(dev, va, size, DMA_TO_DEVICE);
> +       if (dma_mapping_error(dev, dma))
> +               goto out_free;
> +
> +       if (dma != __arm_short_dma_addr(dev, va))
> +               goto out_unmap;
> +
> +       if (!pgd) {
> +               kmemleak_ignore(va);
> +               dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, va),
> +                                          size, DMA_TO_DEVICE);

Why do you need to do this as well as the dma_map_single above?

> +       }
> +
> +       return va;
> +
> +out_unmap:
> +       dev_err_ratelimited(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
> +       dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
> +out_free:
> +       if (pgd)
> +               free_pages_exact(va, size);
> +       else
> +               kmem_cache_free(data->pgtable_cached, va);
> +       return NULL;
> +}
> +
> +static void
> +__arm_short_free_pgtable(void *va, size_t size, bool pgd,
> +                        struct io_pgtable_cfg *cfg)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_cfg_to_data(cfg);
> +       struct device *dev = cfg->iommu_dev;
> +
> +       if (!selftest_running)
> +               dma_unmap_single(dev, __arm_short_dma_addr(dev, va),
> +                                size, DMA_TO_DEVICE);
> +
> +       if (pgd)
> +               free_pages_exact(va, size);
> +       else
> +               kmem_cache_free(data->pgtable_cached, va);
> +}
> +
> +static arm_short_iopte
> +__arm_short_pte_prot(struct arm_short_io_pgtable *data, int prot, bool large)
> +{
> +       arm_short_iopte pteprot;
> +       int quirk = data->iop.cfg.quirks;
> +
> +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG;
> +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> +                               ARM_SHORT_PTE_TYPE_SMALL;
> +       if (prot & IOMMU_CACHE)
> +               pteprot |=  ARM_SHORT_PTE_B | ARM_SHORT_PTE_C;
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC)) {
> +                       pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> +                               ARM_SHORT_PTE_SMALL_XN;

Weird indentation, man. Also, see my later comment about combining NO_XN
with NO_PERMS (the latter subsumes the first)

> +       }
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
> +               pteprot |= ARM_SHORT_PTE_RD_WR;
> +               if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> +                       pteprot |= ARM_SHORT_PTE_RDONLY;
> +       }
> +       return pteprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pgd_prot(struct arm_short_io_pgtable *data, int prot, bool super)
> +{
> +       arm_short_iopte pgdprot;
> +       int quirk = data->iop.cfg.quirks;
> +
> +       pgdprot = ARM_SHORT_PGD_S | ARM_SHORT_PGD_nG;
> +       pgdprot |= super ? ARM_SHORT_PGD_TYPE_SUPERSECTION :
> +                               ARM_SHORT_PGD_TYPE_SECTION;
> +       if (prot & IOMMU_CACHE)
> +               pgdprot |= ARM_SHORT_PGD_C | ARM_SHORT_PGD_B;
> +       if (quirk & IO_PGTABLE_QUIRK_ARM_NS)
> +               pgdprot |= ARM_SHORT_PGD_SECTION_NS;
> +
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC))
> +                       pgdprot |= ARM_SHORT_PGD_SECTION_XN;
> +
> +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {

Same comments here.

> +               pgdprot |= ARM_SHORT_PGD_RD_WR;
> +               if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> +                       pgdprot |= ARM_SHORT_PGD_RDONLY;
> +       }
> +       return pgdprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pte_prot_split(struct arm_short_io_pgtable *data,
> +                          arm_short_iopte pgdprot,
> +                          arm_short_iopte pteprot_large,
> +                          bool large)
> +{
> +       arm_short_iopte pteprot = 0;
> +
> +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG | ARM_SHORT_PTE_RD_WR;
> +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> +                               ARM_SHORT_PTE_TYPE_SMALL;
> +
> +       /* large page to small page pte prot. Only large page may split */
> +       if (!pgdprot && !large) {

It's slightly complicated having these two variables controlling the
behaviour of the split. In reality, we're either splitting a section or
a large page, so there are three valid combinations.

It might be simpler to operate on IOMMU_{READ,WRITE,NOEXEC,CACHE} as
much as possible, and then have some simple functions to encode/decode
these into section/large/small page prot bits. We could then just pass
the IOMMU_* prot around along with the map size. What do you think?

> +               pteprot |= pteprot_large & ~ARM_SHORT_PTE_SMALL_MSK;
> +               if (pteprot_large & ARM_SHORT_PTE_LARGE_XN)
> +                       pteprot |= ARM_SHORT_PTE_SMALL_XN;
> +       }
> +
> +       /* section to pte prot */
> +       if (pgdprot & ARM_SHORT_PGD_C)
> +               pteprot |= ARM_SHORT_PTE_C;
> +       if (pgdprot & ARM_SHORT_PGD_B)
> +               pteprot |= ARM_SHORT_PTE_B;
> +       if (pgdprot & ARM_SHORT_PGD_nG)
> +               pteprot |= ARM_SHORT_PTE_nG;
> +       if (pgdprot & ARM_SHORT_PGD_SECTION_XN)
> +               pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> +                               ARM_SHORT_PTE_SMALL_XN;
> +       if (pgdprot & ARM_SHORT_PGD_RD_WR)
> +               pteprot |= ARM_SHORT_PTE_RD_WR;
> +       if (pgdprot & ARM_SHORT_PGD_RDONLY)
> +               pteprot |= ARM_SHORT_PTE_RDONLY;
> +
> +       return pteprot;
> +}
> +
> +static arm_short_iopte
> +__arm_short_pgtable_prot(struct arm_short_io_pgtable *data)
> +{
> +       arm_short_iopte pgdprot = 0;
> +
> +       pgdprot = ARM_SHORT_PGD_TYPE_PGTABLE;
> +       if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
> +               pgdprot |= ARM_SHORT_PGD_PGTABLE_NS;
> +       return pgdprot;
> +}
> +
> +static int
> +_arm_short_map(struct arm_short_io_pgtable *data,
> +              unsigned int iova, phys_addr_t paddr,
> +              arm_short_iopte pgdprot, arm_short_iopte pteprot,
> +              bool large)
> +{
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       arm_short_iopte *pgd = data->pgd, *pte;
> +       void *pte_new = NULL;
> +       int ret;
> +
> +       pgd += ARM_SHORT_PGD_IDX(iova);
> +
> +       if (!pteprot) { /* section or supersection */
> +               pte = pgd;
> +               pteprot = pgdprot;
> +       } else {        /* page or largepage */
> +               if (!(*pgd)) {
> +                       pte_new = __arm_short_alloc_pgtable(
> +                                       ARM_SHORT_BYTES_PER_PTE,
> +                                       GFP_ATOMIC, false, cfg);
> +                       if (unlikely(!pte_new))
> +                               return -ENOMEM;
> +
> +                       pgdprot |= virt_to_phys(pte_new);
> +                       __arm_short_set_pte(pgd, pgdprot, 1, cfg);
> +               }
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +       }
> +
> +       pteprot |= (arm_short_iopte)paddr;
> +       ret = __arm_short_set_pte(pte, pteprot, large ? 16 : 1, cfg);
> +       if (ret && pte_new)
> +               __arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
> +                                        false, cfg);

Don't you need to kill the pgd entry before freeing this? Please see my
previous comments about safely freeing page tables:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358268.html

(at the end of the post)
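
Concretely, the ordering should be: clear the pgd entry, flush and sync the
TLB, and only then free the table. A sketch using the helpers already in
this patch:

//=====
	if (ret && pte_new) {
		/* Unhook the new table from the pgd first */
		__arm_short_set_pte(pgd, 0, 1, cfg);
		/* Make sure no walker can still reach it */
		cfg->tlb->tlb_add_flush(iova, SZ_1M, true, data->iop.cookie);
		cfg->tlb->tlb_sync(data->iop.cookie);
		/* Only now is it safe to hand the memory back */
		__arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
					 false, cfg);
	}
//=====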

> +       return ret;
> +}
> +
> +static int arm_short_map(struct io_pgtable_ops *ops, unsigned long iova,
> +                        phys_addr_t paddr, size_t size, int prot)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       arm_short_iopte pgdprot = 0, pteprot = 0;
> +       bool large;
> +
> +       /* If no access, then nothing to do */
> +       if (!(prot & (IOMMU_READ | IOMMU_WRITE)))
> +               return 0;
> +
> +       if (WARN_ON((iova | paddr) & (size - 1)))
> +               return -EINVAL;
> +
> +       switch (size) {
> +       case SZ_4K:
> +       case SZ_64K:
> +               large = (size == SZ_64K) ? true : false;
> +               pteprot = __arm_short_pte_prot(data, prot, large);
> +               pgdprot = __arm_short_pgtable_prot(data);
> +               break;
> +
> +       case SZ_1M:
> +       case SZ_16M:
> +               large = (size == SZ_16M) ? true : false;
> +               pgdprot = __arm_short_pgd_prot(data, prot, large);
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       return _arm_short_map(data, iova, paddr, pgdprot, pteprot, large);
> +}
> +
> +static phys_addr_t arm_short_iova_to_phys(struct io_pgtable_ops *ops,
> +                                         unsigned long iova)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       arm_short_iopte *pte, *pgd = data->pgd;
> +       phys_addr_t pa = 0;
> +
> +       pgd += ARM_SHORT_PGD_IDX(iova);
> +
> +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +
> +               if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte)) {
> +                       pa = (*pte) & ARM_SHORT_PTE_LARGE_MSK;
> +                       pa |= iova & ~ARM_SHORT_PTE_LARGE_MSK;
> +               } else if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte)) {
> +                       pa = (*pte) & ARM_SHORT_PTE_SMALL_MSK;
> +                       pa |= iova & ~ARM_SHORT_PTE_SMALL_MSK;
> +               }
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> +               pa = (*pgd) & ARM_SHORT_PGD_SECTION_MSK;
> +               pa |= iova & ~ARM_SHORT_PGD_SECTION_MSK;
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> +               pa = (*pgd) & ARM_SHORT_PGD_SUPERSECTION_MSK;
> +               pa |= iova & ~ARM_SHORT_PGD_SUPERSECTION_MSK;
> +       }
> +
> +       return pa;
> +}
> +
> +static bool _arm_short_whether_free_pgtable(arm_short_iopte *pgd)
> +{

_arm_short_pgtable_empty might be a better name.

> +       arm_short_iopte *pte;
> +       int i;
> +
> +       pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> +       for (i = 0; i < ARM_SHORT_PTRS_PER_PTE; i++) {
> +               if (pte[i] != 0)
> +                       return false;
> +       }
> +
> +       return true;
> +}
> +
> +static int
> +arm_short_split_blk_unmap(struct io_pgtable_ops *ops, unsigned int iova,
> +                         phys_addr_t paddr, size_t size,
> +                         arm_short_iopte pgdprotup, arm_short_iopte pteprotup,
> +                         size_t blk_size)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       unsigned long *pgbitmap = &cfg->pgsize_bitmap;
> +       unsigned int blk_base, blk_start, blk_end, i;
> +       arm_short_iopte pgdprot, pteprot;
> +       phys_addr_t blk_paddr;
> +       size_t mapsize = 0, nextmapsize;
> +       int ret;
> +
> +       /* find the nearest mapsize */
> +       for (i = find_first_bit(pgbitmap, BITS_PER_LONG);
> +            i < BITS_PER_LONG && ((1 << i) < blk_size) &&
> +            IS_ALIGNED(size, 1 << i);
> +            i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1))
> +               mapsize = 1 << i;
> +
> +       if (WARN_ON(!mapsize))
> +               return 0; /* Bytes unmapped */
> +       nextmapsize = 1 << i;
> +
> +       blk_base = iova & ~(blk_size - 1);
> +       blk_start = blk_base;
> +       blk_end = blk_start + blk_size;
> +       blk_paddr = paddr;
> +
> +       for (; blk_start < blk_end;
> +            blk_start += mapsize, blk_paddr += mapsize) {
> +               /* Unmap! */
> +               if (blk_start == iova)
> +                       continue;
> +
> +               /* Try to upper map */
> +               if (blk_base != blk_start &&
> +                   IS_ALIGNED(blk_start | blk_paddr, nextmapsize) &&
> +                   mapsize != nextmapsize) {
> +                       mapsize = nextmapsize;
> +                       i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1);
> +                       if (i < BITS_PER_LONG)
> +                               nextmapsize = 1 << i;
> +               }
> +
> +               if (mapsize == SZ_1M) {

How do we get here with a mapsize of 1M?

> +                       pgdprot = pgdprotup;
> +                       pgdprot |= __arm_short_pgd_prot(data, 0, false);
> +                       pteprot = 0;
> +               } else { /* small or large page */
> +                       pgdprot = (blk_size == SZ_64K) ? 0 : pgdprotup;
> +                       pteprot = __arm_short_pte_prot_split(
> +                                       data, pgdprot, pteprotup,
> +                                       mapsize == SZ_64K);
> +                       pgdprot = __arm_short_pgtable_prot(data);
> +               }
> +
> +               ret = _arm_short_map(data, blk_start, blk_paddr, pgdprot,
> +                                    pteprot, mapsize == SZ_64K);
> +               if (ret < 0) {
> +                       /* Free the table we allocated */
> +                       arm_short_iopte *pgd = data->pgd, *pte;
> +
> +                       pgd += ARM_SHORT_PGD_IDX(blk_base);
> +                       if (*pgd) {
> +                               pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> +                               __arm_short_set_pte(pgd, 0, 1, cfg);
> +                               tlb->tlb_add_flush(blk_base, blk_size, true,
> +                                                  data->iop.cookie);
> +                               tlb->tlb_sync(data->iop.cookie);
> +                               __arm_short_free_pgtable(
> +                                       pte, ARM_SHORT_BYTES_PER_PTE,
> +                                       false, cfg);

This looks wrong. _arm_short_map cleans up if it returns non-zero already.

> +                       }
> +                       return 0;/* Bytes unmapped */
> +               }
> +       }
> +
> +       tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
> +       tlb->tlb_sync(data->iop.cookie);

Why are you syncing here? You can postpone this to the caller, if it turns
out the unmap was a success.

> +       return size;
> +}
> +
> +static int arm_short_unmap(struct io_pgtable_ops *ops,
> +                          unsigned long iova,
> +                          size_t size)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +       arm_short_iopte *pgd, *pte = NULL;
> +       arm_short_iopte curpgd, curpte = 0;
> +       phys_addr_t paddr;
> +       unsigned int iova_base, blk_size = 0;
> +       void *cookie = data->iop.cookie;
> +       bool pgtablefree = false;
> +
> +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
> +
> +       /* Get block size */
> +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> +
> +               if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
> +                       blk_size = SZ_4K;
> +               else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
> +                       blk_size = SZ_64K;
> +               else
> +                       WARN_ON(1);
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> +               blk_size = SZ_1M;
> +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> +               blk_size = SZ_16M;
> +       } else {
> +               WARN_ON(1);

Maybe return 0 or something instead of falling through with blk_size == 0?

> +       }
> +
> +       iova_base = iova & ~(blk_size - 1);
> +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova_base);
> +       paddr = arm_short_iova_to_phys(ops, iova_base);
> +       curpgd = *pgd;
> +
> +       if (blk_size == SZ_4K || blk_size == SZ_64K) {
> +               pte = arm_short_get_pte_in_pgd(*pgd, iova_base);
> +               curpte = *pte;
> +               __arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);
> +
> +               pgtablefree = _arm_short_whether_free_pgtable(pgd);
> +               if (pgtablefree)
> +                       __arm_short_set_pte(pgd, 0, 1, cfg);
> +       } else if (blk_size == SZ_1M || blk_size == SZ_16M) {
> +               __arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
> +       }
> +
> +       cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);
> +       cfg->tlb->tlb_sync(cookie);
> +
> +       if (pgtablefree)/* Free pgtable after tlb-flush */
> +               __arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
> +                                        ARM_SHORT_BYTES_PER_PTE, false, cfg);

Curious, but why do you care about freeing this on unmap? It will get
freed when the page table itself is freed anyway (via the ->free callback).

> +
> +       if (blk_size > size) { /* Split the block */
> +               return arm_short_split_blk_unmap(
> +                               ops, iova, paddr, size,
> +                               ARM_SHORT_PGD_GET_PROT(curpgd),
> +                               ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
> +                               blk_size);
> +       } else if (blk_size < size) {
> +               /* Unmap the block while remap partial again after split */
> +               return blk_size +
> +                       arm_short_unmap(ops, iova + blk_size, size - blk_size);
> +       }
> +
> +       return size;
> +}
> +
> +static struct io_pgtable *
> +arm_short_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> +{
> +       struct arm_short_io_pgtable *data;
> +
> +       if (cfg->ias > 32 || cfg->oas > 32)
> +               return NULL;
> +
> +       cfg->pgsize_bitmap &=
> +               (cfg->quirks & IO_PGTABLE_QUIRK_SHORT_SUPERSECTION) ?
> +               (SZ_4K | SZ_64K | SZ_1M | SZ_16M) : (SZ_4K | SZ_64K | SZ_1M);
> +
> +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> +       if (!data)
> +               return NULL;
> +
> +       data->pgd_size = SZ_16K;
> +       data->pgd = __arm_short_alloc_pgtable(
> +                                       data->pgd_size,
> +                                       GFP_KERNEL | __GFP_ZERO | __GFP_DMA,
> +                                       true, cfg);
> +       if (!data->pgd)
> +               goto out_free_data;
> +       wmb();/* Ensure the empty pgd is visible before any actual TTBR write */
> +
> +       data->pgtable_cached = kmem_cache_create(
> +                                       "io-pgtable-arm-short",
> +                                        ARM_SHORT_BYTES_PER_PTE,
> +                                        ARM_SHORT_BYTES_PER_PTE,
> +                                        0, NULL);
> +       if (!data->pgtable_cached)
> +               goto out_free_pgd;
> +
> +       /* TTBRs */
> +       cfg->arm_short_cfg.ttbr[0] = virt_to_phys(data->pgd);
> +       cfg->arm_short_cfg.ttbr[1] = 0;
> +       cfg->arm_short_cfg.tcr = 0;
> +       cfg->arm_short_cfg.nmrr = 0;
> +       cfg->arm_short_cfg.prrr = 0;
> +
> +       data->iop.ops = (struct io_pgtable_ops) {
> +               .map            = arm_short_map,
> +               .unmap          = arm_short_unmap,
> +               .iova_to_phys   = arm_short_iova_to_phys,
> +       };
> +
> +       return &data->iop;
> +
> +out_free_pgd:
> +       __arm_short_free_pgtable(data->pgd, data->pgd_size, true, cfg);
> +out_free_data:
> +       kfree(data);
> +       return NULL;
> +}
> +
> +static void arm_short_free_pgtable(struct io_pgtable *iop)
> +{
> +       struct arm_short_io_pgtable *data = io_pgtable_to_data(iop);
> +
> +       kmem_cache_destroy(data->pgtable_cached);
> +       __arm_short_free_pgtable(data->pgd, data->pgd_size,
> +                                true, &data->iop.cfg);
> +       kfree(data);
> +}
> +
> +struct io_pgtable_init_fns io_pgtable_arm_short_init_fns = {
> +       .alloc  = arm_short_alloc_pgtable,
> +       .free   = arm_short_free_pgtable,
> +};
> +

[...]

> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index 6436fe2..14a9b3a 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -28,6 +28,7 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
>  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
> +extern struct io_pgtable_init_fns io_pgtable_arm_short_init_fns;
> 
>  static const struct io_pgtable_init_fns *
>  io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
> @@ -38,6 +39,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
>         [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
>         [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
>  #endif
> +#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT
> +       [ARM_SHORT_DESC] = &io_pgtable_arm_short_init_fns,
> +#endif
>  };
> 
>  struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index 68c63d9..0f45e60 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -9,6 +9,7 @@ enum io_pgtable_fmt {
>         ARM_32_LPAE_S2,
>         ARM_64_LPAE_S1,
>         ARM_64_LPAE_S2,
> +       ARM_SHORT_DESC,
>         IO_PGTABLE_NUM_FMTS,
>  };
> 
> @@ -45,6 +46,9 @@ struct iommu_gather_ops {
>   */
>  struct io_pgtable_cfg {
>         #define IO_PGTABLE_QUIRK_ARM_NS (1 << 0)        /* Set NS bit in PTEs */
> +       #define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION     BIT(1)
> +       #define IO_PGTABLE_QUIRK_SHORT_NO_XN            BIT(2) /* No XN bit */
> +       #define IO_PGTABLE_QUIRK_SHORT_NO_PERMS         BIT(3) /* No AP bit */

Why have two quirks for this? I suggested included NO_XN in NO_PERMS:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/361160.html

>         int                             quirks;
>         unsigned long                   pgsize_bitmap;
>         unsigned int                    ias;
> @@ -64,6 +68,13 @@ struct io_pgtable_cfg {
>                         u64     vttbr;
>                         u64     vtcr;
>                 } arm_lpae_s2_cfg;
> +
> +               struct {
> +                       u32     ttbr[2];
> +                       u32     tcr;
> +                       u32     nmrr;
> +                       u32     prrr;
> +               } arm_short_cfg;

We don't return an SCTLR value here, so a comment somewhere saying that
access flag is not supported would be helpful (so that drivers can ensure
that they configure things for the AP[2:0] permission model).

Will

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-09-17 14:54       ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-09-17 14:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger,
	Robin Murphy, Daniel Kurtz, Tomasz Figa, Lucas Stach,
	Rob Herring, Catalin Marinas, linux-mediatek, Sasha Hauer,
	srv_heupstream, devicetree, linux-kernel, linux-arm-kernel,
	iommu, pebolle, arnd, mitchelh, youhua.li, k.zhang,
	frederic.chen

On Wed, 2015-09-16 at 16:58 +0100, Will Deacon wrote:
> On Mon, Aug 03, 2015 at 11:21:16AM +0100, Yong Wu wrote:
> > This patch is for ARM Short Descriptor Format.
> > 
> > Signed-off-by: Yong Wu <yong.wu@mediatek.com>
> > ---
> >  drivers/iommu/Kconfig                |  18 +
> >  drivers/iommu/Makefile               |   1 +
> >  drivers/iommu/io-pgtable-arm-short.c | 813 +++++++++++++++++++++++++++++++++++
> >  drivers/iommu/io-pgtable-arm.c       |   3 -
> >  drivers/iommu/io-pgtable.c           |   4 +
> >  drivers/iommu/io-pgtable.h           |  14 +
> >  6 files changed, 850 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/iommu/io-pgtable-arm-short.c
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index f1fb1d3..3abd066 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -39,6 +39,24 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
> > 
> >           If unsure, say N here.
> > 
> > +config IOMMU_IO_PGTABLE_SHORT
> > +       bool "ARMv7/v8 Short Descriptor Format"
> > +       select IOMMU_IO_PGTABLE
> > +       depends on ARM || ARM64 || COMPILE_TEST
> > +       help
> > +         Enable support for the ARM Short-descriptor pagetable format.
> > +         This allocator supports 2 levels translation tables which supports
> 
> Some minor rewording here:
> 
> "...2 levels of translation tables, which enables a 32-bit memory map based
>  on..."

Hi Will,
    OK. Thanks very much for your detailed review every time.

> 
> > +         a memory map based on memory sections or pages.
> > +
> > +config IOMMU_IO_PGTABLE_SHORT_SELFTEST
> > +       bool "Short Descriptor selftests"
> > +       depends on IOMMU_IO_PGTABLE_SHORT
> > +       help
> > +         Enable self-tests for Short-descriptor page table allocator.
> > +         This performs a series of page-table consistency checks during boot.
> > +
> > +         If unsure, say N here.
> > +
> >  endmenu
> > 
> >  config IOMMU_IOVA
> 
> [...]
> > +#define ARM_SHORT_PGD_PGTABLE_MSK              0xfffffc00
> 
> You could use (~(ARM_SHORT_BYTES_PER_PTE - 1)), I think.

Yes. Thanks.

> > +/* level 2 pagetable */
> > +#define ARM_SHORT_PTE_TYPE_LARGE               BIT(0)
> > +#define ARM_SHORT_PTE_SMALL_XN                 BIT(0)
> > +#define ARM_SHORT_PTE_TYPE_SMALL               BIT(1)
> > +#define ARM_SHORT_PTE_B                                BIT(2)
> > +#define ARM_SHORT_PTE_C                                BIT(3)
> > +#define ARM_SHORT_PTE_RD_WR                    (3 << 4)
> > +#define ARM_SHORT_PTE_RDONLY                   BIT(9)
> > +#define ARM_SHORT_PTE_S                                BIT(10)
> > +#define ARM_SHORT_PTE_nG                       BIT(11)
> > +#define ARM_SHORT_PTE_LARGE_XN                 BIT(15)
> > +#define ARM_SHORT_PTE_LARGE_MSK                        (~(SZ_64K - 1))
> > +#define ARM_SHORT_PTE_SMALL_MSK                        (~(SZ_4K - 1))
> > +#define ARM_SHORT_PTE_TYPE_MSK                 \
> > +       (ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL)
> > +#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte)   \
> > +       (((pte) & ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL)
> 
> Maybe a comment here, because it's confusing that you don't and with the
> mask due to XN.

I will add a comment like:
/* The bit0 in a small page is XN */

> > +#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)      \
> > +       (((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)
> 
> AFAICT, the only user of this also does an '& ~ARM_SHORT_PTE_SMALL_MSK'.
> Wouldn't it be better to define ARM_SHORT_PTE_GET_PROT, which just returns
> the AP bits? That said, what are you going to do about XN? I know you
> don't support it in your hardware, but this could code should still do
> the right thing.

I'm a little confused here: rename it to ARM_SHORT_PTE_GET_PROT, which
just returns the AP bits? Like this:
//=====
#define ARM_SHORT_PTE_GET_PROT(pte) \
(((pte) & (~ARM_SHORT_PTE_SMALL_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)
//=====

This macro is only used to get the prot of a large page during a split.

If it only returns the AP bits, then what about PXN and TEX[2:0] in the
large page? (We need to transform PXN in the large page to XN in the
small page while splitting.)

How about adding a comment like below:
//=====
/* Get the prot of large page for split */
#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)      \
   (((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)
//=====
Or rename it to ARM_SHORT_PTE_GET_PROT_SPLIT?

> 
> > +static int
> > +__arm_short_set_pte(arm_short_iopte *ptep, arm_short_iopte pte,
> > +                   unsigned int ptenr, struct io_pgtable_cfg *cfg)
> > +{
> > +       struct device *dev = cfg->iommu_dev;
> > +       int i;
> > +
> > +       for (i = 0; i < ptenr; i++) {
> > +               if (ptep[i] && pte) {
> > +                       /* Someone else may have allocated for this pte */
> > +                       WARN_ON(!selftest_running);
> > +                       goto err_exist_pte;
> > +               }
> > +               ptep[i] = pte;
> > +       }
> > +
> > +       if (selftest_running)
> > +               return 0;
> > +
> > +       dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
> > +                                  sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
> > +       return 0;
> > +
> > +err_exist_pte:
> > +       while (i--)
> > +               ptep[i] = 0;
> 
> What about a dma_sync for the failure case?

I will add it.
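Something like this (a sketch; it syncs the same range as the success
path above and skips the sync while self-testing):

//=====
err_exist_pte:
	while (i--)
		ptep[i] = 0;
	if (!selftest_running)
		dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
					   sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
	return -EEXIST;
//=====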

> 
> > +       return -EEXIST;
> > +}
> > +
> > +static void *
> > +__arm_short_alloc_pgtable(size_t size, gfp_t gfp, bool pgd,
> > +                         struct io_pgtable_cfg *cfg)
> > +{
> > +       struct arm_short_io_pgtable *data;
> > +       struct device *dev = cfg->iommu_dev;
> > +       dma_addr_t dma;
> > +       void *va;
> > +
> > +       if (pgd) {/* lvl1 pagetable */
> > +               va = alloc_pages_exact(size, gfp);
> > +       } else {  /* lvl2 pagetable */
> > +               data = io_pgtable_cfg_to_data(cfg);
> > +               va = kmem_cache_zalloc(data->pgtable_cached, gfp);
> > +       }
> > +
> > +       if (!va)
> > +               return NULL;
> > +
> > +       if (selftest_running)
> > +               return va;
> > +
> > +       dma = dma_map_single(dev, va, size, DMA_TO_DEVICE);
> > +       if (dma_mapping_error(dev, dma))
> > +               goto out_free;
> > +
> > +       if (dma != __arm_short_dma_addr(dev, va))
> > +               goto out_unmap;
> > +
> > +       if (!pgd) {
> > +               kmemleak_ignore(va);
> > +               dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, va),
> > +                                          size, DMA_TO_DEVICE);
> 
> Why do you need to do this as well as the dma_map_single above?

It's redundant, I will delete it.

> 
> > +       }
> > +
> > +       return va;
> > +
> > +out_unmap:
> > +       dev_err_ratelimited(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
> > +       dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
> > +out_free:
> > +       if (pgd)
> > +               free_pages_exact(va, size);
> > +       else
> > +               kmem_cache_free(data->pgtable_cached, va);
> > +       return NULL;
> > +}
> > +
> > +static void
> > +__arm_short_free_pgtable(void *va, size_t size, bool pgd,
> > +                        struct io_pgtable_cfg *cfg)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_cfg_to_data(cfg);
> > +       struct device *dev = cfg->iommu_dev;
> > +
> > +       if (!selftest_running)
> > +               dma_unmap_single(dev, __arm_short_dma_addr(dev, va),
> > +                                size, DMA_TO_DEVICE);
> > +
> > +       if (pgd)
> > +               free_pages_exact(va, size);
> > +       else
> > +               kmem_cache_free(data->pgtable_cached, va);
> > +}
> > +
> > +static arm_short_iopte
> > +__arm_short_pte_prot(struct arm_short_io_pgtable *data, int prot, bool large)
> > +{
> > +       arm_short_iopte pteprot;
> > +       int quirk = data->iop.cfg.quirks;
> > +
> > +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG;
> > +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> > +                               ARM_SHORT_PTE_TYPE_SMALL;
> > +       if (prot & IOMMU_CACHE)
> > +               pteprot |=  ARM_SHORT_PTE_B | ARM_SHORT_PTE_C;
> > +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC)) {
> > +                       pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> > +                               ARM_SHORT_PTE_SMALL_XN;
> 
> Weird indentation, man. Also, see my later comment about combining NO_XN
> with NO_PERMS (the latter subsumes the first)

Sorry, I misunderstood the quirk. I will use NO_PERMS, which contains
NO_XN.

> 
> > +       }
> > +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
> > +               pteprot |= ARM_SHORT_PTE_RD_WR;
> > +               if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> > +                       pteprot |= ARM_SHORT_PTE_RDONLY;
> > +       }
> > +       return pteprot;
> > +}
> > +
[...]
> > +static arm_short_iopte
> > +__arm_short_pte_prot_split(struct arm_short_io_pgtable *data,
> > +                          arm_short_iopte pgdprot,
> > +                          arm_short_iopte pteprot_large,
> > +                          bool large)
> > +{
> > +       arm_short_iopte pteprot = 0;
> > +
> > +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG | ARM_SHORT_PTE_RD_WR;
> > +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> > +                               ARM_SHORT_PTE_TYPE_SMALL;
> > +
> > +       /* large page to small page pte prot. Only large page may split */
> > +       if (!pgdprot && !large) {
> 
> It's slightly complicated having these two variables controlling the
> behaviour of the split. In reality, we're either splitting a section or
> a large page, so there are three valid combinations.
> 
> It might be simpler to operate on IOMMU_{READ,WRITE,NOEXEC,CACHE} as
> much as possible, and then have some simple functions to encode/decode
> these into section/large/small page prot bits. We could then just pass
> the IOMMU_* prot around along with the map size. What do you think?

It would be simpler if the IOMMU_{READ,WRITE,NOEXEC,CACHE} prot could be
used here. But we can't get the IOMMU_x prot while splitting in unmap,
is that right?
We can only get the prot from the pagetable, then reconstruct the new
prot after the split.
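
Unless you mean decoding them back from the descriptor? Something like
this hypothetical sketch for a section entry, using this patch's
ARM_SHORT_PGD_* bits (quirks ignored):

//=====
static int arm_short_pgd_decode_prot(arm_short_iopte pgd)
{
	int prot = IOMMU_READ;

	if ((pgd & (ARM_SHORT_PGD_B | ARM_SHORT_PGD_C)) ==
	    (ARM_SHORT_PGD_B | ARM_SHORT_PGD_C))
		prot |= IOMMU_CACHE;
	if (pgd & ARM_SHORT_PGD_SECTION_XN)
		prot |= IOMMU_NOEXEC;
	if (!(pgd & ARM_SHORT_PGD_RDONLY))
		prot |= IOMMU_WRITE;
	return prot;
}
//=====

Then the split could re-encode with the existing prot helpers at the new
map size. Is that the direction you mean?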

> 
> > +               pteprot |= pteprot_large & ~ARM_SHORT_PTE_SMALL_MSK;
> > +               if (pteprot_large & ARM_SHORT_PTE_LARGE_XN)
> > +                       pteprot |= ARM_SHORT_PTE_SMALL_XN;
> > +       }
> > +
> > +       /* section to pte prot */
> > +       if (pgdprot & ARM_SHORT_PGD_C)
> > +               pteprot |= ARM_SHORT_PTE_C;
> > +       if (pgdprot & ARM_SHORT_PGD_B)
> > +               pteprot |= ARM_SHORT_PTE_B;
> > +       if (pgdprot & ARM_SHORT_PGD_nG)
> > +               pteprot |= ARM_SHORT_PTE_nG;
> > +       if (pgdprot & ARM_SHORT_PGD_SECTION_XN)
> > +               pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> > +                               ARM_SHORT_PTE_SMALL_XN;
> > +       if (pgdprot & ARM_SHORT_PGD_RD_WR)
> > +               pteprot |= ARM_SHORT_PTE_RD_WR;
> > +       if (pgdprot & ARM_SHORT_PGD_RDONLY)
> > +               pteprot |= ARM_SHORT_PTE_RDONLY;
> > +
> > +       return pteprot;
> > +}
> > +
> > +static int
> > +_arm_short_map(struct arm_short_io_pgtable *data,
> > +              unsigned int iova, phys_addr_t paddr,
> > +              arm_short_iopte pgdprot, arm_short_iopte pteprot,
> > +              bool large)
> > +{
> > +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > +       arm_short_iopte *pgd = data->pgd, *pte;
> > +       void *pte_new = NULL;
> > +       int ret;
> > +
> > +       pgd += ARM_SHORT_PGD_IDX(iova);
> > +
> > +       if (!pteprot) { /* section or supersection */
> > +               pte = pgd;
> > +               pteprot = pgdprot;
> > +       } else {        /* page or largepage */
> > +               if (!(*pgd)) {
> > +                       pte_new = __arm_short_alloc_pgtable(
> > +                                       ARM_SHORT_BYTES_PER_PTE,
> > +                                       GFP_ATOMIC, false, cfg);
> > +                       if (unlikely(!pte_new))
> > +                               return -ENOMEM;
> > +
> > +                       pgdprot |= virt_to_phys(pte_new);
> > +                       __arm_short_set_pte(pgd, pgdprot, 1, cfg);
> > +               }
> > +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> > +       }
> > +
> > +       pteprot |= (arm_short_iopte)paddr;
> > +       ret = __arm_short_set_pte(pte, pteprot, large ? 16 : 1, cfg);
> > +       if (ret && pte_new)
> > +               __arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
> > +                                        false, cfg);
> 
> Don't you need to kill the pgd entry before freeing this? Please see my
> previous comments about safely freeing page tables:
> 
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358268.html
> 
> (at the end of the post)


 I will add something like this:
//======================
  if (ret && pte_new)
        goto err_unmap_pgd;

  if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) {
        tlb->tlb_add_flush(iova, size, true, data->iop.cookie);
        tlb->tlb_sync(data->iop.cookie);
  }
  return ret;

err_unmap_pgd:
  __arm_short_set_pte(pgd, 0, 1, cfg);
  /* The size is 1M: flush the whole pgd entry */
  tlb->tlb_add_flush(iova, SZ_1M, true, data->iop.cookie);
  tlb->tlb_sync(data->iop.cookie);
  __arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
                           false, cfg);
  return ret;
//======================

Here I move the TLBI_ON_MAP quirk into _arm_short_map, so the map done by
the split can also flush the TLB when necessary.

> > +       return ret;
> > +}
> > +
[...]
> > +static bool _arm_short_whether_free_pgtable(arm_short_iopte *pgd)
> > +{
> 
> _arm_short_pgtable_empty might be a better name.

Thanks.

> 
> > +       arm_short_iopte *pte;
> > +       int i;
> > +
> > +       pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> > +       for (i = 0; i < ARM_SHORT_PTRS_PER_PTE; i++) {
> > +               if (pte[i] != 0)
> > +                       return false;
> > +       }
> > +
> > +       return true;
> > +}
> > +
> > +static int
> > +arm_short_split_blk_unmap(struct io_pgtable_ops *ops, unsigned int iova,
> > +                         phys_addr_t paddr, size_t size,
> > +                         arm_short_iopte pgdprotup, arm_short_iopte pteprotup,
> > +                         size_t blk_size)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > +       const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
> > +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > +       unsigned long *pgbitmap = &cfg->pgsize_bitmap;
> > +       unsigned int blk_base, blk_start, blk_end, i;
> > +       arm_short_iopte pgdprot, pteprot;
> > +       phys_addr_t blk_paddr;
> > +       size_t mapsize = 0, nextmapsize;
> > +       int ret;
> > +
> > +       /* find the nearest mapsize */
> > +       for (i = find_first_bit(pgbitmap, BITS_PER_LONG);
> > +            i < BITS_PER_LONG && ((1 << i) < blk_size) &&
> > +            IS_ALIGNED(size, 1 << i);
> > +            i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1))
> > +               mapsize = 1 << i;
> > +
> > +       if (WARN_ON(!mapsize))
> > +               return 0; /* Bytes unmapped */
> > +       nextmapsize = 1 << i;
> > +
> > +       blk_base = iova & ~(blk_size - 1);
> > +       blk_start = blk_base;
> > +       blk_end = blk_start + blk_size;
> > +       blk_paddr = paddr;
> > +
> > +       for (; blk_start < blk_end;
> > +            blk_start += mapsize, blk_paddr += mapsize) {
> > +               /* Unmap! */
> > +               if (blk_start == iova)
> > +                       continue;
> > +
> > +               /* Try to upper map */
> > +               if (blk_base != blk_start &&
> > +                   IS_ALIGNED(blk_start | blk_paddr, nextmapsize) &&
> > +                   mapsize != nextmapsize) {
> > +                       mapsize = nextmapsize;
> > +                       i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1);
> > +                       if (i < BITS_PER_LONG)
> > +                               nextmapsize = 1 << i;
> > +               }
> > +
> > +               if (mapsize == SZ_1M) {
> 
> How do we get here with a mapsize of 1M?

About the split, there are several cases:

a supersection may split into sections, large pages, or small pages;
a section may split into large pages or small pages;
a large page may split into small pages.

How do we get here with a mapsize of 1M?
-> The mapsize will be 1M when a supersection splits into sections.
   If we run the self-test, we can see the mapsize change.
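
To make that concrete, here is a small standalone re-implementation of
the "nearest mapsize" search (illustration only; the real code uses the
kernel's find_first_bit/find_next_bit):

//=====
#include <stdio.h>

#define SZ_4K  0x1000UL
#define SZ_64K 0x10000UL
#define SZ_1M  0x100000UL
#define SZ_16M 0x1000000UL

int main(void)
{
	unsigned long pgbitmap = SZ_4K | SZ_64K | SZ_1M | SZ_16M;
	unsigned long blk_size = SZ_16M;  /* splitting a supersection */
	unsigned long size = SZ_1M;       /* bytes being unmapped */
	unsigned long mapsize = 0;
	int i;

	/* Keep the largest supported size below blk_size that divides
	 * the request, scanning set bits from the bottom up.
	 */
	for (i = 0; i < 25; i++) {
		if (!(pgbitmap & (1UL << i)))
			continue;
		if ((1UL << i) >= blk_size || (size % (1UL << i)))
			break;
		mapsize = 1UL << i;
	}
	printf("mapsize = %#lx\n", mapsize); /* prints 0x100000 (1M) */
	return 0;
}
//=====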

> 
> > +                       pgdprot = pgdprotup;
> > +                       pgdprot |= __arm_short_pgd_prot(data, 0, false);

    Here I can't get IOMMU_{READ,WRITE,NOEXEC,CACHE}, so I have to use 0
as the second parameter (some bits like PGD_B/PGD_C have already been
recorded in pgdprotup).

> > +                       pteprot = 0;
> > +               } else { /* small or large page */
> > +                       pgdprot = (blk_size == SZ_64K) ? 0 : pgdprotup;
> > +                       pteprot = __arm_short_pte_prot_split(
> > +                                       data, pgdprot, pteprotup,
> > +                                       mapsize == SZ_64K);
> > +                       pgdprot = __arm_short_pgtable_prot(data);
> > +               }
> > +
> > +               ret = _arm_short_map(data, blk_start, blk_paddr, pgdprot,
> > +                                    pteprot, mapsize == SZ_64K);
> > +               if (ret < 0) {
> > +                       /* Free the table we allocated */
> > +                       arm_short_iopte *pgd = data->pgd, *pte;
> > +
> > +                       pgd += ARM_SHORT_PGD_IDX(blk_base);
> > +                       if (*pgd) {
> > +                               pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> > +                               __arm_short_set_pte(pgd, 0, 1, cfg);
> > +                               tlb->tlb_add_flush(blk_base, blk_size, true,
> > +                                                  data->iop.cookie);
> > +                               tlb->tlb_sync(data->iop.cookie);
> > +                               __arm_short_free_pgtable(
> > +                                       pte, ARM_SHORT_BYTES_PER_PTE,
> > +                                       false, cfg);
> 
> This looks wrong. _arm_short_map cleans up if it returns non-zero already.

...
YES. It seems that I can delete the "if" cleanup for the error case:

if (ret < 0)
	return 0;

> 
> > +                       }
> > +                       return 0;/* Bytes unmapped */
> > +               }
> > +       }
> > +
> > +       tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
> > +       tlb->tlb_sync(data->iop.cookie);
> 
> Why are you syncing here? You can postpone this to the caller, if it turns
> out the unmap was a success.

I only added it here because there is a tlb_add_flush in
arm_lpae_split_blk_unmap.
About this, I think we can delete the tlb-flush here; see below.

> 
> > +       return size;
> > +}
> > +
> > +static int arm_short_unmap(struct io_pgtable_ops *ops,
> > +                          unsigned long iova,
> > +                          size_t size)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > +       arm_short_iopte *pgd, *pte = NULL;
> > +       arm_short_iopte curpgd, curpte = 0;
> > +       phys_addr_t paddr;
> > +       unsigned int iova_base, blk_size = 0;
> > +       void *cookie = data->iop.cookie;
> > +       bool pgtablefree = false;
> > +
> > +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
> > +
> > +       /* Get block size */
> > +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> > +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> > +
> > +               if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
> > +                       blk_size = SZ_4K;
> > +               else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
> > +                       blk_size = SZ_64K;
> > +               else
> > +                       WARN_ON(1);

> > +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> > +               blk_size = SZ_1M;
> > +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> > +               blk_size = SZ_16M;
> > +       } else {
> > +               WARN_ON(1);
> 
> Maybe return 0 or something instead of falling through with blk_size == 0?

How about:
//=====
	if (WARN_ON(blk_size == 0))
		return 0;
//=====
Return 0 and report the error information.

> 
> > +       }
> > +
> > +       iova_base = iova & ~(blk_size - 1);
> > +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova_base);
> > +       paddr = arm_short_iova_to_phys(ops, iova_base);
> > +       curpgd = *pgd;
> > +
> > +       if (blk_size == SZ_4K || blk_size == SZ_64K) {
> > +               pte = arm_short_get_pte_in_pgd(*pgd, iova_base);
> > +               curpte = *pte;
> > +               __arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);
> > +
> > +            pgtablefree = _arm_short_whether_free_pgtable(pgd);
> > +            if (pgtablefree)
> > +(1)(2)                 __arm_short_set_pte(pgd, 0, 1, cfg);
> > +       } else if (blk_size == SZ_1M || blk_size == SZ_16M) {
> > +               __arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
> > +       }
> > +
> > +(3)    cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);
> > +(4)    cfg->tlb->tlb_sync(cookie);
> > +
> > +       if (pgtablefree)/* Free pgtable after tlb-flush */
> > +(5)              __arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
> > +                                        ARM_SHORT_BYTES_PER_PTE, false, cfg);
> 
> Curious, but why do you care about freeing this on unmap? It will get
> freed when the page table itself is freed anyway (via the ->free callback).

This frees the level-2 pagetable when there is no entry left in it. It
isn't the level-1 pagetable (the ->free callback).

The flow of freeing the pagetable follows your suggestion, the 5 steps I
marked above as (1)..(5), so I have to move free_pgtable after (4)
tlb_sync and add a comment /* Free pgtable after tlb-flush */.
The comment should probably be changed to: /* Free level2 pgtable after
tlb-flush */

> > +
> > +       if (blk_size > size) { /* Split the block */
> > +               return arm_short_split_blk_unmap(
> > +                               ops, iova, paddr, size,
> > +                               ARM_SHORT_PGD_GET_PROT(curpgd),
> > +                               ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
> > +                               blk_size);

    About adding a flush-tlb after the split: there is a flush-tlb
before, and the map in the split goes from invalid to valid, so it
doesn't need a flush-tlb again.

> > +       } else if (blk_size < size) {
> > +               /* Unmap the block while remap partial again after split */
> > +               return blk_size +
> > +                       arm_short_unmap(ops, iova + blk_size, size - blk_size);
> > +       }
> > +
> > +       return size;
> > +}
> > +
> > +static struct io_pgtable *
> > +arm_short_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> > +{
> > +       struct arm_short_io_pgtable *data;
> > +
> > +       if (cfg->ias > 32 || cfg->oas > 32)
> > +               return NULL;
> > +
> > +       cfg->pgsize_bitmap &=
> > +               (cfg->quirks & IO_PGTABLE_QUIRK_SHORT_SUPERSECTION) ?
> > +               (SZ_4K | SZ_64K | SZ_1M | SZ_16M) : (SZ_4K | SZ_64K | SZ_1M);
> > +
> > +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +       if (!data)
> > +               return NULL;
> > +
> > +       data->pgd_size = SZ_16K;
> > +       data->pgd = __arm_short_alloc_pgtable(
> > +                                       data->pgd_size,
> > +                                       GFP_KERNEL | __GFP_ZERO | __GFP_DMA,
> > +                                       true, cfg);
> > +       if (!data->pgd)
> > +               goto out_free_data;
> > +       wmb();/* Ensure the empty pgd is visible before any actual TTBR write */
> > +
> > +       data->pgtable_cached = kmem_cache_create(
> > +                                       "io-pgtable-arm-short",
> > +                                        ARM_SHORT_BYTES_PER_PTE,
> > +                                        ARM_SHORT_BYTES_PER_PTE,
> > +                                        0, NULL);

I plan to add SLAB_CACHE_DMA to guarantee the level-2 pagetable base
address (PA) is always 32-bit (not over 4GB).
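
i.e. something like (sketch):

//=====
	data->pgtable_cached = kmem_cache_create("io-pgtable-arm-short",
						 ARM_SHORT_BYTES_PER_PTE,
						 ARM_SHORT_BYTES_PER_PTE,
						 SLAB_CACHE_DMA, NULL);
//=====

so every level-2 table that kmem_cache_zalloc() returns comes from
ZONE_DMA, and its physical address always fits in the 32-bit pgd entry.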

> > +       if (!data->pgtable_cached)
> > +               goto out_free_pgd;
> > +
> > +       /* TTBRs */
> > +       cfg->arm_short_cfg.ttbr[0] = virt_to_phys(data->pgd);
> > +       cfg->arm_short_cfg.ttbr[1] = 0;
> > +       cfg->arm_short_cfg.tcr = 0;
> > +       cfg->arm_short_cfg.nmrr = 0;
> > +       cfg->arm_short_cfg.prrr = 0;

           About SCTLR, how about we add it here:
          //===========
          cfg->arm_short_cfg.sctlr = 0; /* The iommu user should configure IOMMU_{READ/WRITE} */
          //===========
           Is the comment OK?
> > +
> > +       data->iop.ops = (struct io_pgtable_ops) {
> > +               .map            = arm_short_map,
> > +               .unmap          = arm_short_unmap,
> > +               .iova_to_phys   = arm_short_iova_to_phys,
> > +       };
> > +
> > +       return &data->iop;
> > +
> > +out_free_pgd:
> > +       __arm_short_free_pgtable(data->pgd, data->pgd_size, true, cfg);
> > +out_free_data:
> > +       kfree(data);
> > +       return NULL;
> > +}
> > +
> [...]
> 
> > diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> > index 6436fe2..14a9b3a 100644
> > --- a/drivers/iommu/io-pgtable.c
> > +++ b/drivers/iommu/io-pgtable.c
> > @@ -28,6 +28,7 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
> >  extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
> >  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
> >  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
> > +extern struct io_pgtable_init_fns io_pgtable_arm_short_init_fns;
> > 
> >  static const struct io_pgtable_init_fns *
> >  io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
> > @@ -38,6 +39,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
> >         [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
> >         [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
> >  #endif
> > +#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT
> > +       [ARM_SHORT_DESC] = &io_pgtable_arm_short_init_fns,
> > +#endif
> >  };
> > 
> >  struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> > diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> > index 68c63d9..0f45e60 100644
> > --- a/drivers/iommu/io-pgtable.h
> > +++ b/drivers/iommu/io-pgtable.h
> > @@ -9,6 +9,7 @@ enum io_pgtable_fmt {
> >         ARM_32_LPAE_S2,
> >         ARM_64_LPAE_S1,
> >         ARM_64_LPAE_S2,
> > +       ARM_SHORT_DESC,
> >         IO_PGTABLE_NUM_FMTS,
> >  };
> > 
> > @@ -45,6 +46,9 @@ struct iommu_gather_ops {
> >   */
> >  struct io_pgtable_cfg {
> >         #define IO_PGTABLE_QUIRK_ARM_NS (1 << 0)        /* Set NS bit in PTEs */
> > +       #define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION     BIT(1)
> > +       #define IO_PGTABLE_QUIRK_SHORT_NO_XN            BIT(2) /* No XN bit */
> > +       #define IO_PGTABLE_QUIRK_SHORT_NO_PERMS         BIT(3) /* No AP bit */
> 
> Why have two quirks for this? I suggested included NO_XN in NO_PERMS:
> 
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/361160.html

Sorry. I will change it like this next time:

#define IO_PGTABLE_QUIRK_NO_PERMS            BIT(1) /* No XN/AP bits */
#define IO_PGTABLE_QUIRK_TLBI_ON_MAP         BIT(2) /* TLB-Flush after map */
#define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION  BIT(3)

Do I need to change (1 << 0) to BIT(0) in ARM_NS?

> 
> >         int                             quirks;
> >         unsigned long                   pgsize_bitmap;
> >         unsigned int                    ias;
> > @@ -64,6 +68,13 @@ struct io_pgtable_cfg {
> >                         u64     vttbr;
> >                         u64     vtcr;
> >                 } arm_lpae_s2_cfg;
> > +
> > +               struct {
> > +                       u32     ttbr[2];
> > +                       u32     tcr;
> > +                       u32     nmrr;
> > +                       u32     prrr;
> > +               } arm_short_cfg;
> 
> We don't return an SCTLR value here, so a comment somewhere saying that
> access flag is not supported would be helpful (so that drivers can ensure
> that they configure things for the AP[2:0] permission model).

Do we need to add SCTLR? Like:
         struct {
                 u32     ttbr[2];
                 u32     tcr;
                 u32     nmrr;
                 u32     prrr;
+                u32     sctlr;
          } arm_short_cfg;

  Or only add a comment in the place where ttbr/tcr is set?
  /* The iommu user should configure IOMMU_{READ/WRITE} since SCTLR isn't implemented */
> 
> Will



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-09-17 14:54       ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-09-17 14:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Joerg Roedel, Thierry Reding, Mark Rutland, Matthias Brugger,
	Robin Murphy, Daniel Kurtz, Tomasz Figa, Lucas Stach,
	Rob Herring, Catalin Marinas,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Sasha Hauer,
	srv_heupstream-NuS5LvNUpcJWk0Htik3J/w,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	pebolle-IWqWACnzNjzz+pZb47iToQ

On Wed, 2015-09-16 at 16:58 +0100, Will Deacon wrote:
> On Mon, Aug 03, 2015 at 11:21:16AM +0100, Yong Wu wrote:
> > This patch is for ARM Short Descriptor Format.
> > 
> > Signed-off-by: Yong Wu <yong.wu-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
> > ---
> >  drivers/iommu/Kconfig                |  18 +
> >  drivers/iommu/Makefile               |   1 +
> >  drivers/iommu/io-pgtable-arm-short.c | 813 +++++++++++++++++++++++++++++++++++
> >  drivers/iommu/io-pgtable-arm.c       |   3 -
> >  drivers/iommu/io-pgtable.c           |   4 +
> >  drivers/iommu/io-pgtable.h           |  14 +
> >  6 files changed, 850 insertions(+), 3 deletions(-)
> >  create mode 100644 drivers/iommu/io-pgtable-arm-short.c
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index f1fb1d3..3abd066 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -39,6 +39,24 @@ config IOMMU_IO_PGTABLE_LPAE_SELFTEST
> > 
> >           If unsure, say N here.
> > 
> > +config IOMMU_IO_PGTABLE_SHORT
> > +       bool "ARMv7/v8 Short Descriptor Format"
> > +       select IOMMU_IO_PGTABLE
> > +       depends on ARM || ARM64 || COMPILE_TEST
> > +       help
> > +         Enable support for the ARM Short-descriptor pagetable format.
> > +         This allocator supports 2 levels translation tables which supports
> 
> Some minor rewording here:
> 
> "...2 levels of translation tables, which enables a 32-bit memory map based
>  on..."

Hi Will,
    OK.Thanks very much for your review so detail every time.

> 
> > +         a memory map based on memory sections or pages.
> > +
> > +config IOMMU_IO_PGTABLE_SHORT_SELFTEST
> > +       bool "Short Descriptor selftests"
> > +       depends on IOMMU_IO_PGTABLE_SHORT
> > +       help
> > +         Enable self-tests for Short-descriptor page table allocator.
> > +         This performs a series of page-table consistency checks during boot.
> > +
> > +         If unsure, say N here.
> > +
> >  endmenu
> > 
> >  config IOMMU_IOVA
> 
> [...]
> > +#define ARM_SHORT_PGD_PGTABLE_MSK              0xfffffc00
> 
> You could use (~(ARM_SHORT_BYTES_PER_PTE - 1)), I think.

Yes. Thanks.

> > +/* level 2 pagetable */
> > +#define ARM_SHORT_PTE_TYPE_LARGE               BIT(0)
> > +#define ARM_SHORT_PTE_SMALL_XN                 BIT(0)
> > +#define ARM_SHORT_PTE_TYPE_SMALL               BIT(1)
> > +#define ARM_SHORT_PTE_B                                BIT(2)
> > +#define ARM_SHORT_PTE_C                                BIT(3)
> > +#define ARM_SHORT_PTE_RD_WR                    (3 << 4)
> > +#define ARM_SHORT_PTE_RDONLY                   BIT(9)
> > +#define ARM_SHORT_PTE_S                                BIT(10)
> > +#define ARM_SHORT_PTE_nG                       BIT(11)
> > +#define ARM_SHORT_PTE_LARGE_XN                 BIT(15)
> > +#define ARM_SHORT_PTE_LARGE_MSK                        (~(SZ_64K - 1))
> > +#define ARM_SHORT_PTE_SMALL_MSK                        (~(SZ_4K - 1))
> > +#define ARM_SHORT_PTE_TYPE_MSK                 \
> > +       (ARM_SHORT_PTE_TYPE_LARGE | ARM_SHORT_PTE_TYPE_SMALL)
> > +#define ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(pte)   \
> > +       (((pte) & ARM_SHORT_PTE_TYPE_SMALL) == ARM_SHORT_PTE_TYPE_SMALL)
> 
> Maybe a comment here, because it's confusing that you don't and with the
> mask due to XN.

I will add a comment like : 
/* The bit0 in small page is XN */

> > +#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)      \
> > +       (((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)
> 
> AFAICT, the only user of this also does an '& ~ARM_SHORT_PTE_SMALL_MSK'.
> Wouldn't it be better to define ARM_SHORT_PTE_GET_PROT, which just returns
> the AP bits? That said, what are you going to do about XN? I know you
> don't support it in your hardware, but this could code should still do
> the right thing.

I'm a little confuse here: rename to ARM_SHORT_PTE_GET_PROT which just
return the AP bits? like this :
//=====
#define ARM_SHORT_PTE_GET_PROT(pte) \
(((pte) & (~ARM_SHORT_PTE_SMALL_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)
//=====

This macro is only used to get the prot of large page while split.

If it only return AP bits, then how about PXN,TEX[2:0] in large page?
(we need transform PXN in large-page to XN in small-page while split)

how about add a comment like below:
//=====
/* Get the prot of large page for split */
#define ARM_SHORT_PTE_LARGE_GET_PROT(pte)      \
   (((pte) & (~ARM_SHORT_PTE_LARGE_MSK)) & ~ARM_SHORT_PTE_TYPE_MSK)
//=====
or rename it ARM_SHORT_PTE_GET_PROT_SPLIT?

> 
> > +static int
> > +__arm_short_set_pte(arm_short_iopte *ptep, arm_short_iopte pte,
> > +                   unsigned int ptenr, struct io_pgtable_cfg *cfg)
> > +{
> > +       struct device *dev = cfg->iommu_dev;
> > +       int i;
> > +
> > +       for (i = 0; i < ptenr; i++) {
> > +               if (ptep[i] && pte) {
> > +                       /* Someone else may have allocated for this pte */
> > +                       WARN_ON(!selftest_running);
> > +                       goto err_exist_pte;
> > +               }
> > +               ptep[i] = pte;
> > +       }
> > +
> > +       if (selftest_running)
> > +               return 0;
> > +
> > +       dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, ptep),
> > +                                  sizeof(*ptep) * ptenr, DMA_TO_DEVICE);
> > +       return 0;
> > +
> > +err_exist_pte:
> > +       while (i--)
> > +               ptep[i] = 0;
> 
> What about a dma_sync for the failure case?

I will add it.

> 
> > +       return -EEXIST;
> > +}
> > +
> > +static void *
> > +__arm_short_alloc_pgtable(size_t size, gfp_t gfp, bool pgd,
> > +                         struct io_pgtable_cfg *cfg)
> > +{
> > +       struct arm_short_io_pgtable *data;
> > +       struct device *dev = cfg->iommu_dev;
> > +       dma_addr_t dma;
> > +       void *va;
> > +
> > +       if (pgd) {/* lvl1 pagetable */
> > +               va = alloc_pages_exact(size, gfp);
> > +       } else {  /* lvl2 pagetable */
> > +               data = io_pgtable_cfg_to_data(cfg);
> > +               va = kmem_cache_zalloc(data->pgtable_cached, gfp);
> > +       }
> > +
> > +       if (!va)
> > +               return NULL;
> > +
> > +       if (selftest_running)
> > +               return va;
> > +
> > +       dma = dma_map_single(dev, va, size, DMA_TO_DEVICE);
> > +       if (dma_mapping_error(dev, dma))
> > +               goto out_free;
> > +
> > +       if (dma != __arm_short_dma_addr(dev, va))
> > +               goto out_unmap;
> > +
> > +       if (!pgd) {
> > +               kmemleak_ignore(va);
> > +               dma_sync_single_for_device(dev, __arm_short_dma_addr(dev, va),
> > +                                          size, DMA_TO_DEVICE);
> 
> Why do you need to do this as well as the dma_map_single above?

It's redundance, I will delete it...

> 
> > +       }
> > +
> > +       return va;
> > +
> > +out_unmap:
> > +       dev_err_ratelimited(dev, "Cannot accommodate DMA translation for IOMMU page tables\n");
> > +       dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
> > +out_free:
> > +       if (pgd)
> > +               free_pages_exact(va, size);
> > +       else
> > +               kmem_cache_free(data->pgtable_cached, va);
> > +       return NULL;
> > +}
> > +
> > +static void
> > +__arm_short_free_pgtable(void *va, size_t size, bool pgd,
> > +                        struct io_pgtable_cfg *cfg)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_cfg_to_data(cfg);
> > +       struct device *dev = cfg->iommu_dev;
> > +
> > +       if (!selftest_running)
> > +               dma_unmap_single(dev, __arm_short_dma_addr(dev, va),
> > +                                size, DMA_TO_DEVICE);
> > +
> > +       if (pgd)
> > +               free_pages_exact(va, size);
> > +       else
> > +               kmem_cache_free(data->pgtable_cached, va);
> > +}
> > +
> > +static arm_short_iopte
> > +__arm_short_pte_prot(struct arm_short_io_pgtable *data, int prot, bool large)
> > +{
> > +       arm_short_iopte pteprot;
> > +       int quirk = data->iop.cfg.quirks;
> > +
> > +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG;
> > +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> > +                               ARM_SHORT_PTE_TYPE_SMALL;
> > +       if (prot & IOMMU_CACHE)
> > +               pteprot |=  ARM_SHORT_PTE_B | ARM_SHORT_PTE_C;
> > +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_XN) && (prot & IOMMU_NOEXEC)) {
> > +                       pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> > +                               ARM_SHORT_PTE_SMALL_XN;
> 
> Weird indentation, man. Also, see my later comment about combining NO_XN
> with NO_PERMS (the latter subsumes the first)

Sorry, I misunderstanded about the quirk, I will use NO_PERMS which
contain NO_XN.

> 
> > +       }
> > +       if (!(quirk & IO_PGTABLE_QUIRK_SHORT_NO_PERMS)) {
> > +               pteprot |= ARM_SHORT_PTE_RD_WR;
> > +               if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ))
> > +                       pteprot |= ARM_SHORT_PTE_RDONLY;
> > +       }
> > +       return pteprot;
> > +}
> > +
[...]
> > +static arm_short_iopte
> > +__arm_short_pte_prot_split(struct arm_short_io_pgtable *data,
> > +                          arm_short_iopte pgdprot,
> > +                          arm_short_iopte pteprot_large,
> > +                          bool large)
> > +{
> > +       arm_short_iopte pteprot = 0;
> > +
> > +       pteprot = ARM_SHORT_PTE_S | ARM_SHORT_PTE_nG | ARM_SHORT_PTE_RD_WR;
> > +       pteprot |= large ? ARM_SHORT_PTE_TYPE_LARGE :
> > +                               ARM_SHORT_PTE_TYPE_SMALL;
> > +
> > +       /* large page to small page pte prot. Only large page may split */
> > +       if (!pgdprot && !large) {
> 
> It's slightly complicated having these two variables controlling the
> behaviour of the split. In reality, we're either splitting a section or
> a large page, so there are three valid combinations.
> 
> It might be simpler to operate on IOMMU_{READ,WRITE,NOEXEC,CACHE} as
> much as possible, and then have some simple functions to encode/decode
> these into section/large/small page prot bits. We could then just pass
> the IOMMU_* prot around along with the map size. What do you think?

It would be simpler if the IOMMU_{READ,WRITE,NOEXEC,CACHE} prot could be
used here, but we can't recover the IOMMU_x prot while splitting in
unmap, can we?
We can only read the prot back from the pagetable, then reconstruct the
new prot after the split.
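
If we did want to go via the IOMMU_x prot, I suppose we would first need
a decode helper, something like this (a rough sketch only;
__arm_short_pte_decode_prot is a made-up name, untested):
//======================
/* Recover IOMMU_* prot from a small-page pte */
static int __arm_short_pte_decode_prot(arm_short_iopte pte)
{
	int prot = IOMMU_READ;

	if (!(pte & ARM_SHORT_PTE_RDONLY))
		prot |= IOMMU_WRITE;
	if ((pte & ARM_SHORT_PTE_B) && (pte & ARM_SHORT_PTE_C))
		prot |= IOMMU_CACHE;
	if (pte & ARM_SHORT_PTE_SMALL_XN)
		prot |= IOMMU_NOEXEC;
	return prot;
}
//======================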

> 
> > +               pteprot |= pteprot_large & ~ARM_SHORT_PTE_SMALL_MSK;
> > +               if (pteprot_large & ARM_SHORT_PTE_LARGE_XN)
> > +                       pteprot |= ARM_SHORT_PTE_SMALL_XN;
> > +       }
> > +
> > +       /* section to pte prot */
> > +       if (pgdprot & ARM_SHORT_PGD_C)
> > +               pteprot |= ARM_SHORT_PTE_C;
> > +       if (pgdprot & ARM_SHORT_PGD_B)
> > +               pteprot |= ARM_SHORT_PTE_B;
> > +       if (pgdprot & ARM_SHORT_PGD_nG)
> > +               pteprot |= ARM_SHORT_PTE_nG;
> > +       if (pgdprot & ARM_SHORT_PGD_SECTION_XN)
> > +               pteprot |= large ? ARM_SHORT_PTE_LARGE_XN :
> > +                               ARM_SHORT_PTE_SMALL_XN;
> > +       if (pgdprot & ARM_SHORT_PGD_RD_WR)
> > +               pteprot |= ARM_SHORT_PTE_RD_WR;
> > +       if (pgdprot & ARM_SHORT_PGD_RDONLY)
> > +               pteprot |= ARM_SHORT_PTE_RDONLY;
> > +
> > +       return pteprot;
> > +}
> > +
> > +static int
> > +_arm_short_map(struct arm_short_io_pgtable *data,
> > +              unsigned int iova, phys_addr_t paddr,
> > +              arm_short_iopte pgdprot, arm_short_iopte pteprot,
> > +              bool large)
> > +{
> > +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > +       arm_short_iopte *pgd = data->pgd, *pte;
> > +       void *pte_new = NULL;
> > +       int ret;
> > +
> > +       pgd += ARM_SHORT_PGD_IDX(iova);
> > +
> > +       if (!pteprot) { /* section or supersection */
> > +               pte = pgd;
> > +               pteprot = pgdprot;
> > +       } else {        /* page or largepage */
> > +               if (!(*pgd)) {
> > +                       pte_new = __arm_short_alloc_pgtable(
> > +                                       ARM_SHORT_BYTES_PER_PTE,
> > +                                       GFP_ATOMIC, false, cfg);
> > +                       if (unlikely(!pte_new))
> > +                               return -ENOMEM;
> > +
> > +                       pgdprot |= virt_to_phys(pte_new);
> > +                       __arm_short_set_pte(pgd, pgdprot, 1, cfg);
> > +               }
> > +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> > +       }
> > +
> > +       pteprot |= (arm_short_iopte)paddr;
> > +       ret = __arm_short_set_pte(pte, pteprot, large ? 16 : 1, cfg);
> > +       if (ret && pte_new)
> > +               __arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
> > +                                        false, cfg);
> 
> Don't you need to kill the pgd entry before freeing this? Please see my
> previous comments about safely freeing page tables:
> 
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/358268.html
> 
> (at the end of the post)


 I will add something like this:
//======================
	if (ret && pte_new)
		goto err_unmap_pgd;

	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) {
		tlb->tlb_add_flush(iova, size, true, data->iop.cookie);
		tlb->tlb_sync(data->iop.cookie);
	}
	return ret;

err_unmap_pgd:
	__arm_short_set_pte(pgd, 0, 1, cfg);
	/* Flush the whole pgd entry, i.e. the full 1M section */
	tlb->tlb_add_flush(iova, SZ_1M, true, data->iop.cookie);
	tlb->tlb_sync(data->iop.cookie);
	__arm_short_free_pgtable(pte_new, ARM_SHORT_BYTES_PER_PTE,
				 false, cfg);
	return ret;
//======================

Here I also move the TLBI_ON_MAP quirk into _arm_short_map, so a map
performed during a split can flush the TLB too when necessary.

> > +       return ret;
> > +}
> > +
[...]
> > +static bool _arm_short_whether_free_pgtable(arm_short_iopte *pgd)
> > +{
> 
> _arm_short_pgtable_empty might be a better name.

Thanks.

> 
> > +       arm_short_iopte *pte;
> > +       int i;
> > +
> > +       pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> > +       for (i = 0; i < ARM_SHORT_PTRS_PER_PTE; i++) {
> > +               if (pte[i] != 0)
> > +                       return false;
> > +       }
> > +
> > +       return true;
> > +}
> > +
> > +static int
> > +arm_short_split_blk_unmap(struct io_pgtable_ops *ops, unsigned int iova,
> > +                         phys_addr_t paddr, size_t size,
> > +                         arm_short_iopte pgdprotup, arm_short_iopte pteprotup,
> > +                         size_t blk_size)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > +       const struct iommu_gather_ops *tlb = data->iop.cfg.tlb;
> > +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > +       unsigned long *pgbitmap = &cfg->pgsize_bitmap;
> > +       unsigned int blk_base, blk_start, blk_end, i;
> > +       arm_short_iopte pgdprot, pteprot;
> > +       phys_addr_t blk_paddr;
> > +       size_t mapsize = 0, nextmapsize;
> > +       int ret;
> > +
> > +       /* find the nearest mapsize */
> > +       for (i = find_first_bit(pgbitmap, BITS_PER_LONG);
> > +            i < BITS_PER_LONG && ((1 << i) < blk_size) &&
> > +            IS_ALIGNED(size, 1 << i);
> > +            i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1))
> > +               mapsize = 1 << i;
> > +
> > +       if (WARN_ON(!mapsize))
> > +               return 0; /* Bytes unmapped */
> > +       nextmapsize = 1 << i;
> > +
> > +       blk_base = iova & ~(blk_size - 1);
> > +       blk_start = blk_base;
> > +       blk_end = blk_start + blk_size;
> > +       blk_paddr = paddr;
> > +
> > +       for (; blk_start < blk_end;
> > +            blk_start += mapsize, blk_paddr += mapsize) {
> > +               /* Unmap! */
> > +               if (blk_start == iova)
> > +                       continue;
> > +
> > +               /* Try to upper map */
> > +               if (blk_base != blk_start &&
> > +                   IS_ALIGNED(blk_start | blk_paddr, nextmapsize) &&
> > +                   mapsize != nextmapsize) {
> > +                       mapsize = nextmapsize;
> > +                       i = find_next_bit(pgbitmap, BITS_PER_LONG, i + 1);
> > +                       if (i < BITS_PER_LONG)
> > +                               nextmapsize = 1 << i;
> > +               }
> > +
> > +               if (mapsize == SZ_1M) {
> 
> How do we get here with a mapsize of 1M?

About the split, there are several cases:

A supersection may split into sections, large pages, or small pages.
A section may split into large pages or small pages.
A large page may split into small pages.

How do we get here with a mapsize of 1M?
-> The mapsize will be 1M while a supersection splits into sections.
   If we run the self-test, we can watch the mapsize change.

> 
> > +                       pgdprot = pgdprotup;
> > +                       pgdprot |= __arm_short_pgd_prot(data, 0, false);

    Here I can't get IOMMU_{READ,WRITE,NOEXEC,CACHE}, so I have to use 0
as the second parameter (bits like PGD_B/PGD_C have already been
recorded in pgdprotup).

> > +                       pteprot = 0;
> > +               } else { /* small or large page */
> > +                       pgdprot = (blk_size == SZ_64K) ? 0 : pgdprotup;
> > +                       pteprot = __arm_short_pte_prot_split(
> > +                                       data, pgdprot, pteprotup,
> > +                                       mapsize == SZ_64K);
> > +                       pgdprot = __arm_short_pgtable_prot(data);
> > +               }
> > +
> > +               ret = _arm_short_map(data, blk_start, blk_paddr, pgdprot,
> > +                                    pteprot, mapsize == SZ_64K);
> > +               if (ret < 0) {
> > +                       /* Free the table we allocated */
> > +                       arm_short_iopte *pgd = data->pgd, *pte;
> > +
> > +                       pgd += ARM_SHORT_PGD_IDX(blk_base);
> > +                       if (*pgd) {
> > +                               pte = ARM_SHORT_GET_PGTABLE_VA(*pgd);
> > +                               __arm_short_set_pte(pgd, 0, 1, cfg);
> > +                               tlb->tlb_add_flush(blk_base, blk_size, true,
> > +                                                  data->iop.cookie);
> > +                               tlb->tlb_sync(data->iop.cookie);
> > +                               __arm_short_free_pgtable(
> > +                                       pte, ARM_SHORT_BYTES_PER_PTE,
> > +                                       false, cfg);
> 
> This looks wrong. _arm_short_map cleans up if it returns non-zero already.

Yes. It seems I can delete the "if" for the error case, leaving only:

if (ret < 0)
	return 0;

> 
> > +                       }
> > +                       return 0;/* Bytes unmapped */
> > +               }
> > +       }
> > +
> > +       tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
> > +       tlb->tlb_sync(data->iop.cookie);
> 
> Why are you syncing here? You can postpone this to the caller, if it turns
> out the unmap was a success.

I only added it because there is a tlb_add_flush in
arm_lpae_split_blk_unmap. On reflection, I think we can delete the TLB
flush here; see below.

> 
> > +       return size;
> > +}
> > +
> > +static int arm_short_unmap(struct io_pgtable_ops *ops,
> > +                          unsigned long iova,
> > +                          size_t size)
> > +{
> > +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > +       arm_short_iopte *pgd, *pte = NULL;
> > +       arm_short_iopte curpgd, curpte = 0;
> > +       phys_addr_t paddr;
> > +       unsigned int iova_base, blk_size = 0;
> > +       void *cookie = data->iop.cookie;
> > +       bool pgtablefree = false;
> > +
> > +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
> > +
> > +       /* Get block size */
> > +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> > +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> > +
> > +               if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
> > +                       blk_size = SZ_4K;
> > +               else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
> > +                       blk_size = SZ_64K;
> > +               else
> > +                       WARN_ON(1);

> > +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> > +               blk_size = SZ_1M;
> > +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> > +               blk_size = SZ_16M;
> > +       } else {
> > +               WARN_ON(1);
> 
> Maybe return 0 or something instead of falling through with blk_size == 0?

How about:
//=====
	if (WARN_ON(blk_size == 0))
		return 0;
//=====
i.e. return 0 and report the error.

> 
> > +       }
> > +
> > +       iova_base = iova & ~(blk_size - 1);
> > +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova_base);
> > +       paddr = arm_short_iova_to_phys(ops, iova_base);
> > +       curpgd = *pgd;
> > +
> > +       if (blk_size == SZ_4K || blk_size == SZ_64K) {
> > +               pte = arm_short_get_pte_in_pgd(*pgd, iova_base);
> > +               curpte = *pte;
> > +               __arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);
> > +
> > +            pgtablefree = _arm_short_whether_free_pgtable(pgd);
> > +            if (pgtablefree)
> > +(1)(2)                 __arm_short_set_pte(pgd, 0, 1, cfg);
> > +       } else if (blk_size == SZ_1M || blk_size == SZ_16M) {
> > +               __arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
> > +       }
> > +
> > +(3)    cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);
> > +(4)    cfg->tlb->tlb_sync(cookie);
> > +
> > +       if (pgtablefree)/* Free pgtable after tlb-flush */
> > +(5)              __arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
> > +                                        ARM_SHORT_BYTES_PER_PTE, false, cfg);
> 
> Curious, but why do you care about freeing this on unmap? It will get
> freed when the page table itself is freed anyway (via the ->free callback).

This frees the level-2 pagetable when no entries are left in it. It is
not the level-1 pagetable (that one is freed via the ->free callback).

The flow for freeing a pagetable follows your suggestion, the five steps
I marked above as (1)..(5), so I have to move free_pgtable after
(4) tlb_sync and add a comment /* Free pgtable after tlb-flush */.
Perhaps the comment should read: /* Free level2 pgtable after
tlb-flush */
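
So the ordering in arm_short_unmap ends up as below (just restating the
five steps against the code above):
//======================
__arm_short_set_pte(pgd, 0, 1, cfg);	/* (1) clear + (2) sync the pgd entry */
cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);	/* (3) */
cfg->tlb->tlb_sync(cookie);	/* (4) wait for the invalidation */
/* (5) only now is it safe to free the level2 pagetable */
__arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
			 ARM_SHORT_BYTES_PER_PTE, false, cfg);
//======================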

> > +
> > +       if (blk_size > size) { /* Split the block */
> > +               return arm_short_split_blk_unmap(
> > +                               ops, iova, paddr, size,
> > +                               ARM_SHORT_PGD_GET_PROT(curpgd),
> > +                               ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
> > +                               blk_size);

    About adding a TLB flush after the split: there is already a flush
beforehand, and the mappings created by the split go from invalid to
valid, so no further flush is needed.

> > +       } else if (blk_size < size) {
> > +               /* Unmap the block while remap partial again after split */
> > +               return blk_size +
> > +                       arm_short_unmap(ops, iova + blk_size, size - blk_size);
> > +       }
> > +
> > +       return size;
> > +}
> > +
> > +static struct io_pgtable *
> > +arm_short_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
> > +{
> > +       struct arm_short_io_pgtable *data;
> > +
> > +       if (cfg->ias > 32 || cfg->oas > 32)
> > +               return NULL;
> > +
> > +       cfg->pgsize_bitmap &=
> > +               (cfg->quirks & IO_PGTABLE_QUIRK_SHORT_SUPERSECTION) ?
> > +               (SZ_4K | SZ_64K | SZ_1M | SZ_16M) : (SZ_4K | SZ_64K | SZ_1M);
> > +
> > +       data = kzalloc(sizeof(*data), GFP_KERNEL);
> > +       if (!data)
> > +               return NULL;
> > +
> > +       data->pgd_size = SZ_16K;
> > +       data->pgd = __arm_short_alloc_pgtable(
> > +                                       data->pgd_size,
> > +                                       GFP_KERNEL | __GFP_ZERO | __GFP_DMA,
> > +                                       true, cfg);
> > +       if (!data->pgd)
> > +               goto out_free_data;
> > +       wmb();/* Ensure the empty pgd is visible before any actual TTBR write */
> > +
> > +       data->pgtable_cached = kmem_cache_create(
> > +                                       "io-pgtable-arm-short",
> > +                                        ARM_SHORT_BYTES_PER_PTE,
> > +                                        ARM_SHORT_BYTES_PER_PTE,
> > +                                        0, NULL);

I plan to add SLAB_CACHE_DMA to guarantee the level-2 pagetable base
address (PA) is always 32-bit (not above 4GB).
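
i.e. something like this (sketch):
//======================
	data->pgtable_cached = kmem_cache_create("io-pgtable-arm-short",
						 ARM_SHORT_BYTES_PER_PTE,
						 ARM_SHORT_BYTES_PER_PTE,
						 SLAB_CACHE_DMA, NULL);
//======================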

> > +       if (!data->pgtable_cached)
> > +               goto out_free_pgd;
> > +
> > +       /* TTBRs */
> > +       cfg->arm_short_cfg.ttbr[0] = virt_to_phys(data->pgd);
> > +       cfg->arm_short_cfg.ttbr[1] = 0;
> > +       cfg->arm_short_cfg.tcr = 0;
> > +       cfg->arm_short_cfg.nmrr = 0;
> > +       cfg->arm_short_cfg.prrr = 0;

           About SCTLR, how about we add it here:
          //===========
          cfg->arm_short_cfg.sctlr = 0; /* The iommu user should configure IOMMU_{READ/WRITE} */
          //===========
           Is the comment OK?
> > +
> > +       data->iop.ops = (struct io_pgtable_ops) {
> > +               .map            = arm_short_map,
> > +               .unmap          = arm_short_unmap,
> > +               .iova_to_phys   = arm_short_iova_to_phys,
> > +       };
> > +
> > +       return &data->iop;
> > +
> > +out_free_pgd:
> > +       __arm_short_free_pgtable(data->pgd, data->pgd_size, true, cfg);
> > +out_free_data:
> > +       kfree(data);
> > +       return NULL;
> > +}
> > +
> [...]
> 
> > diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> > index 6436fe2..14a9b3a 100644
> > --- a/drivers/iommu/io-pgtable.c
> > +++ b/drivers/iommu/io-pgtable.c
> > @@ -28,6 +28,7 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns;
> >  extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns;
> >  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns;
> >  extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
> > +extern struct io_pgtable_init_fns io_pgtable_arm_short_init_fns;
> > 
> >  static const struct io_pgtable_init_fns *
> >  io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
> > @@ -38,6 +39,9 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] =
> >         [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns,
> >         [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns,
> >  #endif
> > +#ifdef CONFIG_IOMMU_IO_PGTABLE_SHORT
> > +       [ARM_SHORT_DESC] = &io_pgtable_arm_short_init_fns,
> > +#endif
> >  };
> > 
> >  struct io_pgtable_ops *alloc_io_pgtable_ops(enum io_pgtable_fmt fmt,
> > diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> > index 68c63d9..0f45e60 100644
> > --- a/drivers/iommu/io-pgtable.h
> > +++ b/drivers/iommu/io-pgtable.h
> > @@ -9,6 +9,7 @@ enum io_pgtable_fmt {
> >         ARM_32_LPAE_S2,
> >         ARM_64_LPAE_S1,
> >         ARM_64_LPAE_S2,
> > +       ARM_SHORT_DESC,
> >         IO_PGTABLE_NUM_FMTS,
> >  };
> > 
> > @@ -45,6 +46,9 @@ struct iommu_gather_ops {
> >   */
> >  struct io_pgtable_cfg {
> >         #define IO_PGTABLE_QUIRK_ARM_NS (1 << 0)        /* Set NS bit in PTEs */
> > +       #define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION     BIT(1)
> > +       #define IO_PGTABLE_QUIRK_SHORT_NO_XN            BIT(2) /* No XN bit */
> > +       #define IO_PGTABLE_QUIRK_SHORT_NO_PERMS         BIT(3) /* No AP bit */
> 
> Why have two quirks for this? I suggested included NO_XN in NO_PERMS:
> 
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/361160.html

Sorry. I will change it like this next time:

#define IO_PGTABLE_QUIRK_NO_PERMS            BIT(1) /* No XN/AP bits */
#define IO_PGTABLE_QUIRK_TLBI_ON_MAP         BIT(2) /* TLB-Flush after map */
#define IO_PGTABLE_QUIRK_SHORT_SUPERSECTION  BIT(3)

Do I need to change (1 << 0) to BIT(0) for ARM_NS?

> 
> >         int                             quirks;
> >         unsigned long                   pgsize_bitmap;
> >         unsigned int                    ias;
> > @@ -64,6 +68,13 @@ struct io_pgtable_cfg {
> >                         u64     vttbr;
> >                         u64     vtcr;
> >                 } arm_lpae_s2_cfg;
> > +
> > +               struct {
> > +                       u32     ttbr[2];
> > +                       u32     tcr;
> > +                       u32     nmrr;
> > +                       u32     prrr;
> > +               } arm_short_cfg;
> 
> We don't return an SCTLR value here, so a comment somewhere saying that
> access flag is not supported would be helpful (so that drivers can ensure
> that they configure things for the AP[2:0] permission model).

Do we need to add SCTLR? Like:
         struct {
                 u32     ttbr[2];
                 u32     tcr;
                 u32     nmrr;
                 u32     prrr;
+                u32     sctlr;
          } arm_short_cfg;

  Or should we only add a comment where ttbr/tcr are set?
  /* The iommu user should configure IOMMU_{READ/WRITE} since SCTLR
isn't implemented */
> 
> Will



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
  2015-09-17 14:54       ` Yong Wu
  (?)
@ 2015-09-22 14:12         ` Yong Wu
  -1 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-09-22 14:12 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy
  Cc: Mark Rutland, Catalin Marinas, youhua.li, Thierry Reding,
	k.zhang, devicetree, arnd, Tomasz Figa, Rob Herring,
	linux-mediatek, Matthias Brugger, linux-arm-kernel, pebolle,
	frederic.chen, srv_heupstream, linux-kernel, iommu, Daniel Kurtz,
	Sasha Hauer, Lucas Stach

> > > +static int arm_short_unmap(struct io_pgtable_ops *ops,
> > > +                          unsigned long iova,
> > > +                          size_t size)
> > > +{
> > > +       struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
> > > +       struct io_pgtable_cfg *cfg = &data->iop.cfg;
> > > +       arm_short_iopte *pgd, *pte = NULL;
> > > +       arm_short_iopte curpgd, curpte = 0;
> > > +       phys_addr_t paddr;
> > > +       unsigned int iova_base, blk_size = 0;
> > > +       void *cookie = data->iop.cookie;
> > > +       bool pgtablefree = false;
> > > +
> > > +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
> > > +
> > > +       /* Get block size */
> > > +       if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
> > > +               pte = arm_short_get_pte_in_pgd(*pgd, iova);
> > > +
> > > +               if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
> > > +                       blk_size = SZ_4K;
> > > +               else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
> > > +                       blk_size = SZ_64K;
> > > +               else
> > > +                       WARN_ON(1);
> 
> > > +       } else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
> > > +               blk_size = SZ_1M;
> > > +       } else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
> > > +               blk_size = SZ_16M;
> > > +       } else {
> > > +               WARN_ON(1); 
> > > +       }
> > > +
> > > +       iova_base = iova & ~(blk_size - 1);
> > > +       pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova_base);
> > > +       paddr = arm_short_iova_to_phys(ops, iova_base);
> > > +       curpgd = *pgd;
> > > +
> > > +       if (blk_size == SZ_4K || blk_size == SZ_64K) {
> > > +               pte = arm_short_get_pte_in_pgd(*pgd, iova_base);
> > > +               curpte = *pte;
> > > +               __arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);
> > > +
> > > +            pgtablefree = _arm_short_whether_free_pgtable(pgd);
> > > +            if (pgtablefree)
> > > +                 __arm_short_set_pte(pgd, 0, 1, cfg);
> > > +       } else if (blk_size == SZ_1M || blk_size == SZ_16M) {
> > > +               __arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
> > > +       }
> > > +
> > > +    cfg->tlb->tlb_add_flush(iova_base, blk_size, true, cookie);
> > > +    cfg->tlb->tlb_sync(cookie);
> > > +
> > > +       if (pgtablefree)/* Free pgtable after tlb-flush */
> > > +              __arm_short_free_pgtable(ARM_SHORT_GET_PGTABLE_VA(curpgd),
> > > +                                        ARM_SHORT_BYTES_PER_PTE, false, cfg);
> > > +
> > > +       if (blk_size > size) { /* Split the block */
> > > +               return arm_short_split_blk_unmap(
> > > +                               ops, iova, paddr, size,
> > > +                               ARM_SHORT_PGD_GET_PROT(curpgd),
> > > +                               ARM_SHORT_PTE_LARGE_GET_PROT(curpte),
> > > +                               blk_size);
> > > +       } else if (blk_size < size) {
> > > +               /* Unmap the block while remap partial again after split */
> > > +               return blk_size +
> > > +                       arm_short_unmap(ops, iova + blk_size, size - blk_size);

Hi Will, Robin,
     I would like to show you a problem I met: the recursion here may
lead to a stack overflow while we test FHD video decode.

    From the log, the internal variables in the error case are: "size"
is 0x100000 and "iova" is 0xfea00000, but at that time "blk_size" is
0x1000 because the region was mapped with small pages, so we enter the
recursion here (0x100000 bytes of 4K pages means up to 256 nested
calls).
    
    After checking the unmap flow, there is only one iommu_unmap in
__iommu_dma_unmap, and iommu_unmap does not check the physical address
alignment. So if the iova and size both happen to be SZ_16M or SZ_1M
aligned, we also enter arm_short_unmap with a large size even though the
region was mapped with small pages.
    So.
    a) Do we need to unmap each sg item in __iommu_dma_unmap, like below:

//===============
static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
{
	/* ...and if we can't, then something is horribly, horribly wrong */
+	for_each_sg(sg, s, nents, i)
		BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);
	__free_iova(iovad, iova);
}
//===============
 
    b) I need to re-add the do-while loop in arm_short_unmap for this
case (the one [1] suggested deleting).

     After testing locally, I plan to add the do-while like below:
     
//==============================
static int arm_short_unmap(struct io_pgtable_ops *ops,
			   unsigned long iova,
			   size_t size)
{
	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
	arm_short_iopte *pgd, *pte = NULL;
	arm_short_iopte curpgd, curpte = 0;
	unsigned int blk_base, blk_size;
	int unmap_size = 0;
	bool pgtempty;

	do {
		pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(iova);
		blk_size = 0;
		pgtempty = false;

		/* Get block size */
		if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(*pgd)) {
			pte = arm_short_get_pte_in_pgd(*pgd, iova);

			if (ARM_SHORT_PTE_TYPE_IS_SMALLPAGE(*pte))
				blk_size = SZ_4K;
			else if (ARM_SHORT_PTE_TYPE_IS_LARGEPAGE(*pte))
				blk_size = SZ_64K;
		} else if (ARM_SHORT_PGD_TYPE_IS_SECTION(*pgd)) {
			blk_size = SZ_1M;
		} else if (ARM_SHORT_PGD_TYPE_IS_SUPERSECTION(*pgd)) {
			blk_size = SZ_16M;
		}

		if (WARN_ON(!blk_size))
			return 0;

		blk_base = iova & ~(blk_size - 1);
		pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(blk_base);
		curpgd = *pgd;

		if (blk_size == SZ_4K || blk_size == SZ_64K) {
			pte = arm_short_get_pte_in_pgd(*pgd, blk_base);
			curpte = *pte;
			__arm_short_set_pte(pte, 0, blk_size / SZ_4K, cfg);

			pgtempty = __arm_short_pgtable_empty(pgd);
			if (pgtempty)
				__arm_short_set_pte(pgd, 0, 1, cfg);
		} else if (blk_size == SZ_1M || blk_size == SZ_16M) {
			__arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
		}

		cfg->tlb->tlb_add_flush(blk_base, blk_size, true, data->iop.cookie);
		cfg->tlb->tlb_sync(data->iop.cookie);

		if (pgtempty)/* Free lvl2 pgtable after tlb-flush */
			__arm_short_free_pgtable(
					ARM_SHORT_GET_PGTABLE_VA(curpgd),
					ARM_SHORT_BYTES_PER_PTE, false, cfg);

		/*
		 * If the remaining size to unmap (which comes from the
		 * pgsize_bitmap) is larger than the current blk_size,
		 * keep unmapping block by block.
		 */
		if (blk_size <= size) {
			iova += blk_size;
			size -= blk_size;
			unmap_size += blk_size;
			continue;
		} else { /* Split this block */
			return unmap_size + arm_short_split_blk_unmap(
					ops, iova, size, blk_size,
					ARM_SHORT_PGD_GET_PROT(curpgd),
					ARM_SHORT_PTE_GET_PROT_LARGE(curpte));
		}
	} while (size);

	return unmap_size;
}
//=============================

    Are there any other suggestions?
    Thanks very much.


[1]:http://lists.linuxfoundation.org/pipermail/iommu/2015-June/013322.html

> > > +       }
> > > +
> > > +       return size;
> > > +}
> > > +



^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
  2015-09-22 14:12         ` Yong Wu
  (?)
@ 2015-10-09 15:57           ` Will Deacon
  -1 siblings, 0 replies; 60+ messages in thread
From: Will Deacon @ 2015-10-09 15:57 UTC (permalink / raw)
  To: Yong Wu
  Cc: Robin Murphy, Mark Rutland, Catalin Marinas, youhua.li,
	Thierry Reding, k.zhang, devicetree, arnd, Tomasz Figa,
	Rob Herring, linux-mediatek, Matthias Brugger, linux-arm-kernel,
	pebolle, frederic.chen, srv_heupstream, linux-kernel, iommu,
	Daniel Kurtz, Sasha Hauer, Lucas Stach

On Tue, Sep 22, 2015 at 03:12:47PM +0100, Yong Wu wrote:
>      I would like to show you a problem I met: the recursion here
> may lead to a stack overflow while we test FHD video decode.
> 
>     From the log, I got the internal variables in the error case:
> "size" is 0x100000 and "iova" is 0xfea00000, but at that point
> "blk_size" is 0x1000 because the region was mapped with small pages,
> so it enters the recursion here.
> 
>     After checking the unmap flow, there is only a single iommu_unmap
> in __iommu_dma_unmap, and iommu_unmap does not check the physical
> address alignment.

That sounds like a bug in __iommu_dma_unmap. Robin?

Will

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-10-09 17:41             ` Robin Murphy
  0 siblings, 0 replies; 60+ messages in thread
From: Robin Murphy @ 2015-10-09 17:41 UTC (permalink / raw)
  To: Will Deacon, Yong Wu
  Cc: Mark Rutland, Catalin Marinas, youhua.li, Thierry Reding,
	k.zhang, devicetree, arnd, Tomasz Figa, Rob Herring,
	linux-mediatek, Matthias Brugger, linux-arm-kernel, pebolle,
	frederic.chen, srv_heupstream, linux-kernel, iommu, Daniel Kurtz,
	Sasha Hauer, Lucas Stach

On 09/10/15 16:57, Will Deacon wrote:
> On Tue, Sep 22, 2015 at 03:12:47PM +0100, Yong Wu wrote:
>>       I would like to show you a problem I met: the recursion here
>> may lead to a stack overflow while we test FHD video decode.
>>
>>      From the log, I got the internal variables in the error case:
>> "size" is 0x100000 and "iova" is 0xfea00000, but at that point
>> "blk_size" is 0x1000 because the region was mapped with small pages,
>> so it enters the recursion here.
>>
>>      After checking the unmap flow, there is only a single
>> iommu_unmap in __iommu_dma_unmap, and iommu_unmap does not check the
>> physical address alignment.
>
> That sounds like a bug in __iommu_dma_unmap. Robin?

Isn't it just cf27ec930be9 again wearing different trousers? All I do is 
call iommu_unmap with the same total size that was mapped originally.

Robin.


^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-10-09 18:19               ` Will Deacon
  0 siblings, 0 replies; 60+ messages in thread
From: Will Deacon @ 2015-10-09 18:19 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Yong Wu, Mark Rutland, Catalin Marinas, youhua.li,
	Thierry Reding, k.zhang, devicetree, arnd, Tomasz Figa,
	Rob Herring, linux-mediatek, Matthias Brugger, linux-arm-kernel,
	pebolle, frederic.chen, srv_heupstream, linux-kernel, iommu,
	Daniel Kurtz, Sasha Hauer, Lucas Stach

On Fri, Oct 09, 2015 at 06:41:51PM +0100, Robin Murphy wrote:
> On 09/10/15 16:57, Will Deacon wrote:
> >On Tue, Sep 22, 2015 at 03:12:47PM +0100, Yong Wu wrote:
> >>      I would like to show you a problem I met, The recursion here may
> >>lead to stack overflow while we test FHD video decode.
> >>
> >>     From the log, I get the internal variable in the error case: the
> >>"size" is 0x100000, the "iova" is 0xfea00000, but at that time the
> >>"blk_size" is 0x1000 as it was the map of small-page. so it enter the
> >>recursion here.
> >>
> >>     After check the unmap flow, there is only a iommu_unmap in
> >>__iommu_dma_unmap, and it won't check the physical address align in
> >>iommu_unmap.
> >
> >That sounds like a bug in __iommu_dma_unmap. Robin?
> 
> Isn't it just cf27ec930be9 again wearing different trousers? All I do is
> call iommu_unmap with the same total size that was mapped originally.

I don't think it's the same as that issue, which was to do with installing
block mappings over the top of an existing table entry. The problem here
seems to be that we don't walk the page table properly on unmap.

The long descriptor code has:

	/* If the size matches this level, we're in the right place */
	if (size == blk_size) {
		__arm_lpae_set_pte(ptep, 0, &data->iop.cfg);

		if (!iopte_leaf(pte, lvl)) {
			/* Also flush any partial walks */
			tlb->tlb_add_flush(iova, size, false, cookie);
			tlb->tlb_sync(cookie);
			ptep = iopte_deref(pte, data);
			__arm_lpae_free_pgtable(data, lvl + 1, ptep);
		} else {
			tlb->tlb_add_flush(iova, size, true, cookie);
		}

		return size;
	} else if (iopte_leaf(pte, lvl)) {
		/*
		 * Insert a table at the next level to map the old region,
		 * minus the part we want to unmap
		 */
		return arm_lpae_split_blk_unmap(data, iova, size,
						iopte_prot(pte), lvl, ptep,
						blk_size);
	}

why doesn't something similar work for short descriptors?

Will

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator.
@ 2015-10-21 10:34                 ` Yong Wu
  0 siblings, 0 replies; 60+ messages in thread
From: Yong Wu @ 2015-10-21 10:34 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel
  Cc: Mark Rutland, Catalin Marinas, youhua.li, Thierry Reding,
	k.zhang, devicetree, arnd, Tomasz Figa, Rob Herring,
	linux-mediatek, Matthias Brugger, linux-arm-kernel, pebolle,
	frederic.chen, srv_heupstream, linux-kernel, iommu, Daniel Kurtz,
	Sasha Hauer, Lucas Stach

On Fri, 2015-10-09 at 19:19 +0100, Will Deacon wrote:
> On Fri, Oct 09, 2015 at 06:41:51PM +0100, Robin Murphy wrote:
> > On 09/10/15 16:57, Will Deacon wrote:
> > >On Tue, Sep 22, 2015 at 03:12:47PM +0100, Yong Wu wrote:
> > >>      I would like to show you a problem I met: the recursion
> > >> here may lead to a stack overflow while we test FHD video
> > >> decode.
> > >>
> > >>     From the log, I got the internal variables in the error
> > >> case: "size" is 0x100000 and "iova" is 0xfea00000, but at that
> > >> point "blk_size" is 0x1000 because the region was mapped with
> > >> small pages, so it enters the recursion here.
> > >>
> > >>     After checking the unmap flow, there is only a single
> > >> iommu_unmap in __iommu_dma_unmap, and iommu_unmap does not check
> > >> the physical address alignment.
> > >
> > >That sounds like a bug in __iommu_dma_unmap. Robin?
> > 
> > Isn't it just cf27ec930be9 again wearing different trousers? All I do is
> > call iommu_unmap with the same total size that was mapped originally.
> 
> I don't think it's the same as that issue, which was to do with installing
> block mappings over the top of an existing table entry. The problem here
> seems to be that we don't walk the page table properly on unmap.
> 
> The long descriptor code has:
> 
> 	/* If the size matches this level, we're in the right place */
> 	if (size == blk_size) {
> 		__arm_lpae_set_pte(ptep, 0, &data->iop.cfg);
> 
> 		if (!iopte_leaf(pte, lvl)) {
> 			/* Also flush any partial walks */
> 			tlb->tlb_add_flush(iova, size, false, cookie);
> 			tlb->tlb_sync(cookie);
> 			ptep = iopte_deref(pte, data);
> 			__arm_lpae_free_pgtable(data, lvl + 1, ptep);
> 		} else {
> 			tlb->tlb_add_flush(iova, size, true, cookie);
> 		}
> 
> 		return size;
> 	} else if (iopte_leaf(pte, lvl)) {
> 		/*
> 		 * Insert a table at the next level to map the old region,
> 		 * minus the part we want to unmap
> 		 */
> 		return arm_lpae_split_blk_unmap(data, iova, size,
> 						iopte_prot(pte), lvl, ptep,
> 						blk_size);
> 	}
> 
> why doesn't something similar work for short descriptors?
> 
> Will

Hi Will,

   There are some differences between the long and the short
descriptor, so I cannot use that code directly.

1. The long descriptor handles blk_size easily with its 3 levels: in
stage 1 the per-level block sizes are 4KB, 2MB and 1GB. Since it has a
3-level pagetable, using one block size per level is natural there.

But I don't use a "level" in the short descriptor. When I started
designing it, I planned to use 4 levels (lvl1 4KB, lvl2 64KB, lvl3 1MB,
lvl4 16MB), which would have made the code more similar to the long
descriptor. But the short descriptor only has a 2-level pagetable, so
pretending it has 4 levels could be misleading. So I don't use "level"
and instead enumerate the four cases in map and unmap; the sizes are
summarized just below.
(If you think the short descriptor could use 4 levels like that, I can
try it.)
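
   To make the four cases concrete, here is a summary sketch of the
short-descriptor mapping sizes and where each lives in the 2-level
table (a reference table only, not code from the patch; the repeat
counts also explain the blk_size / SZ_4K and blk_size / SZ_1M arguments
passed to __arm_short_set_pte below):

//===============
static const struct {
	size_t		size;
	const char	*entry;	/* table level and entry type */
} arm_short_blk[] = {
	{ SZ_4K,  "lvl2 pte: small page" },
	{ SZ_64K, "lvl2 pte: large page (16 consecutive ptes)" },
	{ SZ_1M,  "lvl1 pgd: section" },
	{ SZ_16M, "lvl1 pgd: supersection (16 consecutive pgds)" },
};
//===============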

2. Following the long-descriptor unmap, if the entry is not a leaf we
free the pagetable, and then the do-while can be deleted. I have tested
this:
 
//===========================
static int arm_short_unmap(struct io_pgtable_ops *ops,
			   unsigned long iova,
			   size_t size)
{
	struct arm_short_io_pgtable *data = io_pgtable_ops_to_data(ops);
	struct io_pgtable_cfg *cfg = &data->iop.cfg;
	void *cookie = data->iop.cookie;
	arm_short_iopte *pgd, *pte = NULL;
	arm_short_iopte pgd_tmp, pte_tmp = 0;
	unsigned int blk_size = 0, blk_base;
	bool empty = false, split = false;
	int i;

	blk_size = arm_short_iova_to_blk_size(ops, iova);
	if (WARN_ON(!blk_size))
		return 0;

	blk_base = iova & ~(blk_size - 1);
	pgd = (arm_short_iopte *)data->pgd + ARM_SHORT_PGD_IDX(blk_base);

	if (size == SZ_1M || size == SZ_16M) { /* section or supersection */
		for (i = 0; i < size / SZ_1M; i++, pgd++, blk_base += SZ_1M) {
			pgd_tmp = *pgd;
			__arm_short_set_pte(pgd, 0, 1, cfg);

			cfg->tlb->tlb_add_flush(blk_base, SZ_1M, true, cookie);
			cfg->tlb->tlb_sync(cookie);

			/* Free the lvl2 pgtable if this pgd entry points to one */
			if (ARM_SHORT_PGD_TYPE_IS_PGTABLE(pgd_tmp))
				__arm_short_free_pgtable(
					ARM_SHORT_GET_PGTABLE_VA(pgd_tmp),
					ARM_SHORT_BYTES_PER_PTE, false, cfg);

			/* A split is needed when unmapping 1M inside a supersection */
			if (size == SZ_1M && blk_size == SZ_16M)
				split = true;
		}
	} else if (size == SZ_4K || size == SZ_64K) {/* page or large page */
		pgd_tmp = *pgd;

		/* Unmap the current node */
		if (blk_size == SZ_4K || blk_size == SZ_64K) {
			pte = arm_short_get_pte_in_pgd(*pgd, blk_base);
			pte_tmp = *pte;
			__arm_short_set_pte(
				pte, 0, max_t(size_t, size, blk_size) / SZ_4K, cfg);

			empty = __arm_short_pgtable_empty(pgd);
			if (empty)
				__arm_short_set_pte(pgd, 0, 1, cfg);
		} else if (blk_size == SZ_1M || blk_size == SZ_16M) {
			__arm_short_set_pte(pgd, 0, blk_size / SZ_1M, cfg);
		}

		cfg->tlb->tlb_add_flush(blk_base, size, true, cookie);
		cfg->tlb->tlb_sync(cookie);

		if (empty) /* Free lvl2 pgtable */
			__arm_short_free_pgtable(
					ARM_SHORT_GET_PGTABLE_VA(pgd_tmp),
					ARM_SHORT_BYTES_PER_PTE, false, cfg);

		if (blk_size > size)
			split = true;

	} else {
		return 0; /* Unmapped size */
	}

	if (split) /* Split while blk_size > size */
		return arm_short_split_blk_unmap(
				ops, iova, size, blk_size,
				ARM_SHORT_PGD_GET_PROT(pgd_tmp),
				ARM_SHORT_PTE_GET_PROT_LARGE(pte_tmp));

	return size;
}
//===========================
This doesn't look good either; the do-while in our v5 may be better
than this one. What's your opinion?

3. (Also adding Joerg.) There is a line in iommu_map:
size_t pgsize = iommu_pgsize(domain, iova | paddr, size);

   And there is a line in iommu_unmap:
size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);

   Is it possible to change it like this:
phys_addr_t paddr = domain->ops->iova_to_phys(domain, iova);
size_t pgsize = iommu_pgsize(domain, iova | paddr, size - unmapped);

   If we add a physical address alignment check in iommu_unmap, the
unmap flow may become simpler; a sketch follows below.
   I think iommu_map and iommu_unmap should both be atomic map/unmap
operations: iommu_map checks the physical address alignment, so why
doesn't iommu_unmap check it?
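
   To make the proposal concrete, the iommu_unmap() loop would then
look roughly like this (a sketch only, following the shape of the core
loop of that time; not a tested patch):

//===============
	while (unmapped < size) {
		/* Proposed: fold the physical alignment into the chunk
		 * choice, so a small-page mapping is never handed to
		 * the driver as a 1M/16M unmap. */
		phys_addr_t paddr = domain->ops->iova_to_phys(domain, iova);
		size_t pgsize = iommu_pgsize(domain, iova | paddr,
					     size - unmapped);
		size_t unmapped_page = domain->ops->unmap(domain, iova, pgsize);

		if (!unmapped_page)
			break;

		iova += unmapped_page;
		unmapped += unmapped_page;
	}
//===============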



^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2015-10-21 10:34 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-03 10:21 [PATCH v4 0/6] MT8173 IOMMU SUPPORT Yong Wu
2015-08-03 10:21 ` [PATCH v4 1/6] dt-bindings: iommu: Add binding for mediatek IOMMU Yong Wu
2015-08-03 10:21 ` [PATCH v4 2/6] dt-bindings: mediatek: Add smi dts binding Yong Wu
2015-08-03 10:21 ` [PATCH v4 3/6] iommu: add ARM short descriptor page table allocator Yong Wu
2015-09-16 15:58   ` Will Deacon
2015-09-17 14:54     ` Yong Wu
2015-09-22 14:12       ` Yong Wu
2015-10-09 15:57         ` Will Deacon
2015-10-09 17:41           ` Robin Murphy
2015-10-09 18:19             ` Will Deacon
2015-10-21 10:34               ` Yong Wu
2015-08-03 10:21 ` [PATCH v4 4/6] memory: mediatek: Add SMI driver Yong Wu
2015-08-11 14:56   ` Joerg Roedel
2015-08-12 12:39     ` Yong Wu
2015-08-03 10:21 ` [PATCH v4 5/6] iommu/mediatek: Add mt8173 IOMMU driver Yong Wu
2015-08-11 15:39   ` Joerg Roedel
2015-08-12 12:28     ` Yong Wu
2015-09-11 15:33   ` Robin Murphy
2015-09-15  5:53     ` Yong Wu
2015-08-03 10:21 ` [PATCH v4 6/6] dts: mt8173: Add iommu/smi nodes for mt8173 Yong Wu
