* [RFC PATCH 0/4]  Add a DRM driver to support AI Processing Unit (APU)
@ 2021-09-17 12:59 ` Alexandre Bailon
  0 siblings, 0 replies; 34+ messages in thread
From: Alexandre Bailon @ 2021-09-17 12:59 UTC (permalink / raw)
  To: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, bjorn.andersson, mathieu.poirier,
	sumit.semwal
  Cc: christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain, Alexandre Bailon

This adds a DRM driver that implements communication between the CPU and an
APU.
It uses VirtIO buffers to exchange messages.
For the data, we allocate a GEM object and map it using the IOMMU to make it
available to the APU.
The driver is relatively generic and should work with any SoC implementing a
hardware accelerator for AI, provided it supports remoteproc and VirtIO.

For those interested in the firmware or the userspace library,
the sources are available here:
https://github.com/BayLibre/open-amp/tree/v2020.01-mtk/apps/examples/apu

This RFC is a rewrite of a previous RFC that was not using DRM:
https://patchwork.kernel.org/project/linux-remoteproc/cover/20200930115350.5272-1-abailon@baylibre.com/

Alexandre Bailon (4):
  dt-bindings: Add bindings for mtk,apu-drm
  DRM: Add support of AI Processor Unit (APU)
  rpmsg: Add support of AI Processor Unit (APU)
  ARM64: mt8183-pumpkin: Add the APU DRM device

 .../devicetree/bindings/gpu/mtk,apu-drm.yaml  |  38 ++
 .../boot/dts/mediatek/mt8183-pumpkin.dts      |   6 +
 drivers/gpu/drm/Kconfig                       |   2 +
 drivers/gpu/drm/Makefile                      |   1 +
 drivers/gpu/drm/apu/Kconfig                   |  10 +
 drivers/gpu/drm/apu/Makefile                  |   7 +
 drivers/gpu/drm/apu/apu_drm_drv.c             | 238 +++++++
 drivers/gpu/drm/apu/apu_gem.c                 | 232 +++++++
 drivers/gpu/drm/apu/apu_internal.h            |  89 +++
 drivers/gpu/drm/apu/apu_sched.c               | 634 ++++++++++++++++++
 drivers/rpmsg/Kconfig                         |  10 +
 drivers/rpmsg/Makefile                        |   1 +
 drivers/rpmsg/apu_rpmsg.c                     | 184 +++++
 include/drm/apu_drm.h                         |  59 ++
 include/uapi/drm/apu_drm.h                    | 106 +++
 15 files changed, 1617 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
 create mode 100644 drivers/gpu/drm/apu/Kconfig
 create mode 100644 drivers/gpu/drm/apu/Makefile
 create mode 100644 drivers/gpu/drm/apu/apu_drm_drv.c
 create mode 100644 drivers/gpu/drm/apu/apu_gem.c
 create mode 100644 drivers/gpu/drm/apu/apu_internal.h
 create mode 100644 drivers/gpu/drm/apu/apu_sched.c
 create mode 100644 drivers/rpmsg/apu_rpmsg.c
 create mode 100644 include/drm/apu_drm.h
 create mode 100644 include/uapi/drm/apu_drm.h

-- 
2.31.1


^ permalink raw reply	[flat|nested] 34+ messages in thread


* [RFC PATCH 1/4] dt-bindings: Add bindings for mtk,apu-drm
  2021-09-17 12:59 ` Alexandre Bailon
@ 2021-09-17 12:59   ` Alexandre Bailon
  -1 siblings, 0 replies; 34+ messages in thread
From: Alexandre Bailon @ 2021-09-17 12:59 UTC (permalink / raw)
  To: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, bjorn.andersson, mathieu.poirier,
	sumit.semwal
  Cc: christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain, Alexandre Bailon

This adds the device tree bindings for the APU DRM driver.

Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
---
 .../devicetree/bindings/gpu/mtk,apu-drm.yaml  | 38 +++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml

diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
new file mode 100644
index 0000000000000..6f432d3ea478c
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
@@ -0,0 +1,38 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: AI Processor Unit DRM
+
+properties:
+  compatible:
+    const: mediatek,apu-drm
+
+  remoteproc:
+    maxItems: 2
+    description:
+      Handle to remoteproc devices controlling the APU
+
+  iova:
+    maxItems: 1
+    description:
+      Address and size of the virtual memory that can be used by the APU
+
+required:
+  - compatible
+  - remoteproc
+  - iova
+
+additionalProperties: false
+
+examples:
+  - |
+    apu@0 {
+      compatible = "mediatek,apu-drm";
+      remoteproc = <&vpu0>, <&vpu1>;
+      iova = <0 0x60000000 0 0x10000000>;
+    };
+
+...
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread


* [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
  2021-09-17 12:59 ` Alexandre Bailon
  (?)
@ 2021-09-17 12:59   ` Alexandre Bailon
  -1 siblings, 0 replies; 34+ messages in thread
From: Alexandre Bailon @ 2021-09-17 12:59 UTC (permalink / raw)
  To: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, bjorn.andersson, mathieu.poirier,
	sumit.semwal
  Cc: christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain, Alexandre Bailon

Some MediaTek SoCs provide a hardware accelerator for AI / ML.
This driver provides the infrastructure to manage memory
shared between the host CPU and the accelerator, and to submit
jobs to the accelerator.
The APU itself is managed by remoteproc, so this driver
relies on remoteproc to find the APU and get some important data
from it. However, the driver is quite generic and it should be
possible to manage accelerators in other ways.
This driver doesn't handle the data transmissions itself.
It must be registered by another driver implementing the transmissions.

Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
---
 drivers/gpu/drm/Kconfig            |   2 +
 drivers/gpu/drm/Makefile           |   1 +
 drivers/gpu/drm/apu/Kconfig        |  10 +
 drivers/gpu/drm/apu/Makefile       |   7 +
 drivers/gpu/drm/apu/apu_drm_drv.c  | 238 +++++++++++
 drivers/gpu/drm/apu/apu_gem.c      | 232 +++++++++++
 drivers/gpu/drm/apu/apu_internal.h |  89 ++++
 drivers/gpu/drm/apu/apu_sched.c    | 634 +++++++++++++++++++++++++++++
 include/drm/apu_drm.h              |  59 +++
 include/uapi/drm/apu_drm.h         | 106 +++++
 10 files changed, 1378 insertions(+)
 create mode 100644 drivers/gpu/drm/apu/Kconfig
 create mode 100644 drivers/gpu/drm/apu/Makefile
 create mode 100644 drivers/gpu/drm/apu/apu_drm_drv.c
 create mode 100644 drivers/gpu/drm/apu/apu_gem.c
 create mode 100644 drivers/gpu/drm/apu/apu_internal.h
 create mode 100644 drivers/gpu/drm/apu/apu_sched.c
 create mode 100644 include/drm/apu_drm.h
 create mode 100644 include/uapi/drm/apu_drm.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 8fc40317f2b77..bcdca35c9eda5 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -382,6 +382,8 @@ source "drivers/gpu/drm/xlnx/Kconfig"
 
 source "drivers/gpu/drm/gud/Kconfig"
 
+source "drivers/gpu/drm/apu/Kconfig"
+
 config DRM_HYPERV
 	tristate "DRM Support for Hyper-V synthetic video device"
 	depends on DRM && PCI && MMU && HYPERV
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index ad11121548983..f3d8432976558 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -127,4 +127,5 @@ obj-$(CONFIG_DRM_MCDE) += mcde/
 obj-$(CONFIG_DRM_TIDSS) += tidss/
 obj-y			+= xlnx/
 obj-y			+= gud/
+obj-$(CONFIG_DRM_APU) += apu/
 obj-$(CONFIG_DRM_HYPERV) += hyperv/
diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig
new file mode 100644
index 0000000000000..c8471309a0351
--- /dev/null
+++ b/drivers/gpu/drm/apu/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+
+config DRM_APU
+	tristate "APU (AI Processor Unit)"
+	select REMOTEPROC
+	select DRM_SCHED
+	help
+	  This provides a DRM driver that provides some facilities to
+	  communicate with an accelerated processing unit (APU).
diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile
new file mode 100644
index 0000000000000..3e97846b091c9
--- /dev/null
+++ b/drivers/gpu/drm/apu/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+apu_drm-y += apu_drm_drv.o
+apu_drm-y += apu_sched.o
+apu_drm-y += apu_gem.o
+
+obj-$(CONFIG_DRM_APU) += apu_drm.o
diff --git a/drivers/gpu/drm/apu/apu_drm_drv.c b/drivers/gpu/drm/apu/apu_drm_drv.c
new file mode 100644
index 0000000000000..91d8c99e373c0
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_drm_drv.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <linux/dma-map-ops.h>
+#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/remoteproc.h>
+
+#include <drm/apu_drm.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_probe_helper.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+static LIST_HEAD(apu_devices);
+
+static const struct drm_ioctl_desc ioctls[] = {
+	DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_QUEUE, ioctl_gem_queue,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state,
+			  DRM_RENDER_ALLOW),
+};
+
+DEFINE_DRM_GEM_CMA_FOPS(apu_drm_ops);
+
+static struct drm_driver apu_drm_driver = {
+	.driver_features = DRIVER_GEM | DRIVER_SYNCOBJ,
+	.name = "drm_apu",
+	.desc = "APU DRM driver",
+	.date = "20210319",
+	.major = 1,
+	.minor = 0,
+	.patchlevel = 0,
+	.ioctls = ioctls,
+	.num_ioctls = ARRAY_SIZE(ioctls),
+	.fops = &apu_drm_ops,
+	DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_cma_dumb_create),
+};
+
+void *apu_drm_priv(struct apu_core *apu_core)
+{
+	return apu_core->priv;
+}
+EXPORT_SYMBOL_GPL(apu_drm_priv);
+
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
+{
+	struct apu_drm *apu_drm = apu_core->apu_drm;
+	struct iova *iova;
+
+	iova = reserve_iova(&apu_drm->iovad, PHYS_PFN(start),
+			    PHYS_PFN(start + size));
+	if (!iova)
+		return -ENOMEM;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(apu_drm_reserve_iova);
+
+static int apu_drm_init_first_core(struct apu_drm *apu_drm,
+				   struct apu_core *apu_core)
+{
+	struct drm_device *drm;
+	struct device *parent;
+	u64 mask;
+
+	drm = apu_drm->drm;
+	parent = apu_core->rproc->dev.parent;
+	drm->dev->iommu_group = parent->iommu_group;
+	apu_drm->domain = iommu_get_domain_for_dev(parent);
+	set_dma_ops(drm->dev, get_dma_ops(parent));
+	mask = dma_get_mask(parent);
+	return dma_coerce_mask_and_coherent(drm->dev, mask);
+}
+
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv)
+{
+	struct apu_drm *apu_drm;
+	struct apu_core *apu_core;
+	int ret;
+
+	list_for_each_entry(apu_drm, &apu_devices, node) {
+		list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+			if (apu_core->rproc == rproc) {
+				if (apu_drm_init_first_core(apu_drm, apu_core))
+					return NULL;
+				apu_core->dev = &rproc->dev;
+				apu_core->priv = priv;
+				apu_core->ops = ops;
+
+				ret = apu_drm_job_init(apu_core);
+				if (ret)
+					return NULL;
+
+				return apu_core;
+			}
+		}
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(apu_drm_register_core);
+
+int apu_drm_unregister_core(void *priv)
+{
+	struct apu_drm *apu_drm;
+	struct apu_core *apu_core;
+
+	list_for_each_entry(apu_drm, &apu_devices, node) {
+		list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+			if (apu_core->priv == priv) {
+				apu_sched_fini(apu_core);
+				apu_core->priv = NULL;
+				apu_core->ops = NULL;
+			}
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(apu_drm_unregister_core);
+
+#ifdef CONFIG_OF
+static const struct of_device_id apu_platform_of_match[] = {
+	{ .compatible = "mediatek,apu-drm", },
+	{ },
+};
+
+MODULE_DEVICE_TABLE(of, apu_platform_of_match);
+#endif
+
+static int apu_platform_probe(struct platform_device *pdev)
+{
+	struct drm_device *drm;
+	struct apu_drm *apu_drm;
+	struct of_phandle_iterator it;
+	int index = 0;
+	u64 iova[2];
+	int ret;
+
+	apu_drm = devm_kzalloc(&pdev->dev, sizeof(*apu_drm), GFP_KERNEL);
+	if (!apu_drm)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&apu_drm->apu_cores);
+
+	of_phandle_iterator_init(&it, pdev->dev.of_node, "remoteproc", NULL, 0);
+	while (of_phandle_iterator_next(&it) == 0) {
+		struct rproc *rproc = rproc_get_by_phandle(it.phandle);
+		struct apu_core *apu_core;
+
+		if (!rproc)
+			return -EPROBE_DEFER;
+
+		apu_core = devm_kzalloc(&pdev->dev, sizeof(*apu_core),
+					GFP_KERNEL);
+		if (!apu_core)
+			return -ENOMEM;
+
+		apu_core->rproc = rproc;
+		apu_core->device_id = index++;
+		apu_core->apu_drm = apu_drm;
+		spin_lock_init(&apu_core->ctx_lock);
+		INIT_LIST_HEAD(&apu_core->requests);
+		list_add(&apu_core->node, &apu_drm->apu_cores);
+	}
+
+	if (of_property_read_variable_u64_array(pdev->dev.of_node, "iova",
+						iova, ARRAY_SIZE(iova),
+						ARRAY_SIZE(iova)) !=
+	    ARRAY_SIZE(iova))
+		return -EINVAL;
+
+	init_iova_domain(&apu_drm->iovad, PAGE_SIZE, PHYS_PFN(iova[0]));
+	apu_drm->iova_limit_pfn = PHYS_PFN(iova[0] + iova[1]) - 1;
+
+	drm = drm_dev_alloc(&apu_drm_driver, &pdev->dev);
+	if (IS_ERR(drm)) {
+		ret = PTR_ERR(drm);
+		return ret;
+	}
+
+	ret = drm_dev_register(drm, 0);
+	if (ret) {
+		drm_dev_put(drm);
+		return ret;
+	}
+
+	drm->dev_private = apu_drm;
+	apu_drm->drm = drm;
+	apu_drm->dev = &pdev->dev;
+
+	platform_set_drvdata(pdev, drm);
+
+	list_add(&apu_drm->node, &apu_devices);
+
+	return 0;
+}
+
+static int apu_platform_remove(struct platform_device *pdev)
+{
+	struct drm_device *drm;
+
+	drm = platform_get_drvdata(pdev);
+
+	drm_dev_unregister(drm);
+	drm_dev_put(drm);
+
+	return 0;
+}
+
+static struct platform_driver apu_platform_driver = {
+	.probe = apu_platform_probe,
+	.remove = apu_platform_remove,
+	.driver = {
+		   .name = "apu_drm",
+		   .of_match_table = of_match_ptr(apu_platform_of_match),
+	},
+};
+
+module_platform_driver(apu_platform_driver);
diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
new file mode 100644
index 0000000000000..c867143dab436
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_gem.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <asm/cacheflush.h>
+
+#include <linux/dma-buf.h>
+#include <linux/dma-mapping.h>
+#include <linux/highmem.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+#include <linux/swap.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
+					     size_t size)
+{
+	struct drm_gem_cma_object *cma_obj;
+
+	cma_obj = drm_gem_cma_create(dev, size);
+	if (IS_ERR(cma_obj))
+		return NULL;
+
+	return &cma_obj->base;
+}
+
+int ioctl_gem_new(struct drm_device *dev, void *data,
+		  struct drm_file *file_priv)
+{
+	struct drm_apu_gem_new *args = data;
+	struct drm_gem_cma_object *cma_obj;
+	struct apu_gem_object *apu_obj;
+	struct drm_gem_object *gem_obj;
+	int ret;
+
+	cma_obj = drm_gem_cma_create(dev, args->size);
+	if (IS_ERR(cma_obj))
+		return PTR_ERR(cma_obj);
+
+	gem_obj = &cma_obj->base;
+	apu_obj = to_apu_bo(gem_obj);
+
+	/*
+	 * Save the size of buffer expected by application instead of the
+	 * aligned one.
+	 */
+	apu_obj->size = args->size;
+	apu_obj->offset = 0;
+	apu_obj->iommu_refcount = 0;
+	mutex_init(&apu_obj->mutex);
+
+	ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);
+	drm_gem_object_put(gem_obj);
+	if (ret) {
+		drm_gem_cma_free_object(gem_obj);
+		return ret;
+	}
+	args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
+
+	return 0;
+}
+
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
+{
+	int iova_pfn;
+	int i;
+
+	if (!obj->iommu_sgt)
+		return;
+
+	mutex_lock(&obj->mutex);
+	obj->iommu_refcount--;
+	if (obj->iommu_refcount) {
+		mutex_unlock(&obj->mutex);
+		return;
+	}
+
+	iova_pfn = PHYS_PFN(obj->iova);
+	for (i = 0; i < obj->iommu_sgt->nents; i++) {
+		iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
+			    PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
+		iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
+	}
+
+	sg_free_table(obj->iommu_sgt);
+	kfree(obj->iommu_sgt);
+
+	free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
+	mutex_unlock(&obj->mutex);
+}
+
+static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
+{
+	if (obj->funcs)
+		return obj->funcs->get_sg_table(obj);
+	return ERR_PTR(-EINVAL);
+}
+
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
+{
+	struct apu_gem_object *apu_obj = to_apu_bo(obj);
+	struct scatterlist *sgl;
+	phys_addr_t phys;
+	int total_buf_space;
+	int iova_pfn;
+	int iova;
+	int ret;
+	int i;
+
+	mutex_lock(&apu_obj->mutex);
+	apu_obj->iommu_refcount++;
+	if (apu_obj->iommu_refcount != 1) {
+		mutex_unlock(&apu_obj->mutex);
+		return 0;
+	}
+
+	apu_obj->iommu_sgt = apu_get_sg_table(obj);
+	if (IS_ERR(apu_obj->iommu_sgt)) {
+		mutex_unlock(&apu_obj->mutex);
+		return PTR_ERR(apu_obj->iommu_sgt);
+	}
+
+	total_buf_space = obj->size;
+	iova_pfn = alloc_iova_fast(&apu_drm->iovad,
+				   total_buf_space >> PAGE_SHIFT,
+				   apu_drm->iova_limit_pfn, true);
+	apu_obj->iova = PFN_PHYS(iova_pfn);
+
+	if (!iova_pfn) {
+		dev_err(apu_drm->dev, "Failed to allocate iova address\n");
+		mutex_unlock(&apu_obj->mutex);
+		return -ENOMEM;
+	}
+
+	iova = apu_obj->iova;
+	sgl = apu_obj->iommu_sgt->sgl;
+	for (i = 0; i < apu_obj->iommu_sgt->nents; i++) {
+		phys = page_to_phys(sg_page(&sgl[i]));
+		ret =
+		    iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys,
+			      PAGE_ALIGN(sgl[i].length),
+			      IOMMU_READ | IOMMU_WRITE);
+		if (ret) {
+			dev_err(apu_drm->dev, "Failed to iommu map\n");
+			free_iova(&apu_drm->iovad, PHYS_PFN(apu_obj->iova));
+			apu_obj->iommu_refcount--;
+			mutex_unlock(&apu_obj->mutex);
+			return ret;
+		}
+		iova += sgl[i].offset + sgl[i].length;
+		iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length));
+	}
+	mutex_unlock(&apu_obj->mutex);
+
+	return 0;
+}
+
+int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
+			struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_iommu_map *args = data;
+	struct drm_gem_object **bos;
+	void __user *bo_handles;
+	int ret;
+	int i;
+
+	u64 *das = kvmalloc_array(args->bo_handle_count,
+				  sizeof(u64), GFP_KERNEL);
+	if (!das)
+		return -ENOMEM;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     args->bo_handle_count, &bos);
+	if (ret) {
+		kvfree(das);
+		return ret;
+	}
+
+	for (i = 0; i < args->bo_handle_count; i++) {
+		ret = apu_bo_iommu_map(apu_drm, bos[i]);
+		if (ret) {
+			/* TODO: handle error */
+			goto out;
+		}
+		das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset;
+	}
+
+	if (copy_to_user((void __user *)(uintptr_t) args->bo_device_addresses,
+			 das, args->bo_handle_count * sizeof(u64))) {
+		ret = -EFAULT;
+		DRM_DEBUG("Failed to copy device addresses\n");
+	}
+
+out:
+	kvfree(das);
+	kvfree(bos);
+
+	return ret;
+}
+
+int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_iommu_map *args = data;
+	struct drm_gem_object **bos;
+	void __user *bo_handles;
+	int ret;
+	int i;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     args->bo_handle_count, &bos);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < args->bo_handle_count; i++)
+		apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i]));
+
+	kvfree(bos);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h
new file mode 100644
index 0000000000000..b789b2f3ad9c6
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_internal.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __APU_INTERNAL_H__
+#define __APU_INTERNAL_H__
+
+#include <linux/iova.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/gpu_scheduler.h>
+
+struct apu_gem_object {
+	struct drm_gem_cma_object base;
+	struct mutex mutex;
+	struct sg_table *iommu_sgt;
+	int iommu_refcount;
+	size_t size;
+	u32 iova;
+	u32 offset;
+};
+
+struct apu_sched;
+struct apu_core {
+	int device_id;
+	struct device *dev;
+	struct rproc *rproc;
+	struct apu_drm_ops *ops;
+	struct apu_drm *apu_drm;
+
+	spinlock_t ctx_lock;
+	struct list_head requests;
+
+	struct list_head node;
+	void *priv;
+
+	struct apu_sched *sched;
+	u32 flags;
+};
+
+struct apu_drm {
+	struct device *dev;
+	struct drm_device *drm;
+
+	struct iommu_domain *domain;
+	struct iova_domain iovad;
+	int iova_limit_pfn;
+
+	struct list_head apu_cores;
+	struct list_head node;
+};
+
+static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj)
+{
+	return container_of(to_drm_gem_cma_obj(obj), struct apu_gem_object,
+			    base);
+}
+
+struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
+					     size_t size);
+
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj);
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj);
+int ioctl_gem_new(struct drm_device *dev, void *data,
+		  struct drm_file *file_priv);
+int ioctl_gem_user_new(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv);
+int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
+			struct drm_file *file_priv);
+int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
+int ioctl_gem_queue(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv);
+int ioctl_gem_dequeue(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv);
+int ioctl_apu_state(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv);
+struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem,
+				     int flags);
+
+struct apu_job;
+
+int apu_drm_job_init(struct apu_core *core);
+void apu_sched_fini(struct apu_core *core);
+int apu_job_push(struct apu_job *job);
+void apu_job_put(struct apu_job *job);
+
+#endif /* __APU_INTERNAL_H__ */
diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c
new file mode 100644
index 0000000000000..cebb0155c7783
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_sched.c
@@ -0,0 +1,634 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <drm/apu_drm.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_syncobj.h>
+#include <drm/gpu_scheduler.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+struct apu_queue_state {
+	struct drm_gpu_scheduler sched;
+
+	u64 fence_context;
+	u64 seqno;
+};
+
+struct apu_request {
+	struct list_head node;
+	void *job;
+};
+
+struct apu_sched {
+	struct apu_queue_state apu_queue;
+	spinlock_t job_lock;
+	struct drm_sched_entity sched_entity;
+};
+
+struct apu_event {
+	struct drm_pending_event pending_event;
+	union {
+		struct drm_event base;
+		struct apu_job_event job_event;
+	};
+};
+
+struct apu_job {
+	struct drm_sched_job base;
+
+	struct kref refcount;
+
+	struct apu_core *apu_core;
+	struct apu_drm *apu_drm;
+
+	/* Fence to be signaled by IRQ handler when the job is complete. */
+	struct dma_fence *done_fence;
+
+	__u32 cmd;
+
+	/* Exclusive fences we have taken from the BOs to wait for */
+	struct dma_fence **implicit_fences;
+	struct drm_gem_object **bos;
+	u32 bo_count;
+
+	/* Fence to be signaled by drm-sched once its done with the job */
+	struct dma_fence *render_done_fence;
+
+	void *data_in;
+	uint16_t size_in;
+	void *data_out;
+	uint16_t size_out;
+	uint16_t result;
+	uint16_t id;
+
+	struct list_head node;
+	struct drm_syncobj *sync_out;
+
+	struct apu_event *event;
+};
+
+static DEFINE_IDA(req_ida);
+static LIST_HEAD(complete_node);
+
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
+{
+	struct apu_request *apu_req, *tmp;
+	struct apu_dev_request *hdr = data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_for_each_entry_safe(apu_req, tmp, &apu_core->requests, node) {
+		struct apu_job *job = apu_req->job;
+
+		if (job && hdr->id == job->id) {
+			kref_get(&job->refcount);
+			job->result = hdr->result;
+			if (job->size_out)
+				memcpy(job->data_out, hdr->data + job->size_in,
+				       min(job->size_out, hdr->size_out));
+			job->size_out = hdr->size_out;
+			list_add(&job->node, &complete_node);
+			list_del(&apu_req->node);
+			ida_simple_remove(&req_ida, hdr->id);
+			kfree(apu_req);
+			drm_send_event(job->apu_drm->drm,
+				       &job->event->pending_event);
+			dma_fence_signal(job->done_fence);
+		}
+	}
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+
+	return 0;
+}
+
+void apu_sched_fini(struct apu_core *core)
+{
+	drm_sched_fini(&core->sched->apu_queue.sched);
+	devm_kfree(core->dev, core->sched);
+	core->flags &= ~APU_ONLINE;
+	core->sched = NULL;
+}
+
+static void apu_job_cleanup(struct kref *ref)
+{
+	struct apu_job *job = container_of(ref, struct apu_job,
+					   refcount);
+	unsigned int i;
+
+	if (job->implicit_fences) {
+		for (i = 0; i < job->bo_count; i++)
+			dma_fence_put(job->implicit_fences[i]);
+		kvfree(job->implicit_fences);
+	}
+	dma_fence_put(job->done_fence);
+	dma_fence_put(job->render_done_fence);
+
+	if (job->bos) {
+		for (i = 0; i < job->bo_count; i++) {
+			struct apu_gem_object *apu_obj;
+
+			apu_obj = to_apu_bo(job->bos[i]);
+			apu_bo_iommu_unmap(job->apu_drm, apu_obj);
+			drm_gem_object_put(job->bos[i]);
+		}
+
+		kvfree(job->bos);
+	}
+
+	kfree(job->data_out);
+	kfree(job->data_in);
+	kfree(job);
+}
+
+void apu_job_put(struct apu_job *job)
+{
+	kref_put(&job->refcount, apu_job_cleanup);
+}
+
+static void apu_acquire_object_fences(struct drm_gem_object **bos,
+				      int bo_count,
+				      struct dma_fence **implicit_fences)
+{
+	int i;
+
+	for (i = 0; i < bo_count; i++)
+		implicit_fences[i] = dma_resv_get_excl_unlocked(bos[i]->resv);
+}
+
+static void apu_attach_object_fences(struct drm_gem_object **bos,
+				     int bo_count, struct dma_fence *fence)
+{
+	int i;
+
+	for (i = 0; i < bo_count; i++)
+		dma_resv_add_excl_fence(bos[i]->resv, fence);
+}
+
+int apu_job_push(struct apu_job *job)
+{
+	struct drm_sched_entity *entity = &job->apu_core->sched->sched_entity;
+	struct ww_acquire_ctx acquire_ctx;
+	int ret = 0;
+
+	ret = drm_gem_lock_reservations(job->bos, job->bo_count, &acquire_ctx);
+	if (ret)
+		return ret;
+
+	ret = drm_sched_job_init(&job->base, entity, NULL);
+	if (ret)
+		goto unlock;
+
+	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+	kref_get(&job->refcount);	/* put by scheduler job completion */
+
+	apu_acquire_object_fences(job->bos, job->bo_count,
+				  job->implicit_fences);
+
+	drm_sched_entity_push_job(&job->base, entity);
+
+	apu_attach_object_fences(job->bos, job->bo_count,
+				 job->render_done_fence);
+
+unlock:
+	drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
+
+	return ret;
+}
+
+static const char *apu_fence_get_driver_name(struct dma_fence *fence)
+{
+	return "apu";
+}
+
+static const char *apu_fence_get_timeline_name(struct dma_fence *fence)
+{
+	return "apu-0";
+}
+
+static void apu_fence_release(struct dma_fence *f)
+{
+	kfree(f);
+}
+
+static const struct dma_fence_ops apu_fence_ops = {
+	.get_driver_name = apu_fence_get_driver_name,
+	.get_timeline_name = apu_fence_get_timeline_name,
+	.release = apu_fence_release,
+};
+
+static struct dma_fence *apu_fence_create(struct apu_sched *sched)
+{
+	struct dma_fence *fence;
+	struct apu_queue_state *apu_queue = &sched->apu_queue;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return ERR_PTR(-ENOMEM);
+
+	dma_fence_init(fence, &apu_fence_ops, &sched->job_lock,
+		       apu_queue->fence_context, apu_queue->seqno++);
+
+	return fence;
+}
+
+static struct apu_job *to_apu_job(struct drm_sched_job *sched_job)
+{
+	return container_of(sched_job, struct apu_job, base);
+}
+
+static struct dma_fence *apu_job_dependency(struct drm_sched_job *sched_job,
+					    struct drm_sched_entity *s_entity)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+	struct dma_fence *fence;
+	unsigned int i;
+
+	/* Implicit fences, max. one per BO */
+	for (i = 0; i < job->bo_count; i++) {
+		if (job->implicit_fences[i]) {
+			fence = job->implicit_fences[i];
+			job->implicit_fences[i] = NULL;
+			return fence;
+		}
+	}
+
+	return NULL;
+}
+
+static int apu_job_hw_submit(struct apu_job *job)
+{
+	int ret;
+	struct apu_core *apu_core = job->apu_core;
+	struct apu_dev_request *dev_req;
+	struct apu_request *apu_req;
+	unsigned long flags;
+
+	int size = sizeof(*dev_req) + sizeof(u32) * job->bo_count * 2;
+	u32 *dev_req_da;
+	u32 *dev_req_buffer_size;
+	int i;
+
+	dev_req = kmalloc(size + job->size_in + job->size_out, GFP_KERNEL);
+	if (!dev_req)
+		return -ENOMEM;
+
+	dev_req->cmd = job->cmd;
+	dev_req->size_in = job->size_in;
+	dev_req->size_out = job->size_out;
+	dev_req->count = job->bo_count;
+	dev_req_da =
+	    (u32 *) (dev_req->data + dev_req->size_in + dev_req->size_out);
+	dev_req_buffer_size = (u32 *) (dev_req_da + dev_req->count);
+	memcpy(dev_req->data, job->data_in, job->size_in);
+
+	apu_req = kzalloc(sizeof(*apu_req), GFP_KERNEL);
+	if (!apu_req) {
+		kfree(dev_req);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < job->bo_count; i++) {
+		struct apu_gem_object *obj = to_apu_bo(job->bos[i]);
+
+		dev_req_da[i] = obj->iova + obj->offset;
+		dev_req_buffer_size[i] = obj->size;
+	}
+
+	ret = ida_simple_get(&req_ida, 0, 0xffff, GFP_KERNEL);
+	if (ret < 0)
+		goto err_free_memory;
+
+	dev_req->id = ret;
+
+	job->id = dev_req->id;
+	apu_req->job = job;
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_add(&apu_req->node, &apu_core->requests);
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+	ret = apu_core->ops->send(apu_core, dev_req,
+				  size + dev_req->size_in + dev_req->size_out);
+	if (ret < 0)
+		goto err;
+	kfree(dev_req);
+
+	return 0;
+
+err:
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_del(&apu_req->node);
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+	ida_simple_remove(&req_ida, dev_req->id);
+err_free_memory:
+	kfree(apu_req);
+	kfree(dev_req);
+
+	return ret;
+}
+
+static struct dma_fence *apu_job_run(struct drm_sched_job *sched_job)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+	struct dma_fence *fence = NULL;
+
+	if (unlikely(job->base.s_fence->finished.error))
+		return NULL;
+
+	fence = apu_fence_create(job->apu_core->sched);
+	if (IS_ERR(fence))
+		return NULL;
+
+	job->done_fence = dma_fence_get(fence);
+
+	apu_job_hw_submit(job);
+
+	return fence;
+}
+
+static void apu_update_rproc_state(struct apu_core *core)
+{
+	if (core->rproc) {
+		if (core->rproc->state == RPROC_CRASHED)
+			core->flags |= APU_CRASHED;
+		if (core->rproc->state == RPROC_OFFLINE)
+			core->flags &= ~APU_ONLINE;
+	}
+}
+
+static enum drm_gpu_sched_stat apu_job_timedout(struct drm_sched_job *sched_job)
+{
+	struct apu_request *apu_req, *tmp;
+	struct apu_job *job = to_apu_job(sched_job);
+
+	if (dma_fence_is_signaled(job->done_fence))
+		return DRM_GPU_SCHED_STAT_NOMINAL;
+
+	list_for_each_entry_safe(apu_req, tmp, &job->apu_core->requests, node) {
+		/* Remove the request and notify user about the timeout */
+		if (apu_req->job == job) {
+			kref_get(&job->refcount);
+			job->apu_core->flags |= APU_TIMEDOUT;
+			apu_update_rproc_state(job->apu_core);
+			job->result = ETIMEDOUT;
+			list_add(&job->node, &complete_node);
+			list_del(&apu_req->node);
+			ida_simple_remove(&req_ida, job->id);
+			kfree(apu_req);
+			drm_send_event(job->apu_drm->drm,
+				       &job->event->pending_event);
+			dma_fence_signal(job->done_fence);
+		}
+	}
+
+	return DRM_GPU_SCHED_STAT_NOMINAL;
+}
+
+static void apu_job_free(struct drm_sched_job *sched_job)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+
+	drm_sched_job_cleanup(sched_job);
+
+	apu_job_put(job);
+}
+
+static const struct drm_sched_backend_ops apu_sched_ops = {
+	.dependency = apu_job_dependency,
+	.run_job = apu_job_run,
+	.timedout_job = apu_job_timedout,
+	.free_job = apu_job_free
+};
+
+int apu_drm_job_init(struct apu_core *core)
+{
+	int ret;
+	struct apu_sched *apu_sched;
+	struct drm_gpu_scheduler *sched;
+
+	apu_sched = devm_kzalloc(core->dev, sizeof(*apu_sched), GFP_KERNEL);
+	if (!apu_sched)
+		return -ENOMEM;
+
+	sched = &apu_sched->apu_queue.sched;
+	apu_sched->apu_queue.fence_context = dma_fence_context_alloc(1);
+	ret = drm_sched_init(sched, &apu_sched_ops,
+			     1, 0, msecs_to_jiffies(500),
+			     NULL, NULL, "apu_js");
+	if (ret) {
+		dev_err(core->dev, "Failed to create scheduler: %d.", ret);
+		return ret;
+	}
+
+	ret = drm_sched_entity_init(&apu_sched->sched_entity,
+				    DRM_SCHED_PRIORITY_NORMAL,
+				    &sched, 1, NULL);
+	if (ret) {
+		drm_sched_fini(sched);
+		return ret;
+	}
+
+	core->sched = apu_sched;
+	core->flags = APU_ONLINE;
+
+	return 0;
+}
+
+static struct apu_core *get_apu_core(struct apu_drm *apu_drm, int device_id)
+{
+	struct apu_core *apu_core;
+
+	list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+		if (apu_core->device_id == device_id)
+			return apu_core;
+	}
+
+	return NULL;
+}
+
+static int apu_core_is_running(struct apu_core *core)
+{
+	return core->ops && core->priv && core->sched;
+}
+
+static int
+apu_lookup_bos(struct drm_device *dev,
+	       struct drm_file *file_priv,
+	       struct drm_apu_gem_queue *args, struct apu_job *job)
+{
+	void __user *bo_handles;
+	unsigned int i;
+	int ret;
+
+	job->bo_count = args->bo_handle_count;
+
+	if (!job->bo_count)
+		return 0;
+
+	job->implicit_fences = kvmalloc_array(job->bo_count,
+					      sizeof(struct dma_fence *),
+					      GFP_KERNEL | __GFP_ZERO);
+	if (!job->implicit_fences)
+		return -ENOMEM;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     job->bo_count, &job->bos);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < job->bo_count; i++) {
+		ret = apu_bo_iommu_map(job->apu_drm, job->bos[i]);
+		if (ret) {
+			/* TODO: handle error */
+			break;
+		}
+	}
+
+	return ret;
+}
+
+int ioctl_gem_queue(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_queue *args = data;
+	struct apu_event *event;
+	struct apu_core *core;
+	struct drm_syncobj *sync_out = NULL;
+	struct apu_job *job;
+	int ret = 0;
+
+	core = get_apu_core(apu_drm, args->device);
+	if (!core || !apu_core_is_running(core))
+		return -ENODEV;
+
+	if (args->out_sync > 0) {
+		sync_out = drm_syncobj_find(file_priv, args->out_sync);
+		if (!sync_out)
+			return -ENODEV;
+	}
+
+	job = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job) {
+		ret = -ENOMEM;
+		goto fail_out_sync;
+	}
+
+	kref_init(&job->refcount);
+
+	job->apu_drm = apu_drm;
+	job->apu_core = core;
+	job->cmd = args->cmd;
+	job->size_in = args->size_in;
+	job->size_out = args->size_out;
+	job->sync_out = sync_out;
+	if (job->size_in) {
+		job->data_in = kmalloc(job->size_in, GFP_KERNEL);
+		if (!job->data_in) {
+			ret = -ENOMEM;
+			goto fail_job;
+		}
+
+		if (copy_from_user(job->data_in,
+				   (void __user *)(uintptr_t) args->data,
+				   job->size_in)) {
+			ret = -EFAULT;
+			goto fail_job;
+		}
+	}
+
+	if (job->size_out) {
+		job->data_out = kmalloc(job->size_out, GFP_KERNEL);
+		if (!job->data_out) {
+			ret = -ENOMEM;
+			goto fail_job;
+		}
+	}
+
+	ret = apu_lookup_bos(dev, file_priv, args, job);
+	if (ret)
+		goto fail_job;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event) {
+		ret = -ENOMEM;
+		goto fail_job;
+	}
+	event->base.length = sizeof(struct apu_job_event);
+	event->base.type = APU_JOB_COMPLETED;
+	event->job_event.out_sync = args->out_sync;
+	job->event = event;
+	ret = drm_event_reserve_init(dev, file_priv, &job->event->pending_event,
+				     &job->event->base);
+	if (ret)
+		goto fail_job;
+
+	ret = apu_job_push(job);
+	if (ret) {
+		drm_event_cancel_free(dev, &job->event->pending_event);
+		goto fail_job;
+	}
+
+	/* Update the return sync object for the job */
+	if (sync_out)
+		drm_syncobj_replace_fence(sync_out, job->render_done_fence);
+
+fail_job:
+	apu_job_put(job);
+fail_out_sync:
+	if (sync_out)
+		drm_syncobj_put(sync_out);
+
+	return ret;
+}
+
+int ioctl_gem_dequeue(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv)
+{
+	struct drm_apu_gem_dequeue *args = data;
+	struct drm_syncobj *sync_out = NULL;
+	struct apu_job *job;
+	int ret = 0;
+
+	if (args->out_sync > 0) {
+		sync_out = drm_syncobj_find(file_priv, args->out_sync);
+		if (!sync_out)
+			return -ENODEV;
+	}
+
+	list_for_each_entry(job, &complete_node, node) {
+		if (job->sync_out == sync_out) {
+			if (job->data_out) {
+				if (copy_to_user((void __user *)(uintptr_t)
+						 args->data, job->data_out,
+						 job->size_out))
+					ret = -EFAULT;
+				args->size = job->size_out;
+			}
+			args->result = job->result;
+			list_del(&job->node);
+			apu_job_put(job);
+			drm_syncobj_put(sync_out);
+
+			return ret;
+		}
+	}
+
+	if (sync_out)
+		drm_syncobj_put(sync_out);
+
+	return 0;
+}
+
+int ioctl_apu_state(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_state *args = data;
+	struct apu_core *core;
+
+	args->flags = 0;
+
+	core = get_apu_core(apu_drm, args->device);
+	if (!core)
+		return -ENODEV;
+	args->flags |= core->flags;
+
+	/* Reset APU flags */
+	core->flags &= ~(APU_TIMEDOUT | APU_CRASHED);
+
+	return 0;
+}
diff --git a/include/drm/apu_drm.h b/include/drm/apu_drm.h
new file mode 100644
index 0000000000000..f044ed0427fdd
--- /dev/null
+++ b/include/drm/apu_drm.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __APU_DRM_H__
+#define __APU_DRM_H__
+
+#include <linux/iova.h>
+#include <linux/remoteproc.h>
+
+struct apu_core;
+struct apu_drm;
+
+struct apu_drm_ops {
+	int (*send)(struct apu_core *apu_core, void *data, int len);
+	int (*callback)(struct apu_core *apu_core, void *data, int len);
+};
+
+#ifdef CONFIG_DRM_APU
+
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv);
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size);
+int apu_drm_unregister_core(void *priv);
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len);
+void *apu_drm_priv(struct apu_core *apu_core);
+
+#else /* CONFIG_DRM_APU */
+
+static inline
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv)
+{
+	return NULL;
+}
+
+static inline
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
+{
+	return -ENOMEM;
+}
+
+static inline
+int apu_drm_unregister_core(void *priv)
+{
+	return -ENODEV;
+}
+
+static inline
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
+{
+	return -ENODEV;
+}
+
+static inline void *apu_drm_priv(struct apu_core *apu_core)
+{
+	return NULL;
+}
+#endif /* CONFIG_DRM_APU */
+
+
+#endif /* __APU_DRM_H__ */
diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h
new file mode 100644
index 0000000000000..c52e187bb0599
--- /dev/null
+++ b/include/uapi/drm/apu_drm.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef __UAPI_APU_DRM_H__
+#define __UAPI_APU_DRM_H__
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define APU_JOB_COMPLETED 0x80000000
+
+/*
+ * Please note that modifications to all structs defined here are
+ * subject to backwards-compatibility constraints.
+ */
+
+/*
+ * Firmware request, must be aligned with the one defined in the firmware.
+ * @id: Request id, used for a reply to find the pending request
+ * @cmd: The command id to execute in the firmware
+ * @result: The result of the command executed on the firmware
+ * @size_in: The size of the input data attached to this request
+ * @size_out: The size of the output data attached to this request
+ * @count: The number of shared buffers
+ * @data: Contains the input data if size_in is greater than zero,
+ *        followed by the output data, and the addresses and sizes of the
+ *        shared buffers if count is greater than zero. Both the data and
+ *        the shared buffers may be read and written by the APU.
+ */
+struct apu_dev_request {
+	__u16 id;
+	__u16 cmd;
+	__u16 result;
+	__u16 size_in;
+	__u16 size_out;
+	__u16 count;
+	__u8 data[];
+} __attribute__((packed));
+
+struct drm_apu_gem_new {
+	__u32 size;			/* in */
+	__u32 flags;			/* in */
+	__u32 handle;			/* out */
+	__u64 offset;			/* out */
+};
+
+struct drm_apu_gem_queue {
+	__u32 device;
+	__u32 cmd;
+	__u32 out_sync;
+	__u64 bo_handles;
+	__u32 bo_handle_count;
+	__u16 size_in;
+	__u16 size_out;
+	__u64 data;
+};
+
+struct drm_apu_gem_dequeue {
+	__u32 out_sync;
+	__u16 result;
+	__u16 size;
+	__u64 data;
+};
+
+struct drm_apu_gem_iommu_map {
+	__u64 bo_handles;
+	__u32 bo_handle_count;
+	__u64 bo_device_addresses;
+};
+
+struct apu_job_event {
+	struct drm_event base;
+	__u32 out_sync;
+};
+
+#define APU_ONLINE		(1 << 0)
+#define APU_CRASHED		(1 << 1)
+#define APU_TIMEDOUT		(1 << 2)
+
+struct drm_apu_state {
+	__u32 device;
+	__u32 flags;
+};
+
+#define DRM_APU_GEM_NEW			0x00
+#define DRM_APU_GEM_QUEUE		0x01
+#define DRM_APU_GEM_DEQUEUE		0x02
+#define DRM_APU_GEM_IOMMU_MAP		0x03
+#define DRM_APU_GEM_IOMMU_UNMAP		0x04
+#define DRM_APU_STATE			0x05
+#define DRM_APU_NUM_IOCTLS		0x06
+
+#define DRM_IOCTL_APU_GEM_NEW		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_NEW, struct drm_apu_gem_new)
+#define DRM_IOCTL_APU_GEM_USER_NEW	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_USER_NEW, struct drm_apu_gem_user_new)
+#define DRM_IOCTL_APU_GEM_QUEUE		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_QUEUE, struct drm_apu_gem_queue)
+#define DRM_IOCTL_APU_GEM_DEQUEUE	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_DEQUEUE, struct drm_apu_gem_dequeue)
+#define DRM_IOCTL_APU_GEM_IOMMU_MAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_MAP, struct drm_apu_gem_iommu_map)
+#define DRM_IOCTL_APU_GEM_IOMMU_UNMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_UNMAP, struct drm_apu_gem_iommu_map)
+#define DRM_IOCTL_APU_STATE		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_STATE, struct drm_apu_state)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* __UAPI_APU_DRM_H__ */
-- 
2.31.1



* [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
@ 2021-09-17 12:59   ` Alexandre Bailon
  0 siblings, 0 replies; 34+ messages in thread
From: Alexandre Bailon @ 2021-09-17 12:59 UTC (permalink / raw)
  To: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, bjorn.andersson, mathieu.poirier,
	sumit.semwal
  Cc: christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain, Alexandre Bailon

Some MediaTek SoCs provide a hardware accelerator for AI / ML.
This driver provides the infrastructure to manage memory
shared between the host CPU and the accelerator, and to submit
jobs to the accelerator.
The APU itself is managed by remoteproc, so this driver
relies on remoteproc to find the APU and get some important data
from it. But the driver is quite generic and it should be possible
to manage an accelerator in other ways.
This driver doesn't handle the data transmissions itself.
It must be registered by another driver implementing the transmissions.

Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
---
 drivers/gpu/drm/Kconfig            |   2 +
 drivers/gpu/drm/Makefile           |   1 +
 drivers/gpu/drm/apu/Kconfig        |  10 +
 drivers/gpu/drm/apu/Makefile       |   7 +
 drivers/gpu/drm/apu/apu_drm_drv.c  | 238 +++++++++++
 drivers/gpu/drm/apu/apu_gem.c      | 232 +++++++++++
 drivers/gpu/drm/apu/apu_internal.h |  89 ++++
 drivers/gpu/drm/apu/apu_sched.c    | 634 +++++++++++++++++++++++++++++
 include/drm/apu_drm.h              |  59 +++
 include/uapi/drm/apu_drm.h         | 106 +++++
 10 files changed, 1378 insertions(+)
 create mode 100644 drivers/gpu/drm/apu/Kconfig
 create mode 100644 drivers/gpu/drm/apu/Makefile
 create mode 100644 drivers/gpu/drm/apu/apu_drm_drv.c
 create mode 100644 drivers/gpu/drm/apu/apu_gem.c
 create mode 100644 drivers/gpu/drm/apu/apu_internal.h
 create mode 100644 drivers/gpu/drm/apu/apu_sched.c
 create mode 100644 include/drm/apu_drm.h
 create mode 100644 include/uapi/drm/apu_drm.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 8fc40317f2b77..bcdca35c9eda5 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -382,6 +382,8 @@ source "drivers/gpu/drm/xlnx/Kconfig"
 
 source "drivers/gpu/drm/gud/Kconfig"
 
+source "drivers/gpu/drm/apu/Kconfig"
+
 config DRM_HYPERV
 	tristate "DRM Support for Hyper-V synthetic video device"
 	depends on DRM && PCI && MMU && HYPERV
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index ad11121548983..f3d8432976558 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -127,4 +127,5 @@ obj-$(CONFIG_DRM_MCDE) += mcde/
 obj-$(CONFIG_DRM_TIDSS) += tidss/
 obj-y			+= xlnx/
 obj-y			+= gud/
+obj-$(CONFIG_DRM_APU) += apu/
 obj-$(CONFIG_DRM_HYPERV) += hyperv/
diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig
new file mode 100644
index 0000000000000..c8471309a0351
--- /dev/null
+++ b/drivers/gpu/drm/apu/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+
+config DRM_APU
+	tristate "APU (AI Processor Unit)"
+	select REMOTEPROC
+	select DRM_SCHED
+	help
+	  This provides a DRM driver that provides some facilities to
+	  communicate with an accelerated processing unit (APU).
diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile
new file mode 100644
index 0000000000000..3e97846b091c9
--- /dev/null
+++ b/drivers/gpu/drm/apu/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+apu_drm-y += apu_drm_drv.o
+apu_drm-y += apu_sched.o
+apu_drm-y += apu_gem.o
+
+obj-$(CONFIG_DRM_APU) += apu_drm.o
diff --git a/drivers/gpu/drm/apu/apu_drm_drv.c b/drivers/gpu/drm/apu/apu_drm_drv.c
new file mode 100644
index 0000000000000..91d8c99e373c0
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_drm_drv.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <linux/dma-map-ops.h>
+#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/remoteproc.h>
+
+#include <drm/apu_drm.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_probe_helper.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+static LIST_HEAD(apu_devices);
+
+static const struct drm_ioctl_desc ioctls[] = {
+	DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_QUEUE, ioctl_gem_queue,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state,
+			  DRM_RENDER_ALLOW),
+};
+
+DEFINE_DRM_GEM_CMA_FOPS(apu_drm_ops);
+
+static struct drm_driver apu_drm_driver = {
+	.driver_features = DRIVER_GEM | DRIVER_SYNCOBJ,
+	.name = "drm_apu",
+	.desc = "APU DRM driver",
+	.date = "20210319",
+	.major = 1,
+	.minor = 0,
+	.patchlevel = 0,
+	.ioctls = ioctls,
+	.num_ioctls = ARRAY_SIZE(ioctls),
+	.fops = &apu_drm_ops,
+	DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_cma_dumb_create),
+};
+
+void *apu_drm_priv(struct apu_core *apu_core)
+{
+	return apu_core->priv;
+}
+EXPORT_SYMBOL_GPL(apu_drm_priv);
+
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
+{
+	struct apu_drm *apu_drm = apu_core->apu_drm;
+	struct iova *iova;
+
+	iova = reserve_iova(&apu_drm->iovad, PHYS_PFN(start),
+			    PHYS_PFN(start + size));
+	if (!iova)
+		return -ENOMEM;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(apu_drm_reserve_iova);
+
+static int apu_drm_init_first_core(struct apu_drm *apu_drm,
+				   struct apu_core *apu_core)
+{
+	struct drm_device *drm;
+	struct device *parent;
+	u64 mask;
+
+	drm = apu_drm->drm;
+	parent = apu_core->rproc->dev.parent;
+	drm->dev->iommu_group = parent->iommu_group;
+	apu_drm->domain = iommu_get_domain_for_dev(parent);
+	set_dma_ops(drm->dev, get_dma_ops(parent));
+	mask = dma_get_mask(parent);
+	return dma_coerce_mask_and_coherent(drm->dev, mask);
+}
+
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv)
+{
+	struct apu_drm *apu_drm;
+	struct apu_core *apu_core;
+	int ret;
+
+	list_for_each_entry(apu_drm, &apu_devices, node) {
+		list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+			if (apu_core->rproc == rproc) {
+				ret = apu_drm_init_first_core(apu_drm,
+							      apu_core);
+				if (ret)
+					return NULL;
+
+				apu_core->dev = &rproc->dev;
+				apu_core->priv = priv;
+				apu_core->ops = ops;
+
+				ret = apu_drm_job_init(apu_core);
+				if (ret)
+					return NULL;
+
+				return apu_core;
+			}
+			}
+		}
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(apu_drm_register_core);
+
+int apu_drm_unregister_core(void *priv)
+{
+	struct apu_drm *apu_drm;
+	struct apu_core *apu_core;
+
+	list_for_each_entry(apu_drm, &apu_devices, node) {
+		list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+			if (apu_core->priv == priv) {
+				apu_sched_fini(apu_core);
+				apu_core->priv = NULL;
+				apu_core->ops = NULL;
+			}
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(apu_drm_unregister_core);
+
+#ifdef CONFIG_OF
+static const struct of_device_id apu_platform_of_match[] = {
+	{ .compatible = "mediatek,apu-drm", },
+	{ },
+};
+
+MODULE_DEVICE_TABLE(of, apu_platform_of_match);
+#endif
+
+static int apu_platform_probe(struct platform_device *pdev)
+{
+	struct drm_device *drm;
+	struct apu_drm *apu_drm;
+	struct of_phandle_iterator it;
+	int index = 0;
+	u64 iova[2];
+	int ret;
+
+	apu_drm = devm_kzalloc(&pdev->dev, sizeof(*apu_drm), GFP_KERNEL);
+	if (!apu_drm)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&apu_drm->apu_cores);
+
+	of_phandle_iterator_init(&it, pdev->dev.of_node, "remoteproc", NULL, 0);
+	while (of_phandle_iterator_next(&it) == 0) {
+		struct rproc *rproc = rproc_get_by_phandle(it.phandle);
+		struct apu_core *apu_core;
+
+		if (!rproc)
+			return -EPROBE_DEFER;
+
+		apu_core = devm_kzalloc(&pdev->dev, sizeof(*apu_core),
+					GFP_KERNEL);
+		if (!apu_core)
+			return -ENOMEM;
+
+		apu_core->rproc = rproc;
+		apu_core->device_id = index++;
+		apu_core->apu_drm = apu_drm;
+		spin_lock_init(&apu_core->ctx_lock);
+		INIT_LIST_HEAD(&apu_core->requests);
+		list_add(&apu_core->node, &apu_drm->apu_cores);
+	}
+
+	if (of_property_read_variable_u64_array(pdev->dev.of_node, "iova",
+						iova, ARRAY_SIZE(iova),
+						ARRAY_SIZE(iova)) !=
+	    ARRAY_SIZE(iova))
+		return -EINVAL;
+
+	init_iova_domain(&apu_drm->iovad, PAGE_SIZE, PHYS_PFN(iova[0]));
+	apu_drm->iova_limit_pfn = PHYS_PFN(iova[0] + iova[1]) - 1;
+
+	drm = drm_dev_alloc(&apu_drm_driver, &pdev->dev);
+	if (IS_ERR(drm))
+		return PTR_ERR(drm);
+
+	/* Set up private data before userspace can open the device */
+	drm->dev_private = apu_drm;
+	apu_drm->drm = drm;
+	apu_drm->dev = &pdev->dev;
+
+	ret = drm_dev_register(drm, 0);
+	if (ret) {
+		drm_dev_put(drm);
+		return ret;
+	}
+
+	platform_set_drvdata(pdev, drm);
+
+	list_add(&apu_drm->node, &apu_devices);
+
+	return 0;
+}
+
+static int apu_platform_remove(struct platform_device *pdev)
+{
+	struct drm_device *drm;
+
+	drm = platform_get_drvdata(pdev);
+
+	drm_dev_unregister(drm);
+	drm_dev_put(drm);
+
+	return 0;
+}
+
+static struct platform_driver apu_platform_driver = {
+	.probe = apu_platform_probe,
+	.remove = apu_platform_remove,
+	.driver = {
+		   .name = "apu_drm",
+		   .of_match_table = of_match_ptr(apu_platform_of_match),
+	},
+};
+
+module_platform_driver(apu_platform_driver);
diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
new file mode 100644
index 0000000000000..c867143dab436
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_gem.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <asm/cacheflush.h>
+
+#include <linux/dma-buf.h>
+#include <linux/dma-mapping.h>
+#include <linux/highmem.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+#include <linux/swap.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
+					     size_t size)
+{
+	struct drm_gem_cma_object *cma_obj;
+
+	cma_obj = drm_gem_cma_create(dev, size);
+	if (IS_ERR(cma_obj))
+		return NULL;
+
+	return &cma_obj->base;
+}
+
+int ioctl_gem_new(struct drm_device *dev, void *data,
+		  struct drm_file *file_priv)
+{
+	struct drm_apu_gem_new *args = data;
+	struct drm_gem_cma_object *cma_obj;
+	struct apu_gem_object *apu_obj;
+	struct drm_gem_object *gem_obj;
+	int ret;
+
+	cma_obj = drm_gem_cma_create(dev, args->size);
+	if (IS_ERR(cma_obj))
+		return PTR_ERR(cma_obj);
+
+	gem_obj = &cma_obj->base;
+	apu_obj = to_apu_bo(gem_obj);
+
+	/*
+	 * Save the size of buffer expected by application instead of the
+	 * aligned one.
+	 */
+	apu_obj->size = args->size;
+	apu_obj->offset = 0;
+	apu_obj->iommu_refcount = 0;
+	mutex_init(&apu_obj->mutex);
+
+	ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);
+	/* Drop the reference from creation; the handle now holds one */
+	drm_gem_object_put(gem_obj);
+	if (ret)
+		return ret;
+
+	args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
+
+	return 0;
+}
+
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
+{
+	int iova_pfn;
+	int i;
+
+	if (!obj->iommu_sgt)
+		return;
+
+	mutex_lock(&obj->mutex);
+	obj->iommu_refcount--;
+	if (obj->iommu_refcount) {
+		mutex_unlock(&obj->mutex);
+		return;
+	}
+
+	iova_pfn = PHYS_PFN(obj->iova);
+	for (i = 0; i < obj->iommu_sgt->nents; i++) {
+		iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
+			    PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
+		iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
+	}
+
+	sg_free_table(obj->iommu_sgt);
+	kfree(obj->iommu_sgt);
+	obj->iommu_sgt = NULL;
+
+	free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
+	mutex_unlock(&obj->mutex);
+}
+
+static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
+{
+	if (obj->funcs && obj->funcs->get_sg_table)
+		return obj->funcs->get_sg_table(obj);
+	return ERR_PTR(-EINVAL);
+}
+
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
+{
+	struct apu_gem_object *apu_obj = to_apu_bo(obj);
+	struct scatterlist *sgl;
+	phys_addr_t phys;
+	int total_buf_space;
+	int iova_pfn;
+	int iova;
+	int ret;
+	int i;
+
+	mutex_lock(&apu_obj->mutex);
+	apu_obj->iommu_refcount++;
+	if (apu_obj->iommu_refcount != 1) {
+		mutex_unlock(&apu_obj->mutex);
+		return 0;
+	}
+
+	apu_obj->iommu_sgt = apu_get_sg_table(obj);
+	if (IS_ERR_OR_NULL(apu_obj->iommu_sgt)) {
+		ret = apu_obj->iommu_sgt ?
+		    PTR_ERR(apu_obj->iommu_sgt) : -EINVAL;
+		apu_obj->iommu_sgt = NULL;
+		apu_obj->iommu_refcount--;
+		mutex_unlock(&apu_obj->mutex);
+		return ret;
+	}
+
+	total_buf_space = obj->size;
+	iova_pfn = alloc_iova_fast(&apu_drm->iovad,
+				   total_buf_space >> PAGE_SHIFT,
+				   apu_drm->iova_limit_pfn, true);
+	if (!iova_pfn) {
+		dev_err(apu_drm->dev, "Failed to allocate iova address\n");
+		mutex_unlock(&apu_obj->mutex);
+		return -ENOMEM;
+	}
+	apu_obj->iova = PFN_PHYS(iova_pfn);
+
+	iova = apu_obj->iova;
+	sgl = apu_obj->iommu_sgt->sgl;
+	for (i = 0; i < apu_obj->iommu_sgt->nents; i++) {
+		phys = page_to_phys(sg_page(&sgl[i]));
+		ret =
+		    iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys,
+			      PAGE_ALIGN(sgl[i].length),
+			      IOMMU_READ | IOMMU_WRITE);
+		if (ret) {
+			dev_err(apu_drm->dev, "Failed to iommu map\n");
+			free_iova(&apu_drm->iovad, PHYS_PFN(apu_obj->iova));
+			mutex_unlock(&apu_obj->mutex);
+			return ret;
+		}
+		iova += sgl[i].offset + sgl[i].length;
+		iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length));
+	}
+	mutex_unlock(&apu_obj->mutex);
+
+	return 0;
+}
+
+int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
+			struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_iommu_map *args = data;
+	struct drm_gem_object **bos;
+	void __user *bo_handles;
+	int ret;
+	int i;
+
+	u64 *das = kvmalloc_array(args->bo_handle_count,
+				  sizeof(u64), GFP_KERNEL);
+	if (!das)
+		return -ENOMEM;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     args->bo_handle_count, &bos);
+	if (ret) {
+		kvfree(das);
+		return ret;
+	}
+
+	for (i = 0; i < args->bo_handle_count; i++) {
+		ret = apu_bo_iommu_map(apu_drm, bos[i]);
+		if (ret) {
+			/* TODO: handle error */
+			break;
+		}
+		das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset;
+	}
+
+	if (copy_to_user((void __user *)(uintptr_t)args->bo_device_addresses,
+			 das, args->bo_handle_count * sizeof(u64))) {
+		ret = -EFAULT;
+		DRM_DEBUG("Failed to copy device addresses\n");
+	}
+
+	kvfree(das);
+	kvfree(bos);
+
+	return ret;
+}
+
+int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_iommu_map *args = data;
+	struct drm_gem_object **bos;
+	void __user *bo_handles;
+	int ret;
+	int i;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     args->bo_handle_count, &bos);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < args->bo_handle_count; i++)
+		apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i]));
+
+	kvfree(bos);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h
new file mode 100644
index 0000000000000..b789b2f3ad9c6
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_internal.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __APU_INTERNAL_H__
+#define __APU_INTERNAL_H__
+
+#include <linux/iova.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/gpu_scheduler.h>
+
+struct apu_gem_object {
+	struct drm_gem_cma_object base;
+	struct mutex mutex;
+	struct sg_table *iommu_sgt;
+	int iommu_refcount;
+	size_t size;
+	u32 iova;
+	u32 offset;
+};
+
+struct apu_sched;
+struct apu_core {
+	int device_id;
+	struct device *dev;
+	struct rproc *rproc;
+	struct apu_drm_ops *ops;
+	struct apu_drm *apu_drm;
+
+	spinlock_t ctx_lock;
+	struct list_head requests;
+
+	struct list_head node;
+	void *priv;
+
+	struct apu_sched *sched;
+	u32 flags;
+};
+
+struct apu_drm {
+	struct device *dev;
+	struct drm_device *drm;
+
+	struct iommu_domain *domain;
+	struct iova_domain iovad;
+	int iova_limit_pfn;
+
+	struct list_head apu_cores;
+	struct list_head node;
+};
+
+static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj)
+{
+	return container_of(to_drm_gem_cma_obj(obj), struct apu_gem_object,
+			    base);
+}
+
+struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
+					     size_t size);
+
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj);
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj);
+int ioctl_gem_new(struct drm_device *dev, void *data,
+		  struct drm_file *file_priv);
+int ioctl_gem_user_new(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv);
+int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
+			struct drm_file *file_priv);
+int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
+int ioctl_gem_queue(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv);
+int ioctl_gem_dequeue(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv);
+int ioctl_apu_state(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv);
+struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem,
+				     int flags);
+
+struct apu_job;
+
+int apu_drm_job_init(struct apu_core *core);
+void apu_sched_fini(struct apu_core *core);
+int apu_job_push(struct apu_job *job);
+void apu_job_put(struct apu_job *job);
+
+#endif /* __APU_INTERNAL_H__ */
diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c
new file mode 100644
index 0000000000000..cebb0155c7783
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_sched.c
@@ -0,0 +1,634 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <drm/apu_drm.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_syncobj.h>
+#include <drm/gpu_scheduler.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+struct apu_queue_state {
+	struct drm_gpu_scheduler sched;
+
+	u64 fence_context;
+	u64 seqno;
+};
+
+struct apu_request {
+	struct list_head node;
+	void *job;
+};
+
+struct apu_sched {
+	struct apu_queue_state apu_queue;
+	spinlock_t job_lock;
+	struct drm_sched_entity sched_entity;
+};
+
+struct apu_event {
+	struct drm_pending_event pending_event;
+	union {
+		struct drm_event base;
+		struct apu_job_event job_event;
+	};
+};
+
+struct apu_job {
+	struct drm_sched_job base;
+
+	struct kref refcount;
+
+	struct apu_core *apu_core;
+	struct apu_drm *apu_drm;
+
+	/* Fence to be signaled by IRQ handler when the job is complete. */
+	struct dma_fence *done_fence;
+
+	__u32 cmd;
+
+	/* Exclusive fences we have taken from the BOs to wait for */
+	struct dma_fence **implicit_fences;
+	struct drm_gem_object **bos;
+	u32 bo_count;
+
+	/* Fence to be signaled by drm-sched once its done with the job */
+	struct dma_fence *render_done_fence;
+
+	void *data_in;
+	uint16_t size_in;
+	void *data_out;
+	uint16_t size_out;
+	uint16_t result;
+	uint16_t id;
+
+	struct list_head node;
+	struct drm_syncobj *sync_out;
+
+	struct apu_event *event;
+};
+
+static DEFINE_IDA(req_ida);
+static LIST_HEAD(complete_node);
+
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
+{
+	struct apu_request *apu_req, *tmp;
+	struct apu_dev_request *hdr = data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_for_each_entry_safe(apu_req, tmp, &apu_core->requests, node) {
+		struct apu_job *job = apu_req->job;
+
+		if (job && hdr->id == job->id) {
+			kref_get(&job->refcount);
+			job->result = hdr->result;
+			if (job->size_out)
+				memcpy(job->data_out, hdr->data + job->size_in,
+				       min(job->size_out, hdr->size_out));
+			job->size_out = hdr->size_out;
+			list_add(&job->node, &complete_node);
+			list_del(&apu_req->node);
+			ida_simple_remove(&req_ida, hdr->id);
+			kfree(apu_req);
+			drm_send_event(job->apu_drm->drm,
+				       &job->event->pending_event);
+			dma_fence_signal_locked(job->done_fence);
+		}
+	}
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+
+	return 0;
+}
+
+void apu_sched_fini(struct apu_core *core)
+{
+	drm_sched_fini(&core->sched->apu_queue.sched);
+	devm_kfree(core->dev, core->sched);
+	core->flags &= ~APU_ONLINE;
+	core->sched = NULL;
+}
+
+static void apu_job_cleanup(struct kref *ref)
+{
+	struct apu_job *job = container_of(ref, struct apu_job,
+					   refcount);
+	unsigned int i;
+
+	if (job->implicit_fences) {
+		for (i = 0; i < job->bo_count; i++)
+			dma_fence_put(job->implicit_fences[i]);
+		kvfree(job->implicit_fences);
+	}
+	dma_fence_put(job->done_fence);
+	dma_fence_put(job->render_done_fence);
+
+	if (job->bos) {
+		for (i = 0; i < job->bo_count; i++) {
+			struct apu_gem_object *apu_obj;
+
+			apu_obj = to_apu_bo(job->bos[i]);
+			apu_bo_iommu_unmap(job->apu_drm, apu_obj);
+			drm_gem_object_put(job->bos[i]);
+		}
+
+		kvfree(job->bos);
+	}
+
+	kfree(job->data_out);
+	kfree(job->data_in);
+	kfree(job);
+}
+
+void apu_job_put(struct apu_job *job)
+{
+	kref_put(&job->refcount, apu_job_cleanup);
+}
+
+static void apu_acquire_object_fences(struct drm_gem_object **bos,
+				      int bo_count,
+				      struct dma_fence **implicit_fences)
+{
+	int i;
+
+	for (i = 0; i < bo_count; i++)
+		implicit_fences[i] = dma_resv_get_excl_unlocked(bos[i]->resv);
+}
+
+static void apu_attach_object_fences(struct drm_gem_object **bos,
+				     int bo_count, struct dma_fence *fence)
+{
+	int i;
+
+	for (i = 0; i < bo_count; i++)
+		dma_resv_add_excl_fence(bos[i]->resv, fence);
+}
+
+int apu_job_push(struct apu_job *job)
+{
+	struct drm_sched_entity *entity = &job->apu_core->sched->sched_entity;
+	struct ww_acquire_ctx acquire_ctx;
+	int ret = 0;
+
+	ret = drm_gem_lock_reservations(job->bos, job->bo_count, &acquire_ctx);
+	if (ret)
+		return ret;
+
+	ret = drm_sched_job_init(&job->base, entity, NULL);
+	if (ret)
+		goto unlock;
+
+	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+	kref_get(&job->refcount);	/* put by scheduler job completion */
+
+	apu_acquire_object_fences(job->bos, job->bo_count,
+				  job->implicit_fences);
+
+	drm_sched_entity_push_job(&job->base, entity);
+
+	apu_attach_object_fences(job->bos, job->bo_count,
+				 job->render_done_fence);
+
+unlock:
+	drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
+
+	return ret;
+}
+
+static const char *apu_fence_get_driver_name(struct dma_fence *fence)
+{
+	return "apu";
+}
+
+static const char *apu_fence_get_timeline_name(struct dma_fence *fence)
+{
+	return "apu-0";
+}
+
+static void apu_fence_release(struct dma_fence *f)
+{
+	kfree(f);
+}
+
+static const struct dma_fence_ops apu_fence_ops = {
+	.get_driver_name = apu_fence_get_driver_name,
+	.get_timeline_name = apu_fence_get_timeline_name,
+	.release = apu_fence_release,
+};
+
+static struct dma_fence *apu_fence_create(struct apu_sched *sched)
+{
+	struct dma_fence *fence;
+	struct apu_queue_state *apu_queue = &sched->apu_queue;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return ERR_PTR(-ENOMEM);
+
+	dma_fence_init(fence, &apu_fence_ops, &sched->job_lock,
+		       apu_queue->fence_context, apu_queue->seqno++);
+
+	return fence;
+}
+
+static struct apu_job *to_apu_job(struct drm_sched_job *sched_job)
+{
+	return container_of(sched_job, struct apu_job, base);
+}
+
+static struct dma_fence *apu_job_dependency(struct drm_sched_job *sched_job,
+					    struct drm_sched_entity *s_entity)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+	struct dma_fence *fence;
+	unsigned int i;
+
+	/* Implicit fences, max. one per BO */
+	for (i = 0; i < job->bo_count; i++) {
+		if (job->implicit_fences[i]) {
+			fence = job->implicit_fences[i];
+			job->implicit_fences[i] = NULL;
+			return fence;
+		}
+	}
+
+	return NULL;
+}
+
+static int apu_job_hw_submit(struct apu_job *job)
+{
+	int ret;
+	struct apu_core *apu_core = job->apu_core;
+	struct apu_dev_request *dev_req;
+	struct apu_request *apu_req;
+	unsigned long flags;
+
+	int size = sizeof(*dev_req) + sizeof(u32) * job->bo_count * 2;
+	u32 *dev_req_da;
+	u32 *dev_req_buffer_size;
+	int i;
+
+	dev_req = kmalloc(size + job->size_in + job->size_out, GFP_KERNEL);
+	if (!dev_req)
+		return -ENOMEM;
+
+	dev_req->cmd = job->cmd;
+	dev_req->size_in = job->size_in;
+	dev_req->size_out = job->size_out;
+	dev_req->count = job->bo_count;
+	dev_req_da = (u32 *)(dev_req->data + dev_req->size_in +
+			     dev_req->size_out);
+	dev_req_buffer_size = (u32 *)(dev_req_da + dev_req->count);
+	memcpy(dev_req->data, job->data_in, job->size_in);
+
+	apu_req = kzalloc(sizeof(*apu_req), GFP_KERNEL);
+	if (!apu_req) {
+		ret = -ENOMEM;
+		goto err_free_memory;
+	}
+
+	for (i = 0; i < job->bo_count; i++) {
+		struct apu_gem_object *obj = to_apu_bo(job->bos[i]);
+
+		dev_req_da[i] = obj->iova + obj->offset;
+		dev_req_buffer_size[i] = obj->size;
+	}
+
+	ret = ida_simple_get(&req_ida, 0, 0xffff, GFP_KERNEL);
+	if (ret < 0)
+		goto err_free_memory;
+
+	dev_req->id = ret;
+
+	job->id = dev_req->id;
+	apu_req->job = job;
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_add(&apu_req->node, &apu_core->requests);
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+	ret = apu_core->ops->send(apu_core, dev_req,
+				  size + dev_req->size_in + dev_req->size_out);
+	if (ret < 0)
+		goto err;
+	kfree(dev_req);
+
+	return 0;
+
+err:
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_del(&apu_req->node);
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+	ida_simple_remove(&req_ida, dev_req->id);
+err_free_memory:
+	kfree(apu_req);
+	kfree(dev_req);
+
+	return ret;
+}
+
+static struct dma_fence *apu_job_run(struct drm_sched_job *sched_job)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+	struct dma_fence *fence = NULL;
+
+	if (unlikely(job->base.s_fence->finished.error))
+		return NULL;
+
+	fence = apu_fence_create(job->apu_core->sched);
+	if (IS_ERR(fence))
+		return NULL;
+
+	job->done_fence = dma_fence_get(fence);
+
+	apu_job_hw_submit(job);
+
+	return fence;
+}
+
+static void apu_update_rproc_state(struct apu_core *core)
+{
+	if (core->rproc) {
+		if (core->rproc->state == RPROC_CRASHED)
+			core->flags |= APU_CRASHED;
+		if (core->rproc->state == RPROC_OFFLINE)
+			core->flags &= ~APU_ONLINE;
+	}
+}
+
+static enum drm_gpu_sched_stat apu_job_timedout(struct drm_sched_job *sched_job)
+{
+	struct apu_request *apu_req, *tmp;
+	struct apu_job *job = to_apu_job(sched_job);
+	unsigned long flags;
+
+	if (dma_fence_is_signaled(job->done_fence))
+		return DRM_GPU_SCHED_STAT_NOMINAL;
+
+	/* Take ctx_lock to serialize against apu_drm_callback() */
+	spin_lock_irqsave(&job->apu_core->ctx_lock, flags);
+	list_for_each_entry_safe(apu_req, tmp, &job->apu_core->requests, node) {
+		/* Remove the request and notify user about timeout */
+		if (apu_req->job == job) {
+			kref_get(&job->refcount);
+			job->apu_core->flags |= APU_TIMEDOUT;
+			apu_update_rproc_state(job->apu_core);
+			job->result = ETIMEDOUT;
+			list_add(&job->node, &complete_node);
+			list_del(&apu_req->node);
+			ida_simple_remove(&req_ida, job->id);
+			kfree(apu_req);
+			drm_send_event(job->apu_drm->drm,
+				       &job->event->pending_event);
+			dma_fence_signal_locked(job->done_fence);
+		}
+	}
+	spin_unlock_irqrestore(&job->apu_core->ctx_lock, flags);
+
+	return DRM_GPU_SCHED_STAT_NOMINAL;
+}
+
+static void apu_job_free(struct drm_sched_job *sched_job)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+
+	drm_sched_job_cleanup(sched_job);
+
+	apu_job_put(job);
+}
+
+static const struct drm_sched_backend_ops apu_sched_ops = {
+	.dependency = apu_job_dependency,
+	.run_job = apu_job_run,
+	.timedout_job = apu_job_timedout,
+	.free_job = apu_job_free
+};
+
+int apu_drm_job_init(struct apu_core *core)
+{
+	int ret;
+	struct apu_sched *apu_sched;
+	struct drm_gpu_scheduler *sched;
+
+	apu_sched = devm_kzalloc(core->dev, sizeof(*apu_sched), GFP_KERNEL);
+	if (!apu_sched)
+		return -ENOMEM;
+
+	sched = &apu_sched->apu_queue.sched;
+	apu_sched->apu_queue.fence_context = dma_fence_context_alloc(1);
+	ret = drm_sched_init(sched, &apu_sched_ops,
+			     1, 0, msecs_to_jiffies(500),
+			     NULL, NULL, "apu_js");
+	if (ret) {
+		dev_err(core->dev, "Failed to create scheduler: %d\n", ret);
+		return ret;
+	}
+
+	ret = drm_sched_entity_init(&apu_sched->sched_entity,
+				    DRM_SCHED_PRIORITY_NORMAL,
+				    &sched, 1, NULL);
+	if (ret) {
+		drm_sched_fini(sched);
+		return ret;
+	}
+
+	core->sched = apu_sched;
+	core->flags = APU_ONLINE;
+
+	return 0;
+}
+
+static struct apu_core *get_apu_core(struct apu_drm *apu_drm, int device_id)
+{
+	struct apu_core *apu_core;
+
+	list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+		if (apu_core->device_id == device_id)
+			return apu_core;
+	}
+
+	return NULL;
+}
+
+static int apu_core_is_running(struct apu_core *core)
+{
+	return core->ops && core->priv && core->sched;
+}
+
+static int
+apu_lookup_bos(struct drm_device *dev,
+	       struct drm_file *file_priv,
+	       struct drm_apu_gem_queue *args, struct apu_job *job)
+{
+	void __user *bo_handles;
+	unsigned int i;
+	int ret;
+
+	job->bo_count = args->bo_handle_count;
+
+	if (!job->bo_count)
+		return 0;
+
+	job->implicit_fences = kvmalloc_array(job->bo_count,
+					      sizeof(struct dma_fence *),
+					      GFP_KERNEL | __GFP_ZERO);
+	if (!job->implicit_fences)
+		return -ENOMEM;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     job->bo_count, &job->bos);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < job->bo_count; i++) {
+		ret = apu_bo_iommu_map(job->apu_drm, job->bos[i]);
+		if (ret) {
+			/* TODO: handle error */
+			break;
+		}
+	}
+
+	return ret;
+}
+
+int ioctl_gem_queue(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_queue *args = data;
+	struct apu_event *event;
+	struct apu_core *core;
+	struct drm_syncobj *sync_out = NULL;
+	struct apu_job *job;
+	int ret = 0;
+
+	core = get_apu_core(apu_drm, args->device);
+	if (!core || !apu_core_is_running(core))
+		return -ENODEV;
+
+	if (args->out_sync > 0) {
+		sync_out = drm_syncobj_find(file_priv, args->out_sync);
+		if (!sync_out)
+			return -ENODEV;
+	}
+
+	job = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job) {
+		ret = -ENOMEM;
+		goto fail_out_sync;
+	}
+
+	kref_init(&job->refcount);
+
+	job->apu_drm = apu_drm;
+	job->apu_core = core;
+	job->cmd = args->cmd;
+	job->size_in = args->size_in;
+	job->size_out = args->size_out;
+	job->sync_out = sync_out;
+	if (job->size_in) {
+		job->data_in = kmalloc(job->size_in, GFP_KERNEL);
+		if (!job->data_in) {
+			ret = -ENOMEM;
+			goto fail_job;
+		}
+
+		if (copy_from_user(job->data_in,
+				   (void __user *)(uintptr_t)args->data,
+				   job->size_in)) {
+			ret = -EFAULT;
+			goto fail_job;
+		}
+	}
+
+	if (job->size_out) {
+		job->data_out = kmalloc(job->size_out, GFP_KERNEL);
+		if (!job->data_out) {
+			ret = -ENOMEM;
+			goto fail_job;
+		}
+	}
+
+	ret = apu_lookup_bos(dev, file_priv, args, job);
+	if (ret)
+		goto fail_job;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event) {
+		ret = -ENOMEM;
+		goto fail_job;
+	}
+
+	event->base.length = sizeof(struct apu_job_event);
+	event->base.type = APU_JOB_COMPLETED;
+	event->job_event.out_sync = args->out_sync;
+	job->event = event;
+	ret = drm_event_reserve_init(dev, file_priv, &job->event->pending_event,
+				     &job->event->base);
+	if (ret)
+		goto fail_job;
+
+	ret = apu_job_push(job);
+	if (ret) {
+		drm_event_cancel_free(dev, &job->event->pending_event);
+		goto fail_job;
+	}
+
+	/* Update the return sync object for the job */
+	if (sync_out)
+		drm_syncobj_replace_fence(sync_out, job->render_done_fence);
+
+fail_job:
+	apu_job_put(job);
+fail_out_sync:
+	if (sync_out)
+		drm_syncobj_put(sync_out);
+
+	return ret;
+}
+
+int ioctl_gem_dequeue(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv)
+{
+	struct drm_apu_gem_dequeue *args = data;
+	struct drm_syncobj *sync_out = NULL;
+	struct apu_job *job;
+	int ret = 0;
+
+	if (args->out_sync > 0) {
+		sync_out = drm_syncobj_find(file_priv, args->out_sync);
+		if (!sync_out)
+			return -ENODEV;
+	}
+
+	list_for_each_entry(job, &complete_node, node) {
+		if (job->sync_out == sync_out) {
+			if (job->data_out) {
+				if (copy_to_user((void __user *)(uintptr_t)
+						 args->data, job->data_out,
+						 job->size_out))
+					ret = -EFAULT;
+				args->size = job->size_out;
+			}
+			args->result = job->result;
+			list_del(&job->node);
+			apu_job_put(job);
+			if (sync_out)
+				drm_syncobj_put(sync_out);
+
+			return ret;
+		}
+	}
+
+	if (sync_out)
+		drm_syncobj_put(sync_out);
+
+	return 0;
+}
+
+int ioctl_apu_state(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_state *args = data;
+	struct apu_core *core;
+
+	args->flags = 0;
+
+	core = get_apu_core(apu_drm, args->device);
+	if (!core)
+		return -ENODEV;
+	args->flags |= core->flags;
+
+	/* Reset APU flags */
+	core->flags &= ~(APU_TIMEDOUT | APU_CRASHED);
+
+	return 0;
+}
diff --git a/include/drm/apu_drm.h b/include/drm/apu_drm.h
new file mode 100644
index 0000000000000..f044ed0427fdd
--- /dev/null
+++ b/include/drm/apu_drm.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __APU_DRM_H__
+#define __APU_DRM_H__
+
+#include <linux/iova.h>
+#include <linux/remoteproc.h>
+
+struct apu_core;
+struct apu_drm;
+
+struct apu_drm_ops {
+	int (*send)(struct apu_core *apu_core, void *data, int len);
+	int (*callback)(struct apu_core *apu_core, void *data, int len);
+};
+
+#ifdef CONFIG_DRM_APU
+
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv);
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size);
+int apu_drm_unregister_core(void *priv);
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len);
+void *apu_drm_priv(struct apu_core *apu_core);
+
+#else /* CONFIG_DRM_APU */
+
+static inline
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv)
+{
+	return NULL;
+}
+
+static inline
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
+{
+	return -ENOMEM;
+}
+
+static inline
+int apu_drm_unregister_core(void *priv)
+{
+	return -ENODEV;
+}
+
+static inline
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
+{
+	return -ENODEV;
+}
+
+static inline void *apu_drm_priv(struct apu_core *apu_core)
+{
+	return NULL;
+}
+#endif /* CONFIG_DRM_APU */
+
+
+#endif /* __APU_DRM_H__ */
diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h
new file mode 100644
index 0000000000000..c52e187bb0599
--- /dev/null
+++ b/include/uapi/drm/apu_drm.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef __UAPI_APU_DRM_H__
+#define __UAPI_APU_DRM_H__
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define APU_JOB_COMPLETED 0x80000000
+
+/*
+ * Please note that modifications to all structs defined here are
+ * subject to backwards-compatibility constraints.
+ */
+
+/*
+ * Firmware request, must be aligned with the one defined in the firmware.
+ * @id: Request id, used to match a reply with its pending request
+ * @cmd: The command id to execute in the firmware
+ * @result: The result of the command executed in the firmware
+ * @size_in: The size of the input data attached to this request
+ * @size_out: The size of the output data attached to this request
+ * @count: The number of shared buffers
+ * @data: Contains the input data (size_in bytes) and the output data
+ *        (size_out bytes), followed by the addresses and sizes of the
+ *        shared buffers if count is greater than zero. Both the data and
+ *        the shared buffers may be read and written by the APU.
+ */
+struct apu_dev_request {
+	__u16 id;
+	__u16 cmd;
+	__u16 result;
+	__u16 size_in;
+	__u16 size_out;
+	__u16 count;
+	__u8 data[];
+} __attribute__((packed));
+
+struct drm_apu_gem_new {
+	__u32 size;			/* in */
+	__u32 flags;			/* in */
+	__u32 handle;			/* out */
+	__u64 offset;			/* out */
+};
+
+struct drm_apu_gem_queue {
+	__u32 device;
+	__u32 cmd;
+	__u32 out_sync;
+	__u64 bo_handles;
+	__u32 bo_handle_count;
+	__u16 size_in;
+	__u16 size_out;
+	__u64 data;
+};
+
+struct drm_apu_gem_dequeue {
+	__u32 out_sync;
+	__u16 result;
+	__u16 size;
+	__u64 data;
+};
+
+struct drm_apu_gem_iommu_map {
+	__u64 bo_handles;
+	__u32 bo_handle_count;
+	__u64 bo_device_addresses;
+};
+
+struct apu_job_event {
+	struct drm_event base;
+	__u32 out_sync;
+};
+
+#define APU_ONLINE		(1 << 0)
+#define APU_CRASHED		(1 << 1)
+#define APU_TIMEDOUT		(1 << 2)
+
+struct drm_apu_state {
+	__u32 device;
+	__u32 flags;
+};
+
+#define DRM_APU_GEM_NEW			0x00
+#define DRM_APU_GEM_QUEUE		0x01
+#define DRM_APU_GEM_DEQUEUE		0x02
+#define DRM_APU_GEM_IOMMU_MAP		0x03
+#define DRM_APU_GEM_IOMMU_UNMAP		0x04
+#define DRM_APU_STATE			0x05
+#define DRM_APU_NUM_IOCTLS		0x06
+
+#define DRM_IOCTL_APU_GEM_NEW		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_NEW, struct drm_apu_gem_new)
+#define DRM_IOCTL_APU_GEM_QUEUE		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_QUEUE, struct drm_apu_gem_queue)
+#define DRM_IOCTL_APU_GEM_DEQUEUE	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_DEQUEUE, struct drm_apu_gem_dequeue)
+#define DRM_IOCTL_APU_GEM_IOMMU_MAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_MAP, struct drm_apu_gem_iommu_map)
+#define DRM_IOCTL_APU_GEM_IOMMU_UNMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_UNMAP, struct drm_apu_gem_iommu_map)
+#define DRM_IOCTL_APU_STATE		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_STATE, struct drm_apu_state)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* __UAPI_APU_DRM_H__ */
-- 
2.31.1



* [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
@ 2021-09-17 12:59   ` Alexandre Bailon
  0 siblings, 0 replies; 34+ messages in thread
From: Alexandre Bailon @ 2021-09-17 12:59 UTC (permalink / raw)
  To: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, bjorn.andersson, mathieu.poirier,
	sumit.semwal
  Cc: christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain, Alexandre Bailon

Some MediaTek SoCs provide a hardware accelerator for AI / ML.
This driver provides the infrastructure to manage memory
shared between the host CPU and the accelerator, and to submit
jobs to the accelerator.
The APU itself is managed by remoteproc, so this driver relies
on remoteproc to find the APU and to get some important data from
it. But the driver is quite generic, and it should be possible to
manage an accelerator in other ways.
This driver doesn't handle the data transmissions itself.
It must be registered by another driver implementing the
transmissions.

Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
---
 drivers/gpu/drm/Kconfig            |   2 +
 drivers/gpu/drm/Makefile           |   1 +
 drivers/gpu/drm/apu/Kconfig        |  10 +
 drivers/gpu/drm/apu/Makefile       |   7 +
 drivers/gpu/drm/apu/apu_drm_drv.c  | 238 +++++++++++
 drivers/gpu/drm/apu/apu_gem.c      | 232 +++++++++++
 drivers/gpu/drm/apu/apu_internal.h |  89 ++++
 drivers/gpu/drm/apu/apu_sched.c    | 634 +++++++++++++++++++++++++++++
 include/drm/apu_drm.h              |  59 +++
 include/uapi/drm/apu_drm.h         | 106 +++++
 10 files changed, 1378 insertions(+)
 create mode 100644 drivers/gpu/drm/apu/Kconfig
 create mode 100644 drivers/gpu/drm/apu/Makefile
 create mode 100644 drivers/gpu/drm/apu/apu_drm_drv.c
 create mode 100644 drivers/gpu/drm/apu/apu_gem.c
 create mode 100644 drivers/gpu/drm/apu/apu_internal.h
 create mode 100644 drivers/gpu/drm/apu/apu_sched.c
 create mode 100644 include/drm/apu_drm.h
 create mode 100644 include/uapi/drm/apu_drm.h

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 8fc40317f2b77..bcdca35c9eda5 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -382,6 +382,8 @@ source "drivers/gpu/drm/xlnx/Kconfig"
 
 source "drivers/gpu/drm/gud/Kconfig"
 
+source "drivers/gpu/drm/apu/Kconfig"
+
 config DRM_HYPERV
 	tristate "DRM Support for Hyper-V synthetic video device"
 	depends on DRM && PCI && MMU && HYPERV
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index ad11121548983..f3d8432976558 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -127,4 +127,5 @@ obj-$(CONFIG_DRM_MCDE) += mcde/
 obj-$(CONFIG_DRM_TIDSS) += tidss/
 obj-y			+= xlnx/
 obj-y			+= gud/
+obj-$(CONFIG_DRM_APU) += apu/
 obj-$(CONFIG_DRM_HYPERV) += hyperv/
diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig
new file mode 100644
index 0000000000000..c8471309a0351
--- /dev/null
+++ b/drivers/gpu/drm/apu/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+
+config DRM_APU
+	tristate "APU (AI Processor Unit)"
+	select REMOTEPROC
+	select DRM_SCHED
+	help
+	  This provides a DRM driver with facilities to
+	  communicate with an AI Processor Unit (APU).
diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile
new file mode 100644
index 0000000000000..3e97846b091c9
--- /dev/null
+++ b/drivers/gpu/drm/apu/Makefile
@@ -0,0 +1,7 @@
+# SPDX-License-Identifier: GPL-2.0
+
+apu_drm-y += apu_drm_drv.o
+apu_drm-y += apu_sched.o
+apu_drm-y += apu_gem.o
+
+obj-$(CONFIG_DRM_APU) += apu_drm.o
diff --git a/drivers/gpu/drm/apu/apu_drm_drv.c b/drivers/gpu/drm/apu/apu_drm_drv.c
new file mode 100644
index 0000000000000..91d8c99e373c0
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_drm_drv.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <linux/dma-map-ops.h>
+#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/remoteproc.h>
+
+#include <drm/apu_drm.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_probe_helper.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+static LIST_HEAD(apu_devices);
+
+static const struct drm_ioctl_desc ioctls[] = {
+	DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_QUEUE, ioctl_gem_queue,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state,
+			  DRM_RENDER_ALLOW),
+};
+
+DEFINE_DRM_GEM_CMA_FOPS(apu_drm_ops);
+
+static struct drm_driver apu_drm_driver = {
+	.driver_features = DRIVER_GEM | DRIVER_SYNCOBJ,
+	.name = "drm_apu",
+	.desc = "APU DRM driver",
+	.date = "20210319",
+	.major = 1,
+	.minor = 0,
+	.patchlevel = 0,
+	.ioctls = ioctls,
+	.num_ioctls = ARRAY_SIZE(ioctls),
+	.fops = &apu_drm_ops,
+	DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_cma_dumb_create),
+};
+
+void *apu_drm_priv(struct apu_core *apu_core)
+{
+	return apu_core->priv;
+}
+EXPORT_SYMBOL_GPL(apu_drm_priv);
+
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
+{
+	struct apu_drm *apu_drm = apu_core->apu_drm;
+	struct iova *iova;
+
+	iova = reserve_iova(&apu_drm->iovad, PHYS_PFN(start),
+			    PHYS_PFN(start + size));
+	if (!iova)
+		return -ENOMEM;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(apu_drm_reserve_iova);
+
+static int apu_drm_init_first_core(struct apu_drm *apu_drm,
+				   struct apu_core *apu_core)
+{
+	struct drm_device *drm;
+	struct device *parent;
+	u64 mask;
+
+	drm = apu_drm->drm;
+	parent = apu_core->rproc->dev.parent;
+	drm->dev->iommu_group = parent->iommu_group;
+	apu_drm->domain = iommu_get_domain_for_dev(parent);
+	set_dma_ops(drm->dev, get_dma_ops(parent));
+	mask = dma_get_mask(parent);
+	return dma_coerce_mask_and_coherent(drm->dev, mask);
+}
+
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv)
+{
+	struct apu_drm *apu_drm;
+	struct apu_core *apu_core;
+	int ret;
+
+	list_for_each_entry(apu_drm, &apu_devices, node) {
+		list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+			if (apu_core->rproc == rproc) {
+				ret = apu_drm_init_first_core(apu_drm,
+							      apu_core);
+				if (ret)
+					return NULL;
+
+				apu_core->dev = &rproc->dev;
+				apu_core->priv = priv;
+				apu_core->ops = ops;
+
+				ret = apu_drm_job_init(apu_core);
+				if (ret)
+					return NULL;
+
+				return apu_core;
+			}
+		}
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(apu_drm_register_core);
+
+int apu_drm_unregister_core(void *priv)
+{
+	struct apu_drm *apu_drm;
+	struct apu_core *apu_core;
+
+	list_for_each_entry(apu_drm, &apu_devices, node) {
+		list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+			if (apu_core->priv == priv) {
+				apu_sched_fini(apu_core);
+				apu_core->priv = NULL;
+				apu_core->ops = NULL;
+			}
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(apu_drm_unregister_core);
+
+#ifdef CONFIG_OF
+static const struct of_device_id apu_platform_of_match[] = {
+	{ .compatible = "mediatek,apu-drm", },
+	{ },
+};
+
+MODULE_DEVICE_TABLE(of, apu_platform_of_match);
+#endif
+
+static int apu_platform_probe(struct platform_device *pdev)
+{
+	struct drm_device *drm;
+	struct apu_drm *apu_drm;
+	struct of_phandle_iterator it;
+	int index = 0;
+	u64 iova[2];
+	int ret;
+
+	apu_drm = devm_kzalloc(&pdev->dev, sizeof(*apu_drm), GFP_KERNEL);
+	if (!apu_drm)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&apu_drm->apu_cores);
+
+	of_phandle_iterator_init(&it, pdev->dev.of_node, "remoteproc", NULL, 0);
+	while (of_phandle_iterator_next(&it) == 0) {
+		struct rproc *rproc = rproc_get_by_phandle(it.phandle);
+		struct apu_core *apu_core;
+
+		if (!rproc)
+			return -EPROBE_DEFER;
+
+		apu_core = devm_kzalloc(&pdev->dev, sizeof(*apu_core),
+					GFP_KERNEL);
+		if (!apu_core)
+			return -ENOMEM;
+
+		apu_core->rproc = rproc;
+		apu_core->device_id = index++;
+		apu_core->apu_drm = apu_drm;
+		spin_lock_init(&apu_core->ctx_lock);
+		INIT_LIST_HEAD(&apu_core->requests);
+		list_add(&apu_core->node, &apu_drm->apu_cores);
+	}
+
+	if (of_property_read_variable_u64_array(pdev->dev.of_node, "iova",
+						iova, ARRAY_SIZE(iova),
+						ARRAY_SIZE(iova)) !=
+	    ARRAY_SIZE(iova))
+		return -EINVAL;
+
+	init_iova_domain(&apu_drm->iovad, PAGE_SIZE, PHYS_PFN(iova[0]));
+	apu_drm->iova_limit_pfn = PHYS_PFN(iova[0] + iova[1]) - 1;
+
+	drm = drm_dev_alloc(&apu_drm_driver, &pdev->dev);
+	if (IS_ERR(drm))
+		return PTR_ERR(drm);
+
+	/* Initialize private data before the device becomes visible */
+	drm->dev_private = apu_drm;
+	apu_drm->drm = drm;
+	apu_drm->dev = &pdev->dev;
+
+	ret = drm_dev_register(drm, 0);
+	if (ret) {
+		drm_dev_put(drm);
+		return ret;
+	}
+
+	platform_set_drvdata(pdev, drm);
+
+	list_add(&apu_drm->node, &apu_devices);
+
+	return 0;
+}
+
+static int apu_platform_remove(struct platform_device *pdev)
+{
+	struct drm_device *drm;
+
+	drm = platform_get_drvdata(pdev);
+
+	drm_dev_unregister(drm);
+	drm_dev_put(drm);
+
+	return 0;
+}
+
+static struct platform_driver apu_platform_driver = {
+	.probe = apu_platform_probe,
+	.remove = apu_platform_remove,
+	.driver = {
+		   .name = "apu_drm",
+		   .of_match_table = of_match_ptr(apu_platform_of_match),
+	},
+};
+
+module_platform_driver(apu_platform_driver);
diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
new file mode 100644
index 0000000000000..c867143dab436
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_gem.c
@@ -0,0 +1,232 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <asm/cacheflush.h>
+
+#include <linux/dma-buf.h>
+#include <linux/dma-mapping.h>
+#include <linux/highmem.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+#include <linux/swap.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
+					     size_t size)
+{
+	struct drm_gem_cma_object *cma_obj;
+
+	cma_obj = drm_gem_cma_create(dev, size);
+	if (IS_ERR(cma_obj))
+		return NULL;
+
+	return &cma_obj->base;
+}
+
+int ioctl_gem_new(struct drm_device *dev, void *data,
+		  struct drm_file *file_priv)
+{
+	struct drm_apu_gem_new *args = data;
+	struct drm_gem_cma_object *cma_obj;
+	struct apu_gem_object *apu_obj;
+	struct drm_gem_object *gem_obj;
+	int ret;
+
+	cma_obj = drm_gem_cma_create(dev, args->size);
+	if (IS_ERR(cma_obj))
+		return PTR_ERR(cma_obj);
+
+	gem_obj = &cma_obj->base;
+	apu_obj = to_apu_bo(gem_obj);
+
+	/*
+	 * Save the size of buffer expected by application instead of the
+	 * aligned one.
+	 */
+	apu_obj->size = args->size;
+	apu_obj->offset = 0;
+	apu_obj->iommu_refcount = 0;
+	mutex_init(&apu_obj->mutex);
+
+	ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);
+	/* drop the local reference; on failure this also frees the object */
+	drm_gem_object_put(gem_obj);
+	if (ret)
+		return ret;
+	args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
+
+	return 0;
+}
+
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
+{
+	int iova_pfn;
+	int i;
+
+	if (!obj->iommu_sgt)
+		return;
+
+	mutex_lock(&obj->mutex);
+	obj->iommu_refcount--;
+	if (obj->iommu_refcount) {
+		mutex_unlock(&obj->mutex);
+		return;
+	}
+
+	iova_pfn = PHYS_PFN(obj->iova);
+	for (i = 0; i < obj->iommu_sgt->nents; i++) {
+		iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
+			    PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
+		iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
+	}
+
+	sg_free_table(obj->iommu_sgt);
+	kfree(obj->iommu_sgt);
+
+	free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
+	mutex_unlock(&obj->mutex);
+}
+
+static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
+{
+	if (obj->funcs)
+		return obj->funcs->get_sg_table(obj);
+	return NULL;
+}
+
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
+{
+	struct apu_gem_object *apu_obj = to_apu_bo(obj);
+	struct scatterlist *sgl;
+	phys_addr_t phys;
+	int total_buf_space;
+	int iova_pfn;
+	int iova;
+	int ret;
+	int i;
+
+	mutex_lock(&apu_obj->mutex);
+	apu_obj->iommu_refcount++;
+	if (apu_obj->iommu_refcount != 1) {
+		mutex_unlock(&apu_obj->mutex);
+		return 0;
+	}
+
+	apu_obj->iommu_sgt = apu_get_sg_table(obj);
+	if (IS_ERR(apu_obj->iommu_sgt)) {
+		mutex_unlock(&apu_obj->mutex);
+		return PTR_ERR(apu_obj->iommu_sgt);
+	}
+
+	total_buf_space = obj->size;
+	iova_pfn = alloc_iova_fast(&apu_drm->iovad,
+				   total_buf_space >> PAGE_SHIFT,
+				   apu_drm->iova_limit_pfn, true);
+	if (!iova_pfn) {
+		dev_err(apu_drm->dev, "Failed to allocate iova address\n");
+		mutex_unlock(&apu_obj->mutex);
+		return -ENOMEM;
+	}
+	apu_obj->iova = PFN_PHYS(iova_pfn);
+
+	iova = apu_obj->iova;
+	sgl = apu_obj->iommu_sgt->sgl;
+	for (i = 0; i < apu_obj->iommu_sgt->nents; i++) {
+		phys = page_to_phys(sg_page(&sgl[i]));
+		ret =
+		    iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys,
+			      PAGE_ALIGN(sgl[i].length),
+			      IOMMU_READ | IOMMU_WRITE);
+		if (ret) {
+			dev_err(apu_drm->dev, "Failed to iommu map\n");
+			free_iova(&apu_drm->iovad, iova_pfn);
+			mutex_unlock(&apu_obj->mutex);
+			return ret;
+		}
+		iova += sgl[i].offset + sgl[i].length;
+		iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length));
+	}
+	mutex_unlock(&apu_obj->mutex);
+
+	return 0;
+}
+
+int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
+			struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_iommu_map *args = data;
+	struct drm_gem_object **bos;
+	void __user *bo_handles;
+	int ret;
+	int i;
+
+	u64 *das = kvmalloc_array(args->bo_handle_count,
+				  sizeof(u64), GFP_KERNEL);
+	if (!das)
+		return -ENOMEM;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     args->bo_handle_count, &bos);
+	if (ret) {
+		kvfree(das);
+		return ret;
+	}
+
+	for (i = 0; i < args->bo_handle_count; i++) {
+		ret = apu_bo_iommu_map(apu_drm, bos[i]);
+		if (ret) {
+			/* TODO: unmap the buffers already mapped */
+			goto out;
+		}
+		das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset;
+	}
+
+	if (copy_to_user((void __user *)(uintptr_t)args->bo_device_addresses,
+			 das, args->bo_handle_count * sizeof(u64))) {
+		ret = -EFAULT;
+		DRM_DEBUG("Failed to copy device addresses\n");
+	}
+
+out:
+	kvfree(das);
+	kvfree(bos);
+
+	return ret;
+}
+
+int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_iommu_map *args = data;
+	struct drm_gem_object **bos;
+	void __user *bo_handles;
+	int ret;
+	int i;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     args->bo_handle_count, &bos);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < args->bo_handle_count; i++)
+		apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i]));
+
+	kvfree(bos);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h
new file mode 100644
index 0000000000000..b789b2f3ad9c6
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_internal.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __APU_INTERNAL_H__
+#define __APU_INTERNAL_H__
+
+#include <linux/iova.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/gpu_scheduler.h>
+
+struct apu_gem_object {
+	struct drm_gem_cma_object base;
+	struct mutex mutex;
+	struct sg_table *iommu_sgt;
+	int iommu_refcount;
+	size_t size;
+	u32 iova;
+	u32 offset;
+};
+
+struct apu_sched;
+struct apu_core {
+	int device_id;
+	struct device *dev;
+	struct rproc *rproc;
+	struct apu_drm_ops *ops;
+	struct apu_drm *apu_drm;
+
+	spinlock_t ctx_lock;
+	struct list_head requests;
+
+	struct list_head node;
+	void *priv;
+
+	struct apu_sched *sched;
+	u32 flags;
+};
+
+struct apu_drm {
+	struct device *dev;
+	struct drm_device *drm;
+
+	struct iommu_domain *domain;
+	struct iova_domain iovad;
+	int iova_limit_pfn;
+
+	struct list_head apu_cores;
+	struct list_head node;
+};
+
+static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj)
+{
+	return container_of(to_drm_gem_cma_obj(obj), struct apu_gem_object,
+			    base);
+}
+
+struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
+					     size_t size);
+
+int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj);
+void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj);
+int ioctl_gem_new(struct drm_device *dev, void *data,
+		  struct drm_file *file_priv);
+int ioctl_gem_user_new(struct drm_device *dev, void *data,
+		       struct drm_file *file_priv);
+int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
+			struct drm_file *file_priv);
+int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
+			  struct drm_file *file_priv);
+int ioctl_gem_queue(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv);
+int ioctl_gem_dequeue(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv);
+int ioctl_apu_state(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv);
+struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem,
+				     int flags);
+
+struct apu_job;
+
+int apu_drm_job_init(struct apu_core *core);
+void apu_sched_fini(struct apu_core *core);
+int apu_job_push(struct apu_job *job);
+void apu_job_put(struct apu_job *job);
+
+#endif /* __APU_INTERNAL_H__ */
diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c
new file mode 100644
index 0000000000000..cebb0155c7783
--- /dev/null
+++ b/drivers/gpu/drm/apu/apu_sched.c
@@ -0,0 +1,634 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <drm/apu_drm.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem_cma_helper.h>
+#include <drm/drm_syncobj.h>
+#include <drm/gpu_scheduler.h>
+
+#include <uapi/drm/apu_drm.h>
+
+#include "apu_internal.h"
+
+struct apu_queue_state {
+	struct drm_gpu_scheduler sched;
+
+	u64 fence_context;
+	u64 seqno;
+};
+
+struct apu_request {
+	struct list_head node;
+	void *job;
+};
+
+struct apu_sched {
+	struct apu_queue_state apu_queue;
+	spinlock_t job_lock;
+	struct drm_sched_entity sched_entity;
+};
+
+struct apu_event {
+	struct drm_pending_event pending_event;
+	union {
+		struct drm_event base;
+		struct apu_job_event job_event;
+	};
+};
+
+struct apu_job {
+	struct drm_sched_job base;
+
+	struct kref refcount;
+
+	struct apu_core *apu_core;
+	struct apu_drm *apu_drm;
+
+	/* Fence to be signaled by IRQ handler when the job is complete. */
+	struct dma_fence *done_fence;
+
+	__u32 cmd;
+
+	/* Exclusive fences we have taken from the BOs to wait for */
+	struct dma_fence **implicit_fences;
+	struct drm_gem_object **bos;
+	u32 bo_count;
+
+	/* Fence to be signaled by drm-sched once its done with the job */
+	struct dma_fence *render_done_fence;
+
+	void *data_in;
+	uint16_t size_in;
+	void *data_out;
+	uint16_t size_out;
+	uint16_t result;
+	uint16_t id;
+
+	struct list_head node;
+	struct drm_syncobj *sync_out;
+
+	struct apu_event *event;
+};
+
+static DEFINE_IDA(req_ida);
+static LIST_HEAD(complete_node);
+
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
+{
+	struct apu_request *apu_req, *tmp;
+	struct apu_dev_request *hdr = data;
+	unsigned long flags;
+
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_for_each_entry_safe(apu_req, tmp, &apu_core->requests, node) {
+		struct apu_job *job = apu_req->job;
+
+		if (job && hdr->id == job->id) {
+			kref_get(&job->refcount);
+			job->result = hdr->result;
+			if (job->size_out)
+				memcpy(job->data_out, hdr->data + job->size_in,
+				       min(job->size_out, hdr->size_out));
+			job->size_out = hdr->size_out;
+			list_add(&job->node, &complete_node);
+			list_del(&apu_req->node);
+			ida_simple_remove(&req_ida, hdr->id);
+			kfree(apu_req);
+			drm_send_event(job->apu_drm->drm,
+				       &job->event->pending_event);
+			dma_fence_signal_locked(job->done_fence);
+		}
+	}
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+
+	return 0;
+}
+
+void apu_sched_fini(struct apu_core *core)
+{
+	drm_sched_fini(&core->sched->apu_queue.sched);
+	devm_kfree(core->dev, core->sched);
+	core->flags &= ~APU_ONLINE;
+	core->sched = NULL;
+}
+
+static void apu_job_cleanup(struct kref *ref)
+{
+	struct apu_job *job = container_of(ref, struct apu_job,
+					   refcount);
+	unsigned int i;
+
+	if (job->implicit_fences) {
+		for (i = 0; i < job->bo_count; i++)
+			dma_fence_put(job->implicit_fences[i]);
+		kvfree(job->implicit_fences);
+	}
+	dma_fence_put(job->done_fence);
+	dma_fence_put(job->render_done_fence);
+
+	if (job->bos) {
+		for (i = 0; i < job->bo_count; i++) {
+			struct apu_gem_object *apu_obj;
+
+			apu_obj = to_apu_bo(job->bos[i]);
+			apu_bo_iommu_unmap(job->apu_drm, apu_obj);
+			drm_gem_object_put(job->bos[i]);
+		}
+
+		kvfree(job->bos);
+	}
+
+	kfree(job->data_out);
+	kfree(job->data_in);
+	kfree(job);
+}
+
+void apu_job_put(struct apu_job *job)
+{
+	kref_put(&job->refcount, apu_job_cleanup);
+}
+
+static void apu_acquire_object_fences(struct drm_gem_object **bos,
+				      int bo_count,
+				      struct dma_fence **implicit_fences)
+{
+	int i;
+
+	for (i = 0; i < bo_count; i++)
+		implicit_fences[i] = dma_resv_get_excl_unlocked(bos[i]->resv);
+}
+
+static void apu_attach_object_fences(struct drm_gem_object **bos,
+				     int bo_count, struct dma_fence *fence)
+{
+	int i;
+
+	for (i = 0; i < bo_count; i++)
+		dma_resv_add_excl_fence(bos[i]->resv, fence);
+}
+
+int apu_job_push(struct apu_job *job)
+{
+	struct drm_sched_entity *entity = &job->apu_core->sched->sched_entity;
+	struct ww_acquire_ctx acquire_ctx;
+	int ret = 0;
+
+	ret = drm_gem_lock_reservations(job->bos, job->bo_count, &acquire_ctx);
+	if (ret)
+		return ret;
+
+	ret = drm_sched_job_init(&job->base, entity, NULL);
+	if (ret)
+		goto unlock;
+
+	job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+	kref_get(&job->refcount);	/* put by scheduler job completion */
+
+	apu_acquire_object_fences(job->bos, job->bo_count,
+				  job->implicit_fences);
+
+	drm_sched_entity_push_job(&job->base, entity);
+
+	apu_attach_object_fences(job->bos, job->bo_count,
+				 job->render_done_fence);
+
+unlock:
+	drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
+
+	return ret;
+}
+
+static const char *apu_fence_get_driver_name(struct dma_fence *fence)
+{
+	return "apu";
+}
+
+static const char *apu_fence_get_timeline_name(struct dma_fence *fence)
+{
+	return "apu-0";
+}
+
+static void apu_fence_release(struct dma_fence *f)
+{
+	kfree(f);
+}
+
+static const struct dma_fence_ops apu_fence_ops = {
+	.get_driver_name = apu_fence_get_driver_name,
+	.get_timeline_name = apu_fence_get_timeline_name,
+	.release = apu_fence_release,
+};
+
+static struct dma_fence *apu_fence_create(struct apu_sched *sched)
+{
+	struct dma_fence *fence;
+	struct apu_queue_state *apu_queue = &sched->apu_queue;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return ERR_PTR(-ENOMEM);
+
+	dma_fence_init(fence, &apu_fence_ops, &sched->job_lock,
+		       apu_queue->fence_context, apu_queue->seqno++);
+
+	return fence;
+}
+
+static struct apu_job *to_apu_job(struct drm_sched_job *sched_job)
+{
+	return container_of(sched_job, struct apu_job, base);
+}
+
+static struct dma_fence *apu_job_dependency(struct drm_sched_job *sched_job,
+					    struct drm_sched_entity *s_entity)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+	struct dma_fence *fence;
+	unsigned int i;
+
+	/* Implicit fences, max. one per BO */
+	for (i = 0; i < job->bo_count; i++) {
+		if (job->implicit_fences[i]) {
+			fence = job->implicit_fences[i];
+			job->implicit_fences[i] = NULL;
+			return fence;
+		}
+	}
+
+	return NULL;
+}
+
+static int apu_job_hw_submit(struct apu_job *job)
+{
+	int ret;
+	struct apu_core *apu_core = job->apu_core;
+	struct apu_dev_request *dev_req;
+	struct apu_request *apu_req;
+	unsigned long flags;
+
+	int size = sizeof(*dev_req) + sizeof(u32) * job->bo_count * 2;
+	u32 *dev_req_da;
+	u32 *dev_req_buffer_size;
+	int i;
+
+	dev_req = kmalloc(size + job->size_in + job->size_out, GFP_KERNEL);
+	if (!dev_req)
+		return -ENOMEM;
+
+	dev_req->cmd = job->cmd;
+	dev_req->size_in = job->size_in;
+	dev_req->size_out = job->size_out;
+	dev_req->count = job->bo_count;
+	dev_req_da =
+	    (u32 *) (dev_req->data + dev_req->size_in + dev_req->size_out);
+	dev_req_buffer_size = (u32 *) (dev_req_da + dev_req->count);
+	memcpy(dev_req->data, job->data_in, job->size_in);
+
+	apu_req = kzalloc(sizeof(*apu_req), GFP_KERNEL);
+	if (!apu_req) {
+		kfree(dev_req);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < job->bo_count; i++) {
+		struct apu_gem_object *obj = to_apu_bo(job->bos[i]);
+
+		dev_req_da[i] = obj->iova + obj->offset;
+		dev_req_buffer_size[i] = obj->size;
+	}
+
+	ret = ida_simple_get(&req_ida, 0, 0xffff, GFP_KERNEL);
+	if (ret < 0)
+		goto err_free_memory;
+
+	dev_req->id = ret;
+
+	job->id = dev_req->id;
+	apu_req->job = job;
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_add(&apu_req->node, &apu_core->requests);
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+	ret =
+	    apu_core->ops->send(apu_core, dev_req,
+				size + dev_req->size_in + dev_req->size_out);
+	if (ret < 0)
+		goto err;
+	kfree(dev_req);
+
+	return 0;
+
+err:
+	spin_lock_irqsave(&apu_core->ctx_lock, flags);
+	list_del(&apu_req->node);
+	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
+	ida_simple_remove(&req_ida, dev_req->id);
+err_free_memory:
+	kfree(apu_req);
+	kfree(dev_req);
+
+	return ret;
+}
+
+static struct dma_fence *apu_job_run(struct drm_sched_job *sched_job)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+	struct dma_fence *fence = NULL;
+
+	if (unlikely(job->base.s_fence->finished.error))
+		return NULL;
+
+	fence = apu_fence_create(job->apu_core->sched);
+	if (IS_ERR(fence))
+		return NULL;
+
+	job->done_fence = dma_fence_get(fence);
+
+	apu_job_hw_submit(job);
+
+	return fence;
+}
+
+static void apu_update_rproc_state(struct apu_core *core)
+{
+	if (core->rproc) {
+		if (core->rproc->state == RPROC_CRASHED)
+			core->flags |= APU_CRASHED;
+		if (core->rproc->state == RPROC_OFFLINE)
+			core->flags &= ~APU_ONLINE;
+	}
+}
+
+static enum drm_gpu_sched_stat apu_job_timedout(struct drm_sched_job *sched_job)
+{
+	struct apu_request *apu_req, *tmp;
+	struct apu_job *job = to_apu_job(sched_job);
+
+	if (dma_fence_is_signaled(job->done_fence))
+		return DRM_GPU_SCHED_STAT_NOMINAL;
+
+	list_for_each_entry_safe(apu_req, tmp, &job->apu_core->requests, node) {
+		/* Remove the request and notify user about timeout */
+		if (apu_req->job == job) {
+			kref_get(&job->refcount);
+			job->apu_core->flags |= APU_TIMEDOUT;
+			apu_update_rproc_state(job->apu_core);
+			job->result = ETIMEDOUT;
+			list_add(&job->node, &complete_node);
+			list_del(&apu_req->node);
+			ida_simple_remove(&req_ida, job->id);
+			kfree(apu_req);
+			drm_send_event(job->apu_drm->drm,
+				       &job->event->pending_event);
+			dma_fence_signal_locked(job->done_fence);
+		}
+	}
+
+	return DRM_GPU_SCHED_STAT_NOMINAL;
+}
+
+static void apu_job_free(struct drm_sched_job *sched_job)
+{
+	struct apu_job *job = to_apu_job(sched_job);
+
+	drm_sched_job_cleanup(sched_job);
+
+	apu_job_put(job);
+}
+
+static const struct drm_sched_backend_ops apu_sched_ops = {
+	.dependency = apu_job_dependency,
+	.run_job = apu_job_run,
+	.timedout_job = apu_job_timedout,
+	.free_job = apu_job_free
+};
+
+int apu_drm_job_init(struct apu_core *core)
+{
+	int ret;
+	struct apu_sched *apu_sched;
+	struct drm_gpu_scheduler *sched;
+
+	apu_sched = devm_kzalloc(core->dev, sizeof(*apu_sched), GFP_KERNEL);
+	if (!apu_sched)
+		return -ENOMEM;
+
+	sched = &apu_sched->apu_queue.sched;
+	apu_sched->apu_queue.fence_context = dma_fence_context_alloc(1);
+	ret = drm_sched_init(sched, &apu_sched_ops,
+			     1, 0, msecs_to_jiffies(500),
+			     NULL, NULL, "apu_js");
+	if (ret) {
+		dev_err(core->dev, "Failed to create scheduler: %d.", ret);
+		return ret;
+	}
+
+	ret = drm_sched_entity_init(&apu_sched->sched_entity,
+				    DRM_SCHED_PRIORITY_NORMAL,
+				    &sched, 1, NULL);
+	if (ret) {
+		drm_sched_fini(sched);
+		return ret;
+	}
+
+	core->sched = apu_sched;
+	core->flags = APU_ONLINE;
+
+	return 0;
+}
+
+static struct apu_core *get_apu_core(struct apu_drm *apu_drm, int device_id)
+{
+	struct apu_core *apu_core;
+
+	list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
+		if (apu_core->device_id == device_id)
+			return apu_core;
+	}
+
+	return NULL;
+}
+
+static int apu_core_is_running(struct apu_core *core)
+{
+	return core->ops && core->priv && core->sched;
+}
+
+static int
+apu_lookup_bos(struct drm_device *dev,
+	       struct drm_file *file_priv,
+	       struct drm_apu_gem_queue *args, struct apu_job *job)
+{
+	void __user *bo_handles;
+	unsigned int i;
+	int ret;
+
+	job->bo_count = args->bo_handle_count;
+
+	if (!job->bo_count)
+		return 0;
+
+	job->implicit_fences = kvmalloc_array(job->bo_count,
+					      sizeof(struct dma_fence *),
+					      GFP_KERNEL | __GFP_ZERO);
+	if (!job->implicit_fences)
+		return -ENOMEM;
+
+	bo_handles = (void __user *)(uintptr_t) args->bo_handles;
+	ret = drm_gem_objects_lookup(file_priv, bo_handles,
+				     job->bo_count, &job->bos);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < job->bo_count; i++) {
+		ret = apu_bo_iommu_map(job->apu_drm, job->bos[i]);
+		if (ret) {
+			/* TODO: handle error */
+			break;
+		}
+	}
+
+	return ret;
+}
+
+int ioctl_gem_queue(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_gem_queue *args = data;
+	struct apu_event *event;
+	struct apu_core *core;
+	struct drm_syncobj *sync_out = NULL;
+	struct apu_job *job;
+	int ret = 0;
+
+	core = get_apu_core(apu_drm, args->device);
+	if (!core || !apu_core_is_running(core))
+		return -ENODEV;
+
+	if (args->out_sync > 0) {
+		sync_out = drm_syncobj_find(file_priv, args->out_sync);
+		if (!sync_out)
+			return -ENODEV;
+	}
+
+	job = kzalloc(sizeof(*job), GFP_KERNEL);
+	if (!job) {
+		ret = -ENOMEM;
+		goto fail_out_sync;
+	}
+
+	kref_init(&job->refcount);
+
+	job->apu_drm = apu_drm;
+	job->apu_core = core;
+	job->cmd = args->cmd;
+	job->size_in = args->size_in;
+	job->size_out = args->size_out;
+	job->sync_out = sync_out;
+	if (job->size_in) {
+		job->data_in = kmalloc(job->size_in, GFP_KERNEL);
+		if (!job->data_in) {
+			ret = -ENOMEM;
+			goto fail_job;
+		}
+
+		if (copy_from_user(job->data_in,
+				   (void __user *)(uintptr_t)args->data,
+				   job->size_in)) {
+			ret = -EFAULT;
+			goto fail_job;
+		}
+	}
+
+	if (job->size_out) {
+		job->data_out = kmalloc(job->size_out, GFP_KERNEL);
+		if (!job->data_out) {
+			ret = -ENOMEM;
+			goto fail_job;
+		}
+	}
+
+	ret = apu_lookup_bos(dev, file_priv, args, job);
+	if (ret)
+		goto fail_job;
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event) {
+		ret = -ENOMEM;
+		goto fail_job;
+	}
+
+	event->base.length = sizeof(struct apu_job_event);
+	event->base.type = APU_JOB_COMPLETED;
+	event->job_event.out_sync = args->out_sync;
+	job->event = event;
+	ret = drm_event_reserve_init(dev, file_priv, &job->event->pending_event,
+				     &job->event->base);
+	if (ret)
+		goto fail_job;
+
+	ret = apu_job_push(job);
+	if (ret) {
+		drm_event_cancel_free(dev, &job->event->pending_event);
+		goto fail_job;
+	}
+
+	/* Update the return sync object for the job */
+	if (sync_out)
+		drm_syncobj_replace_fence(sync_out, job->render_done_fence);
+
+fail_job:
+	apu_job_put(job);
+fail_out_sync:
+	if (sync_out)
+		drm_syncobj_put(sync_out);
+
+	return ret;
+}
+
+int ioctl_gem_dequeue(struct drm_device *dev, void *data,
+		      struct drm_file *file_priv)
+{
+	struct drm_apu_gem_dequeue *args = data;
+	struct drm_syncobj *sync_out = NULL;
+	struct apu_job *job;
+	int ret = 0;
+
+	if (args->out_sync > 0) {
+		sync_out = drm_syncobj_find(file_priv, args->out_sync);
+		if (!sync_out)
+			return -ENODEV;
+	}
+
+	list_for_each_entry(job, &complete_node, node) {
+		if (job->sync_out == sync_out) {
+			if (job->data_out) {
+				if (copy_to_user((void __user *)(uintptr_t)
+						 args->data, job->data_out,
+						 job->size_out))
+					ret = -EFAULT;
+				args->size = job->size_out;
+			}
+			args->result = job->result;
+			list_del(&job->node);
+			apu_job_put(job);
+			if (sync_out)
+				drm_syncobj_put(sync_out);
+
+			return ret;
+		}
+	}
+
+	if (sync_out)
+		drm_syncobj_put(sync_out);
+
+	return 0;
+}
+
+int ioctl_apu_state(struct drm_device *dev, void *data,
+		    struct drm_file *file_priv)
+{
+	struct apu_drm *apu_drm = dev->dev_private;
+	struct drm_apu_state *args = data;
+	struct apu_core *core;
+
+	args->flags = 0;
+
+	core = get_apu_core(apu_drm, args->device);
+	if (!core)
+		return -ENODEV;
+	args->flags |= core->flags;
+
+	/* Reset APU flags */
+	core->flags &= ~(APU_TIMEDOUT | APU_CRASHED);
+
+	return 0;
+}
diff --git a/include/drm/apu_drm.h b/include/drm/apu_drm.h
new file mode 100644
index 0000000000000..f044ed0427fdd
--- /dev/null
+++ b/include/drm/apu_drm.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __APU_DRM_H__
+#define __APU_DRM_H__
+
+#include <linux/iova.h>
+#include <linux/remoteproc.h>
+
+struct apu_core;
+struct apu_drm;
+
+struct apu_drm_ops {
+	int (*send)(struct apu_core *apu_core, void *data, int len);
+	int (*callback)(struct apu_core *apu_core, void *data, int len);
+};
+
+#ifdef CONFIG_DRM_APU
+
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv);
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size);
+int apu_drm_unregister_core(void *priv);
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len);
+void *apu_drm_priv(struct apu_core *apu_core);
+
+#else /* CONFIG_DRM_APU */
+
+static inline
+struct apu_core *apu_drm_register_core(struct rproc *rproc,
+				       struct apu_drm_ops *ops, void *priv)
+{
+	return NULL;
+}
+
+static inline
+int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
+{
+	return -ENOMEM;
+}
+
+static inline
+int apu_drm_unregister_core(void *priv)
+{
+	return -ENODEV;
+}
+
+static inline
+int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
+{
+	return -ENODEV;
+}
+
+static inline void *apu_drm_priv(struct apu_core *apu_core)
+{
+	return NULL;
+}
+#endif /* CONFIG_DRM_APU */
+
+
+#endif /* __APU_DRM_H__ */
diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h
new file mode 100644
index 0000000000000..c52e187bb0599
--- /dev/null
+++ b/include/uapi/drm/apu_drm.h
@@ -0,0 +1,106 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+#ifndef __UAPI_APU_DRM_H__
+#define __UAPI_APU_DRM_H__
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define APU_JOB_COMPLETED 0x80000000
+
+/*
+ * Please note that modifications to all structs defined here are
+ * subject to backwards-compatibility constraints.
+ */
+
+/*
+ * Firmware request, must be aligned with the one defined in the firmware.
+ * @id: Request id, used in the case of a reply to find the pending request
+ * @cmd: The command id to execute in the firmware
+ * @result: The result of the command executed by the firmware
+ * @size_in: The size of the data sent with this request
+ * @size_out: The size of the data expected in the reply
+ * @count: The number of shared buffers
+ * @data: Contains the data attached to the request if size_in is greater
+ *        than zero, and the addresses of shared buffers if count is greater
+ *        than zero. Both the data and the shared buffers may be read and
+ *        written by the APU.
+ */
+struct apu_dev_request {
+	__u16 id;
+	__u16 cmd;
+	__u16 result;
+	__u16 size_in;
+	__u16 size_out;
+	__u16 count;
+	__u8 data[];
+} __attribute__((packed));
+
+struct drm_apu_gem_new {
+	__u32 size;			/* in */
+	__u32 flags;			/* in */
+	__u32 handle;			/* out */
+	__u64 offset;			/* out */
+};
+
+struct drm_apu_gem_queue {
+	__u32 device;
+	__u32 cmd;
+	__u32 out_sync;
+	__u64 bo_handles;
+	__u32 bo_handle_count;
+	__u16 size_in;
+	__u16 size_out;
+	__u64 data;
+};
+
+struct drm_apu_gem_dequeue {
+	__u32 out_sync;
+	__u16 result;
+	__u16 size;
+	__u64 data;
+};
+
+struct drm_apu_gem_iommu_map {
+	__u64 bo_handles;
+	__u32 bo_handle_count;
+	__u64 bo_device_addresses;
+};
+
+struct apu_job_event {
+	struct drm_event base;
+	__u32 out_sync;
+};
+
+#define APU_ONLINE		(1 << 0)
+#define APU_CRASHED		(1 << 1)
+#define APU_TIMEDOUT		(1 << 2)
+
+struct drm_apu_state {
+	__u32 device;
+	__u32 flags;
+};
+
+#define DRM_APU_GEM_NEW			0x00
+#define DRM_APU_GEM_QUEUE		0x01
+#define DRM_APU_GEM_DEQUEUE		0x02
+#define DRM_APU_GEM_IOMMU_MAP		0x03
+#define DRM_APU_GEM_IOMMU_UNMAP		0x04
+#define DRM_APU_STATE			0x05
+#define DRM_APU_NUM_IOCTLS		0x06
+
+#define DRM_IOCTL_APU_GEM_NEW		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_NEW, struct drm_apu_gem_new)
+#define DRM_IOCTL_APU_GEM_QUEUE		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_QUEUE, struct drm_apu_gem_queue)
+#define DRM_IOCTL_APU_GEM_DEQUEUE	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_DEQUEUE, struct drm_apu_gem_dequeue)
+#define DRM_IOCTL_APU_GEM_IOMMU_MAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_MAP, struct drm_apu_gem_iommu_map)
+#define DRM_IOCTL_APU_GEM_IOMMU_UNMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_GEM_IOMMU_UNMAP, struct drm_apu_gem_iommu_map)
+#define DRM_IOCTL_APU_STATE		DRM_IOWR(DRM_COMMAND_BASE + DRM_APU_STATE, struct drm_apu_state)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* __UAPI_APU_DRM_H__ */
-- 
2.31.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 3/4] rpmsg: Add support of AI Processor Unit (APU)
  2021-09-17 12:59 ` Alexandre Bailon
  (?)
@ 2021-09-17 12:59   ` Alexandre Bailon
  -1 siblings, 0 replies; 34+ messages in thread
From: Alexandre Bailon @ 2021-09-17 12:59 UTC (permalink / raw)
  To: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, bjorn.andersson, mathieu.poirier,
	sumit.semwal
  Cc: christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain, Alexandre Bailon

Some MediaTek SoCs provide a hardware accelerator for AI / ML.
This driver uses the APU DRM driver to manage the shared memory,
and uses rpmsg to execute jobs on the APU.

Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
---
 drivers/rpmsg/Kconfig     |  10 +++
 drivers/rpmsg/Makefile    |   1 +
 drivers/rpmsg/apu_rpmsg.c | 184 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 195 insertions(+)
 create mode 100644 drivers/rpmsg/apu_rpmsg.c

diff --git a/drivers/rpmsg/Kconfig b/drivers/rpmsg/Kconfig
index 0b4407abdf138..fc1668f795004 100644
--- a/drivers/rpmsg/Kconfig
+++ b/drivers/rpmsg/Kconfig
@@ -73,4 +73,14 @@ config RPMSG_VIRTIO
 	select RPMSG_NS
 	select VIRTIO
 
+config RPMSG_APU
+	tristate "APU RPMSG driver"
+	select REMOTEPROC
+	select RPMSG_VIRTIO
+	select DRM_APU
+	help
+	  This provides an RPMSG driver with facilities to communicate
+	  with an AI Processing Unit (APU).
+	  This uses the APU DRM driver to manage memory and job scheduling.
+
 endmenu
diff --git a/drivers/rpmsg/Makefile b/drivers/rpmsg/Makefile
index 8d452656f0ee3..8b336b9a817c1 100644
--- a/drivers/rpmsg/Makefile
+++ b/drivers/rpmsg/Makefile
@@ -9,3 +9,4 @@ obj-$(CONFIG_RPMSG_QCOM_GLINK_RPM) += qcom_glink_rpm.o
 obj-$(CONFIG_RPMSG_QCOM_GLINK_SMEM) += qcom_glink_smem.o
 obj-$(CONFIG_RPMSG_QCOM_SMD)	+= qcom_smd.o
 obj-$(CONFIG_RPMSG_VIRTIO)	+= virtio_rpmsg_bus.o
+obj-$(CONFIG_RPMSG_APU)		+= apu_rpmsg.o
diff --git a/drivers/rpmsg/apu_rpmsg.c b/drivers/rpmsg/apu_rpmsg.c
new file mode 100644
index 0000000000000..7e504bd176a4d
--- /dev/null
+++ b/drivers/rpmsg/apu_rpmsg.c
@@ -0,0 +1,184 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Copyright 2020 BayLibre SAS
+
+#include <asm/cacheflush.h>
+
+#include <linux/cdev.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-map-ops.h>
+#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/remoteproc.h>
+#include <linux/rpmsg.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include <drm/apu_drm.h>
+
+#include "rpmsg_internal.h"
+
+#define APU_RPMSG_SERVICE_MT8183 "rpmsg-mt8183-apu0"
+
+struct rpmsg_apu {
+	struct apu_core *core;
+	struct rpmsg_device *rpdev;
+};
+
+static int apu_rpmsg_callback(struct rpmsg_device *rpdev, void *data, int count,
+			      void *priv, u32 addr)
+{
+	struct rpmsg_apu *apu = dev_get_drvdata(&rpdev->dev);
+	struct apu_core *apu_core = apu->core;
+
+	return apu_drm_callback(apu_core, data, count);
+}
+
+static int apu_rpmsg_send(struct apu_core *apu_core, void *data, int len)
+{
+	struct rpmsg_apu *apu = apu_drm_priv(apu_core);
+	struct rpmsg_device *rpdev = apu->rpdev;
+
+	return rpmsg_send(rpdev->ept, data, len);
+}
+
+static struct apu_drm_ops apu_rpmsg_ops = {
+	.send = apu_rpmsg_send,
+};
+
+static int apu_init_iovad(struct rproc *rproc, struct rpmsg_apu *apu)
+{
+	struct resource_table *table;
+	struct fw_rsc_carveout *rsc;
+	int i;
+
+	if (!rproc->table_ptr) {
+		dev_err(&apu->rpdev->dev,
+			"No resource table: has the firmware been loaded?\n");
+		return -ENODEV;
+	}
+
+	table = rproc->table_ptr;
+	for (i = 0; i < table->num; i++) {
+		int offset = table->offset[i];
+		struct fw_rsc_hdr *hdr = (void *)table + offset;
+
+		if (hdr->type != RSC_CARVEOUT)
+			continue;
+
+		rsc = (void *)hdr + sizeof(*hdr);
+		if (apu_drm_reserve_iova(apu->core, rsc->da, rsc->len)) {
+			dev_err(&apu->rpdev->dev,
+				"failed to reserve iova\n");
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
+
+static struct rproc *apu_get_rproc(struct rpmsg_device *rpdev)
+{
+	/*
+	 * To work, the APU RPMsg driver needs to get the rproc device.
+	 * Currently, we only use virtio, so we can use that to find the
+	 * remoteproc parent.
+	 */
+	if (!rpdev->dev.parent || !rpdev->dev.parent->bus) {
+		dev_err(&rpdev->dev, "invalid rpmsg device\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	if (strcmp(rpdev->dev.parent->bus->name, "virtio")) {
+		dev_err(&rpdev->dev, "unsupported bus\n");
+		return ERR_PTR(-EINVAL);
+	}
+
+	return vdev_to_rproc(dev_to_virtio(rpdev->dev.parent));
+}
+
+static int apu_rpmsg_probe(struct rpmsg_device *rpdev)
+{
+	struct rpmsg_apu *apu;
+	struct rproc *rproc;
+	int ret;
+
+	apu = devm_kzalloc(&rpdev->dev, sizeof(*apu), GFP_KERNEL);
+	if (!apu)
+		return -ENOMEM;
+	apu->rpdev = rpdev;
+
+	rproc = apu_get_rproc(rpdev);
+	if (IS_ERR_OR_NULL(rproc))
+		return PTR_ERR(rproc);
+
+	/* Make device dma capable by inheriting from parent's capabilities */
+	set_dma_ops(&rpdev->dev, get_dma_ops(rproc->dev.parent));
+
+	ret = dma_coerce_mask_and_coherent(&rpdev->dev,
+					   dma_get_mask(rproc->dev.parent));
+	if (ret)
+		goto err_put_device;
+
+	rpdev->dev.iommu_group = rproc->dev.parent->iommu_group;
+
+	apu->core = apu_drm_register_core(rproc, &apu_rpmsg_ops, apu);
+	if (!apu->core) {
+		ret = -ENODEV;
+		goto err_put_device;
+	}
+
+	ret = apu_init_iovad(rproc, apu);
+
+	dev_set_drvdata(&rpdev->dev, apu);
+
+	return ret;
+
+err_put_device:
+	devm_kfree(&rpdev->dev, apu);
+
+	return ret;
+}
+
+static void apu_rpmsg_remove(struct rpmsg_device *rpdev)
+{
+	struct rpmsg_apu *apu = dev_get_drvdata(&rpdev->dev);
+
+	apu_drm_unregister_core(apu);
+	devm_kfree(&rpdev->dev, apu);
+}
+
+static const struct rpmsg_device_id apu_rpmsg_match[] = {
+	{ APU_RPMSG_SERVICE_MT8183 },
+	{}
+};
+
+static struct rpmsg_driver apu_rpmsg_driver = {
+	.probe = apu_rpmsg_probe,
+	.remove = apu_rpmsg_remove,
+	.callback = apu_rpmsg_callback,
+	.id_table = apu_rpmsg_match,
+	.drv  = {
+		.name  = "apu_rpmsg",
+	},
+};
+
+static int __init apu_rpmsg_init(void)
+{
+	return register_rpmsg_driver(&apu_rpmsg_driver);
+}
+arch_initcall(apu_rpmsg_init);
+
+static void __exit apu_rpmsg_exit(void)
+{
+	unregister_rpmsg_driver(&apu_rpmsg_driver);
+}
+module_exit(apu_rpmsg_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("APU RPMSG driver");
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC PATCH 4/4] ARM64: mt8183-pumpkin: Add the APU DRM device
  2021-09-17 12:59 ` Alexandre Bailon
  (?)
@ 2021-09-17 12:59   ` Alexandre Bailon
  -1 siblings, 0 replies; 34+ messages in thread
From: Alexandre Bailon @ 2021-09-17 12:59 UTC (permalink / raw)
  To: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, bjorn.andersson, mathieu.poirier,
	sumit.semwal
  Cc: christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain, Alexandre Bailon

This adds the APU DRM device to the pumpkin board.

Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
---
 arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
index 7fbed2b7bc6f8..c540dbfe30151 100644
--- a/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8183-pumpkin.dts
@@ -98,6 +98,12 @@ ntc {
 		pulldown-ohm = <0>;
 		io-channels = <&auxadc 0>;
 	};
+
+	apu_drm@0 {
+		compatible = "mediatek,apu-drm";
+		remoteproc = <&apu0>, <&apu1>;
+		iova = <0 0x60000000 0 0x10000000>;
+	};
 };
 
 &auxadc {
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 1/4] dt-bindings: Add bidings for mtk,apu-drm
  2021-09-17 12:59   ` Alexandre Bailon
  (?)
@ 2021-09-17 19:48     ` Rob Herring
  -1 siblings, 0 replies; 34+ messages in thread
From: Rob Herring @ 2021-09-17 19:48 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: bjorn.andersson, daniel, ohad, linux-kernel, tzimmermann,
	robh+dt, linux-mediatek, dri-devel, airlied, gpain,
	mathieu.poirier, christian.koenig, devicetree, maarten.lankhorst,
	khilman, linux-media, mripard, matthias.bgg, linaro-mm-sig,
	linux-arm-kernel, sumit.semwal, linux-remoteproc

On Fri, 17 Sep 2021 14:59:42 +0200, Alexandre Bailon wrote:
> This adds the device tree bindings for the APU DRM driver.
> 
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  .../devicetree/bindings/gpu/mtk,apu-drm.yaml  | 38 +++++++++++++++++++
>  1 file changed, 38 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
> 

My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):

yamllint warnings/errors:

dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: 'maintainers' is a required property
	hint: Metaschema for devicetree binding documentation
	from schema $id: http://devicetree.org/meta-schemas/base.yaml#
./Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: $id: relative path/filename doesn't match actual path or filename
	expected: http://devicetree.org/schemas/gpu/mtk,apu-drm.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: ignoring, error in schema: 
warning: no schema found in file: ./Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
Documentation/devicetree/bindings/gpu/mtk,apu-drm.example.dts:19.15-23.11: Warning (unit_address_vs_reg): /example-0/apu@0: node has a unit name, but no reg or ranges property
Documentation/devicetree/bindings/gpu/mtk,apu-drm.example.dt.yaml:0:0: /example-0/apu@0: failed to match any schema with compatible: ['mediatek,apu-drm']

doc reference errors (make refcheckdocs):

See https://patchwork.ozlabs.org/patch/1529388

This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
  2021-09-17 12:59   ` Alexandre Bailon
  (?)
  (?)
@ 2021-09-19  3:19   ` Hillf Danton
  -1 siblings, 0 replies; 34+ messages in thread
From: Hillf Danton @ 2021-09-19  3:19 UTC (permalink / raw)
  To: Alexandre Bailon; +Cc: dri-devel, linux-kernel

On Fri, 17 Sep 2021 14:59:43 +0200 Alexandre Bailon wrote:
> +static DEFINE_IDA(req_ida);
> +static LIST_HEAD(complete_node);

I see accesses to complete_node in apu_drm_callback(), apu_job_timedout()
and ioctl_gem_dequeue(), but I cannot work out the serialization that
avoids list corruption. Can you add a comment specifying it?

> +
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +	struct apu_request *apu_req, *tmp;
> +	struct apu_dev_request *hdr = data;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +	list_for_each_entry_safe(apu_req, tmp, &apu_core->requests, node) {
> +		struct apu_job *job = apu_req->job;
> +
> +		if (job && hdr->id == job->id) {
> +			kref_get(&job->refcount);
> +			job->result = hdr->result;
> +			if (job->size_out)
> +				memcpy(job->data_out, hdr->data + job->size_in,
> +				       min(job->size_out, hdr->size_out));
> +			job->size_out = hdr->size_out;
> +			list_add(&job->node, &complete_node);
> +			list_del(&apu_req->node);
> +			ida_simple_remove(&req_ida, hdr->id);
> +			kfree(apu_req);
> +			drm_send_event(job->apu_drm->drm,
> +				       &job->event->pending_event);
> +			dma_fence_signal_locked(job->done_fence);
> +		}
> +	}
> +	spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 1/4] dt-bindings: Add bidings for mtk,apu-drm
  2021-09-17 12:59   ` Alexandre Bailon
  (?)
@ 2021-09-20 20:55     ` Rob Herring
  -1 siblings, 0 replies; 34+ messages in thread
From: Rob Herring @ 2021-09-20 20:55 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: airlied, daniel, matthias.bgg, maarten.lankhorst, mripard,
	tzimmermann, ohad, bjorn.andersson, mathieu.poirier,
	sumit.semwal, christian.koenig, dri-devel, devicetree,
	linux-arm-kernel, linux-mediatek, linux-kernel, linux-remoteproc,
	linux-media, linaro-mm-sig, khilman, gpain

On Fri, Sep 17, 2021 at 02:59:42PM +0200, Alexandre Bailon wrote:
> This adds the device tree bindings for the APU DRM driver.
> 
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  .../devicetree/bindings/gpu/mtk,apu-drm.yaml  | 38 +++++++++++++++++++
>  1 file changed, 38 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
> 
> diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
> new file mode 100644
> index 0000000000000..6f432d3ea478c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
> @@ -0,0 +1,38 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: AI Processor Unit DRM

DRM is a linux thing, not h/w.

> +
> +properties:
> +  compatible:
> +    const: mediatek,apu-drm
> +
> +  remoteproc:

So is remoteproc.

Why don't you have the remoteproc driver create the DRM device?

> +    maxItems: 2
> +    description:
> +      Handle to remoteproc devices controlling the APU
> +
> +  iova:
> +    maxItems: 1
> +    description:
> +      Address and size of virtual memory that could used by the APU

Why does this need to be in DT? If you need to reserve certain VAs, then 
this discussion[1] might be of interest.

Rob

[1] https://lore.kernel.org/all/YUIPCxnyRutMS47%2F@orome.fritz.box/


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
  2021-09-17 12:59   ` Alexandre Bailon
  (?)
  (?)
@ 2021-09-23  0:58     ` Dave Airlie
  -1 siblings, 0 replies; 34+ messages in thread
From: Dave Airlie @ 2021-09-23  0:58 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: Dave Airlie, Daniel Vetter, Rob Herring, Matthias Brugger,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, ohad,
	bjorn.andersson, Mathieu Poirier, Sumit Semwal, Koenig,
	Christian, dri-devel,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-arm-kernel, moderated list:ARM/Mediatek SoC support, LKML,
	linux-remoteproc, Linux Media Mailing List,
	moderated list:DMA BUFFER SHARING FRAMEWORK, khilman, gpain

On Sat, 18 Sept 2021 at 07:57, Alexandre Bailon <abailon@baylibre.com> wrote:
>
> Some Mediatek SoCs provide a hardware accelerator for AI / ML.
> This driver provides the infrastructure to manage memory
> shared between the host CPU and the accelerator, and to submit
> jobs to the accelerator.
> The APU itself is managed by remoteproc, so this driver
> relies on remoteproc to find the APU and get some important data
> from it. But the driver is quite generic, and it should be possible
> to manage accelerators in other ways.
> This driver doesn't manage the data transmissions itself.
> It must be registered by another driver implementing the transmissions.
>
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  drivers/gpu/drm/Kconfig            |   2 +
>  drivers/gpu/drm/Makefile           |   1 +
>  drivers/gpu/drm/apu/Kconfig        |  10 +
>  drivers/gpu/drm/apu/Makefile       |   7 +
>  drivers/gpu/drm/apu/apu_drm_drv.c  | 238 +++++++++++
>  drivers/gpu/drm/apu/apu_gem.c      | 232 +++++++++++
>  drivers/gpu/drm/apu/apu_internal.h |  89 ++++
>  drivers/gpu/drm/apu/apu_sched.c    | 634 +++++++++++++++++++++++++++++
>  include/drm/apu_drm.h              |  59 +++
>  include/uapi/drm/apu_drm.h         | 106 +++++
>  10 files changed, 1378 insertions(+)
>  create mode 100644 drivers/gpu/drm/apu/Kconfig
>  create mode 100644 drivers/gpu/drm/apu/Makefile
>  create mode 100644 drivers/gpu/drm/apu/apu_drm_drv.c
>  create mode 100644 drivers/gpu/drm/apu/apu_gem.c
>  create mode 100644 drivers/gpu/drm/apu/apu_internal.h
>  create mode 100644 drivers/gpu/drm/apu/apu_sched.c
>  create mode 100644 include/drm/apu_drm.h
>  create mode 100644 include/uapi/drm/apu_drm.h
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 8fc40317f2b77..bcdca35c9eda5 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -382,6 +382,8 @@ source "drivers/gpu/drm/xlnx/Kconfig"
>
>  source "drivers/gpu/drm/gud/Kconfig"
>
> +source "drivers/gpu/drm/apu/Kconfig"
> +
>  config DRM_HYPERV
>         tristate "DRM Support for Hyper-V synthetic video device"
>         depends on DRM && PCI && MMU && HYPERV
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index ad11121548983..f3d8432976558 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -127,4 +127,5 @@ obj-$(CONFIG_DRM_MCDE) += mcde/
>  obj-$(CONFIG_DRM_TIDSS) += tidss/
>  obj-y                  += xlnx/
>  obj-y                  += gud/
> +obj-$(CONFIG_DRM_APU) += apu/
>  obj-$(CONFIG_DRM_HYPERV) += hyperv/
> diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig
> new file mode 100644
> index 0000000000000..c8471309a0351
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +
> +config DRM_APU
> +       tristate "APU (AI Processor Unit)"
> +       select REMOTEPROC
> +       select DRM_SCHED
> +       help
> +         This provides a DRM driver that provides some facilities to
> +         communicate with an accelerated processing unit (APU).
> diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile
> new file mode 100644
> index 0000000000000..3e97846b091c9
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +apu_drm-y += apu_drm_drv.o
> +apu_drm-y += apu_sched.o
> +apu_drm-y += apu_gem.o
> +
> +obj-$(CONFIG_DRM_APU) += apu_drm.o
> diff --git a/drivers/gpu/drm/apu/apu_drm_drv.c b/drivers/gpu/drm/apu/apu_drm_drv.c
> new file mode 100644
> index 0000000000000..91d8c99e373c0
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_drm_drv.c
> @@ -0,0 +1,238 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/remoteproc.h>
> +
> +#include <drm/apu_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_probe_helper.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +static LIST_HEAD(apu_devices);
> +
> +static const struct drm_ioctl_desc ioctls[] = {
> +       DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_QUEUE, ioctl_gem_queue,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state,
> +                         DRM_RENDER_ALLOW),
> +};
> +
> +DEFINE_DRM_GEM_CMA_FOPS(apu_drm_ops);
> +
> +static struct drm_driver apu_drm_driver = {
> +       .driver_features = DRIVER_GEM | DRIVER_SYNCOBJ,
> +       .name = "drm_apu",
> +       .desc = "APU DRM driver",
> +       .date = "20210319",
> +       .major = 1,
> +       .minor = 0,
> +       .patchlevel = 0,
> +       .ioctls = ioctls,
> +       .num_ioctls = ARRAY_SIZE(ioctls),
> +       .fops = &apu_drm_ops,
> +       DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_cma_dumb_create),
> +};
> +
> +void *apu_drm_priv(struct apu_core *apu_core)
> +{
> +       return apu_core->priv;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_priv);
> +
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
> +{
> +       struct apu_drm *apu_drm = apu_core->apu_drm;
> +       struct iova *iova;
> +
> +       iova = reserve_iova(&apu_drm->iovad, PHYS_PFN(start),
> +                           PHYS_PFN(start + size));
> +       if (!iova)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_reserve_iova);
> +
> +static int apu_drm_init_first_core(struct apu_drm *apu_drm,
> +                                  struct apu_core *apu_core)
> +{
> +       struct drm_device *drm;
> +       struct device *parent;
> +       u64 mask;
> +
> +       drm = apu_drm->drm;
> +       parent = apu_core->rproc->dev.parent;
> +       drm->dev->iommu_group = parent->iommu_group;
> +       apu_drm->domain = iommu_get_domain_for_dev(parent);
> +       set_dma_ops(drm->dev, get_dma_ops(parent));
> +       mask = dma_get_mask(parent);
> +       return dma_coerce_mask_and_coherent(drm->dev, mask);
> +}
> +
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv)
> +{
> +       struct apu_drm *apu_drm;
> +       struct apu_core *apu_core;
> +       int ret;
> +
> +       list_for_each_entry(apu_drm, &apu_devices, node) {
> +               list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +                       if (apu_core->rproc == rproc) {
> +                               ret =
> +                                   apu_drm_init_first_core(apu_drm, apu_core);
> +                               apu_core->dev = &rproc->dev;
> +                               apu_core->priv = priv;
> +                               apu_core->ops = ops;
> +
> +                               ret = apu_drm_job_init(apu_core);
> +                               if (ret)
> +                                       return NULL;
> +
> +                               return apu_core;
> +                       }
> +               }
> +       }
> +
> +       return NULL;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_register_core);
> +
> +int apu_drm_unregister_core(void *priv)
> +{
> +       struct apu_drm *apu_drm;
> +       struct apu_core *apu_core;
> +
> +       list_for_each_entry(apu_drm, &apu_devices, node) {
> +               list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +                       if (apu_core->priv == priv) {
> +                               apu_sched_fini(apu_core);
> +                               apu_core->priv = NULL;
> +                               apu_core->ops = NULL;
> +                       }
> +               }
> +       }
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_unregister_core);
> +
> +#ifdef CONFIG_OF
> +static const struct of_device_id apu_platform_of_match[] = {
> +       { .compatible = "mediatek,apu-drm", },
> +       { },
> +};
> +
> +MODULE_DEVICE_TABLE(of, apu_platform_of_match);
> +#endif
> +
> +static int apu_platform_probe(struct platform_device *pdev)
> +{
> +       struct drm_device *drm;
> +       struct apu_drm *apu_drm;
> +       struct of_phandle_iterator it;
> +       int index = 0;
> +       u64 iova[2];
> +       int ret;
> +
> +       apu_drm = devm_kzalloc(&pdev->dev, sizeof(*apu_drm), GFP_KERNEL);
> +       if (!apu_drm)
> +               return -ENOMEM;
> +       INIT_LIST_HEAD(&apu_drm->apu_cores);
> +
> +       of_phandle_iterator_init(&it, pdev->dev.of_node, "remoteproc", NULL, 0);
> +       while (of_phandle_iterator_next(&it) == 0) {
> +               struct rproc *rproc = rproc_get_by_phandle(it.phandle);
> +               struct apu_core *apu_core;
> +
> +               if (!rproc)
> +                       return -EPROBE_DEFER;
> +
> +               apu_core = devm_kzalloc(&pdev->dev, sizeof(*apu_core),
> +                                       GFP_KERNEL);
> +               if (!apu_core)
> +                       return -ENOMEM;
> +
> +               apu_core->rproc = rproc;
> +               apu_core->device_id = index++;
> +               apu_core->apu_drm = apu_drm;
> +               spin_lock_init(&apu_core->ctx_lock);
> +               INIT_LIST_HEAD(&apu_core->requests);
> +               list_add(&apu_core->node, &apu_drm->apu_cores);
> +       }
> +
> +       if (of_property_read_variable_u64_array(pdev->dev.of_node, "iova",
> +                                               iova, ARRAY_SIZE(iova),
> +                                               ARRAY_SIZE(iova)) !=
> +           ARRAY_SIZE(iova))
> +               return -EINVAL;
> +
> +       init_iova_domain(&apu_drm->iovad, PAGE_SIZE, PHYS_PFN(iova[0]));
> +       apu_drm->iova_limit_pfn = PHYS_PFN(iova[0] + iova[1]) - 1;
> +
> +       drm = drm_dev_alloc(&apu_drm_driver, &pdev->dev);
> +       if (IS_ERR(drm)) {
> +               ret = PTR_ERR(drm);
> +               return ret;
> +       }
> +
> +       ret = drm_dev_register(drm, 0);
> +       if (ret) {
> +               drm_dev_put(drm);
> +               return ret;
> +       }
> +
> +       drm->dev_private = apu_drm;
> +       apu_drm->drm = drm;
> +       apu_drm->dev = &pdev->dev;
> +
> +       platform_set_drvdata(pdev, drm);
> +
> +       list_add(&apu_drm->node, &apu_devices);
> +
> +       return 0;
> +}
> +
> +static int apu_platform_remove(struct platform_device *pdev)
> +{
> +       struct drm_device *drm;
> +
> +       drm = platform_get_drvdata(pdev);
> +
> +       drm_dev_unregister(drm);
> +       drm_dev_put(drm);
> +
> +       return 0;
> +}
> +
> +static struct platform_driver apu_platform_driver = {
> +       .probe = apu_platform_probe,
> +       .remove = apu_platform_remove,
> +       .driver = {
> +                  .name = "apu_drm",
> +                  .of_match_table = of_match_ptr(apu_platform_of_match),
> +       },
> +};
> +
> +module_platform_driver(apu_platform_driver);
> diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
> new file mode 100644
> index 0000000000000..c867143dab436
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_gem.c
> @@ -0,0 +1,232 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <asm/cacheflush.h>
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/highmem.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/mm.h>
> +#include <linux/swap.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
> +                                            size_t size)
> +{
> +       struct drm_gem_cma_object *cma_obj;
> +
> +       cma_obj = drm_gem_cma_create(dev, size);
> +       if (!cma_obj)
> +               return NULL;
> +
> +       return &cma_obj->base;
> +}
> +
> +int ioctl_gem_new(struct drm_device *dev, void *data,
> +                 struct drm_file *file_priv)
> +{
> +       struct drm_apu_gem_new *args = data;
> +       struct drm_gem_cma_object *cma_obj;
> +       struct apu_gem_object *apu_obj;
> +       struct drm_gem_object *gem_obj;
> +       int ret;
> +
> +       cma_obj = drm_gem_cma_create(dev, args->size);
> +       if (IS_ERR(cma_obj))
> +               return PTR_ERR(cma_obj);
> +
> +       gem_obj = &cma_obj->base;
> +       apu_obj = to_apu_bo(gem_obj);
> +
> +       /*
> +        * Save the size of buffer expected by application instead of the
> +        * aligned one.
> +        */
> +       apu_obj->size = args->size;
> +       apu_obj->offset = 0;
> +       apu_obj->iommu_refcount = 0;
> +       mutex_init(&apu_obj->mutex);
> +
> +       ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);
> +       drm_gem_object_put(gem_obj);
> +       if (ret) {
> +               drm_gem_cma_free_object(gem_obj);
> +               return ret;
> +       }
> +       args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
> +
> +       return 0;
> +}
> +
> +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
> +{
> +       int iova_pfn;
> +       int i;
> +
> +       if (!obj->iommu_sgt)
> +               return;
> +
> +       mutex_lock(&obj->mutex);
> +       obj->iommu_refcount--;
> +       if (obj->iommu_refcount) {
> +               mutex_unlock(&obj->mutex);
> +               return;
> +       }
> +
> +       iova_pfn = PHYS_PFN(obj->iova);
> +       for (i = 0; i < obj->iommu_sgt->nents; i++) {
> +               iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
> +                           PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
> +               iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
> +       }
> +
> +       sg_free_table(obj->iommu_sgt);
> +       kfree(obj->iommu_sgt);
> +
> +       free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
> +       mutex_unlock(&obj->mutex);
> +}
> +
> +static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
> +{
> +       if (obj->funcs)
> +               return obj->funcs->get_sg_table(obj);
> +       return NULL;
> +}
> +
> +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
> +{
> +       struct apu_gem_object *apu_obj = to_apu_bo(obj);
> +       struct scatterlist *sgl;
> +       phys_addr_t phys;
> +       int total_buf_space;
> +       int iova_pfn;
> +       int iova;
> +       int ret;
> +       int i;
> +
> +       mutex_lock(&apu_obj->mutex);
> +       apu_obj->iommu_refcount++;
> +       if (apu_obj->iommu_refcount != 1) {
> +               mutex_unlock(&apu_obj->mutex);
> +               return 0;
> +       }
> +
> +       apu_obj->iommu_sgt = apu_get_sg_table(obj);
> +       if (IS_ERR(apu_obj->iommu_sgt)) {
> +               mutex_unlock(&apu_obj->mutex);
> +               return PTR_ERR(apu_obj->iommu_sgt);
> +       }
> +
> +       total_buf_space = obj->size;
> +       iova_pfn = alloc_iova_fast(&apu_drm->iovad,
> +                                  total_buf_space >> PAGE_SHIFT,
> +                                  apu_drm->iova_limit_pfn, true);
> +       apu_obj->iova = PFN_PHYS(iova_pfn);
> +
> +       if (!iova_pfn) {
> +               dev_err(apu_drm->dev, "Failed to allocate iova address\n");
> +               mutex_unlock(&apu_obj->mutex);
> +               return -ENOMEM;
> +       }
> +
> +       iova = apu_obj->iova;
> +       sgl = apu_obj->iommu_sgt->sgl;
> +       for (i = 0; i < apu_obj->iommu_sgt->nents; i++) {
> +               phys = page_to_phys(sg_page(&sgl[i]));
> +               ret =
> +                   iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys,
> +                             PAGE_ALIGN(sgl[i].length),
> +                             IOMMU_READ | IOMMU_WRITE);
> +               if (ret) {
> +                       dev_err(apu_drm->dev, "Failed to iommu map\n");
> +                       free_iova(&apu_drm->iovad, iova_pfn);
> +                       mutex_unlock(&apu_obj->mutex);
> +                       return ret;
> +               }
> +               iova += sgl[i].offset + sgl[i].length;
> +               iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length));
> +       }
> +       mutex_unlock(&apu_obj->mutex);
> +
> +       return 0;
> +}
> +
> +int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
> +                       struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_iommu_map *args = data;
> +       struct drm_gem_object **bos;
> +       void __user *bo_handles;
> +       int ret;
> +       int i;
> +
> +       u64 *das = kvmalloc_array(args->bo_handle_count,
> +                                 sizeof(u64), GFP_KERNEL);
> +       if (!das)
> +               return -ENOMEM;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    args->bo_handle_count, &bos);
> +       if (ret) {
> +               kvfree(das);
> +               return ret;
> +       }
> +
> +       for (i = 0; i < args->bo_handle_count; i++) {
> +               ret = apu_bo_iommu_map(apu_drm, bos[i]);
> +               if (ret) {
> +                       /* TODO: handle error */
> +                       break;
> +               }
> +               das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset;
> +       }
> +
> +       if (copy_to_user((void __user *)(uintptr_t)args->bo_device_addresses,
> +                        das, args->bo_handle_count * sizeof(u64))) {
> +               ret = -EFAULT;
> +               DRM_DEBUG("Failed to copy device addresses\n");
> +       }
> +
> +       kvfree(das);
> +       kvfree(bos);
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
> +                         struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_iommu_map *args = data;
> +       struct drm_gem_object **bos;
> +       void __user *bo_handles;
> +       int ret;
> +       int i;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    args->bo_handle_count, &bos);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < args->bo_handle_count; i++)
> +               apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i]));
> +
> +       kvfree(bos);
> +
> +       return 0;
> +}
> diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h
> new file mode 100644
> index 0000000000000..b789b2f3ad9c6
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_internal.h
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __APU_INTERNAL_H__
> +#define __APU_INTERNAL_H__
> +
> +#include <linux/iova.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/gpu_scheduler.h>
> +
> +struct apu_gem_object {
> +       struct drm_gem_cma_object base;
> +       struct mutex mutex;
> +       struct sg_table *iommu_sgt;
> +       int iommu_refcount;
> +       size_t size;
> +       u32 iova;
> +       u32 offset;
> +};
> +
> +struct apu_sched;
> +struct apu_core {
> +       int device_id;
> +       struct device *dev;
> +       struct rproc *rproc;
> +       struct apu_drm_ops *ops;
> +       struct apu_drm *apu_drm;
> +
> +       spinlock_t ctx_lock;
> +       struct list_head requests;
> +
> +       struct list_head node;
> +       void *priv;
> +
> +       struct apu_sched *sched;
> +       u32 flags;
> +};
> +
> +struct apu_drm {
> +       struct device *dev;
> +       struct drm_device *drm;
> +
> +       struct iommu_domain *domain;
> +       struct iova_domain iovad;
> +       int iova_limit_pfn;
> +
> +       struct list_head apu_cores;
> +       struct list_head node;
> +};
> +
> +static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj)
> +{
> +       return container_of(to_drm_gem_cma_obj(obj), struct apu_gem_object,
> +                           base);
> +}
> +
> +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
> +                                            size_t size);
> +
> +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj);
> +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj);
> +int ioctl_gem_new(struct drm_device *dev, void *data,
> +                 struct drm_file *file_priv);
> +int ioctl_gem_user_new(struct drm_device *dev, void *data,
> +                      struct drm_file *file_priv);
> +int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
> +                       struct drm_file *file_priv);
> +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
> +                         struct drm_file *file_priv);
> +int ioctl_gem_queue(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv);
> +int ioctl_gem_dequeue(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv);
> +int ioctl_apu_state(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv);
> +struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem,
> +                                    int flags);
> +
> +struct apu_job;
> +
> +int apu_drm_job_init(struct apu_core *core);
> +void apu_sched_fini(struct apu_core *core);
> +int apu_job_push(struct apu_job *job);
> +void apu_job_put(struct apu_job *job);
> +
> +#endif /* __APU_INTERNAL_H__ */
> diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c
> new file mode 100644
> index 0000000000000..cebb0155c7783
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_sched.c
> @@ -0,0 +1,634 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <drm/apu_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_syncobj.h>
> +#include <drm/gpu_scheduler.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +struct apu_queue_state {
> +       struct drm_gpu_scheduler sched;
> +
> +       u64 fence_context;
> +       u64 seqno;
> +};
> +
> +struct apu_request {
> +       struct list_head node;
> +       void *job;
> +};
> +
> +struct apu_sched {
> +       struct apu_queue_state apu_queue;
> +       spinlock_t job_lock;
> +       struct drm_sched_entity sched_entity;
> +};
> +
> +struct apu_event {
> +       struct drm_pending_event pending_event;
> +       union {
> +               struct drm_event base;
> +               struct apu_job_event job_event;
> +       };
> +};
> +
> +struct apu_job {
> +       struct drm_sched_job base;
> +
> +       struct kref refcount;
> +
> +       struct apu_core *apu_core;
> +       struct apu_drm *apu_drm;
> +
> +       /* Fence to be signaled by IRQ handler when the job is complete. */
> +       struct dma_fence *done_fence;
> +
> +       __u32 cmd;
> +
> +       /* Exclusive fences we have taken from the BOs to wait for */
> +       struct dma_fence **implicit_fences;
> +       struct drm_gem_object **bos;
> +       u32 bo_count;
> +
> +       /* Fence to be signaled by drm-sched once its done with the job */
> +       struct dma_fence *render_done_fence;
> +
> +       void *data_in;
> +       uint16_t size_in;
> +       void *data_out;
> +       uint16_t size_out;
> +       uint16_t result;
> +       uint16_t id;
> +
> +       struct list_head node;
> +       struct drm_syncobj *sync_out;
> +
> +       struct apu_event *event;
> +};
> +
> +static DEFINE_IDA(req_ida);
> +static LIST_HEAD(complete_node);
> +
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +       struct apu_request *apu_req, *tmp;
> +       struct apu_dev_request *hdr = data;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_for_each_entry_safe(apu_req, tmp, &apu_core->requests, node) {
> +               struct apu_job *job = apu_req->job;
> +
> +               if (job && hdr->id == job->id) {
> +                       kref_get(&job->refcount);
> +                       job->result = hdr->result;
> +                       if (job->size_out)
> +                               memcpy(job->data_out, hdr->data + job->size_in,
> +                                      min(job->size_out, hdr->size_out));
> +                       /* Clamp so the later copy_to_user() cannot overrun
> +                        * the buffer allocated for job->data_out.
> +                        */
> +                       job->size_out = min(job->size_out, hdr->size_out);
> +                       list_add(&job->node, &complete_node);
> +                       list_del(&apu_req->node);
> +                       ida_simple_remove(&req_ida, hdr->id);
> +                       kfree(apu_req);
> +                       drm_send_event(job->apu_drm->drm,
> +                                      &job->event->pending_event);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +       }
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +
> +       return 0;
> +}
> +
> +void apu_sched_fini(struct apu_core *core)
> +{
> +       drm_sched_fini(&core->sched->apu_queue.sched);
> +       devm_kfree(core->dev, core->sched);
> +       core->flags &= ~APU_ONLINE;
> +       core->sched = NULL;
> +}
> +
> +static void apu_job_cleanup(struct kref *ref)
> +{
> +       struct apu_job *job = container_of(ref, struct apu_job,
> +                                          refcount);
> +       unsigned int i;
> +
> +       if (job->implicit_fences) {
> +               for (i = 0; i < job->bo_count; i++)
> +                       dma_fence_put(job->implicit_fences[i]);
> +               kvfree(job->implicit_fences);
> +       }
> +       dma_fence_put(job->done_fence);
> +       dma_fence_put(job->render_done_fence);
> +
> +       if (job->bos) {
> +               for (i = 0; i < job->bo_count; i++) {
> +                       struct apu_gem_object *apu_obj;
> +
> +                       apu_obj = to_apu_bo(job->bos[i]);
> +                       apu_bo_iommu_unmap(job->apu_drm, apu_obj);
> +                       drm_gem_object_put(job->bos[i]);
> +               }
> +
> +               kvfree(job->bos);
> +       }
> +
> +       kfree(job->data_out);
> +       kfree(job->data_in);
> +       kfree(job);
> +}
> +
> +void apu_job_put(struct apu_job *job)
> +{
> +       kref_put(&job->refcount, apu_job_cleanup);
> +}
> +
> +static void apu_acquire_object_fences(struct drm_gem_object **bos,
> +                                     int bo_count,
> +                                     struct dma_fence **implicit_fences)
> +{
> +       int i;
> +
> +       for (i = 0; i < bo_count; i++)
> +               implicit_fences[i] = dma_resv_get_excl_unlocked(bos[i]->resv);
> +}
> +
> +static void apu_attach_object_fences(struct drm_gem_object **bos,
> +                                    int bo_count, struct dma_fence *fence)
> +{
> +       int i;
> +
> +       for (i = 0; i < bo_count; i++)
> +               dma_resv_add_excl_fence(bos[i]->resv, fence);
> +}
> +
> +int apu_job_push(struct apu_job *job)
> +{
> +       struct drm_sched_entity *entity = &job->apu_core->sched->sched_entity;
> +       struct ww_acquire_ctx acquire_ctx;
> +       int ret = 0;
> +
> +       ret = drm_gem_lock_reservations(job->bos, job->bo_count, &acquire_ctx);
> +       if (ret)
> +               return ret;
> +
> +       ret = drm_sched_job_init(&job->base, entity, NULL);
> +       if (ret)
> +               goto unlock;
> +
> +       job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> +
> +       kref_get(&job->refcount);       /* put by scheduler job completion */
> +
> +       apu_acquire_object_fences(job->bos, job->bo_count,
> +                                 job->implicit_fences);
> +
> +       drm_sched_entity_push_job(&job->base, entity);
> +
> +       apu_attach_object_fences(job->bos, job->bo_count,
> +                                job->render_done_fence);
> +
> +unlock:
> +       drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
> +
> +       return ret;
> +}
> +
> +static const char *apu_fence_get_driver_name(struct dma_fence *fence)
> +{
> +       return "apu";
> +}
> +
> +static const char *apu_fence_get_timeline_name(struct dma_fence *fence)
> +{
> +       return "apu-0";
> +}
> +
> +static void apu_fence_release(struct dma_fence *f)
> +{
> +       kfree(f);
> +}
> +
> +static const struct dma_fence_ops apu_fence_ops = {
> +       .get_driver_name = apu_fence_get_driver_name,
> +       .get_timeline_name = apu_fence_get_timeline_name,
> +       .release = apu_fence_release,
> +};
> +
> +static struct dma_fence *apu_fence_create(struct apu_sched *sched)
> +{
> +       struct dma_fence *fence;
> +       struct apu_queue_state *apu_queue = &sched->apu_queue;
> +
> +       fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +       if (!fence)
> +               return ERR_PTR(-ENOMEM);
> +
> +       dma_fence_init(fence, &apu_fence_ops, &sched->job_lock,
> +                      apu_queue->fence_context, apu_queue->seqno++);
> +
> +       return fence;
> +}
> +
> +static struct apu_job *to_apu_job(struct drm_sched_job *sched_job)
> +{
> +       return container_of(sched_job, struct apu_job, base);
> +}
> +
> +static struct dma_fence *apu_job_dependency(struct drm_sched_job *sched_job,
> +                                           struct drm_sched_entity *s_entity)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +       struct dma_fence *fence;
> +       unsigned int i;
> +
> +       /* Implicit fences, max. one per BO */
> +       for (i = 0; i < job->bo_count; i++) {
> +               if (job->implicit_fences[i]) {
> +                       fence = job->implicit_fences[i];
> +                       job->implicit_fences[i] = NULL;
> +                       return fence;
> +               }
> +       }
> +
> +       return NULL;
> +}
> +
> +static int apu_job_hw_submit(struct apu_job *job)
> +{
> +       int ret;
> +       struct apu_core *apu_core = job->apu_core;
> +       struct apu_dev_request *dev_req;
> +       struct apu_request *apu_req;
> +       unsigned long flags;
> +
> +       int size = sizeof(*dev_req) + sizeof(u32) * job->bo_count * 2;
> +       u32 *dev_req_da;
> +       u32 *dev_req_buffer_size;
> +       int i;
> +
> +       dev_req = kmalloc(size + job->size_in + job->size_out, GFP_KERNEL);
> +       if (!dev_req)
> +               return -ENOMEM;
> +
> +       dev_req->cmd = job->cmd;
> +       dev_req->size_in = job->size_in;
> +       dev_req->size_out = job->size_out;
> +       dev_req->count = job->bo_count;
> +       dev_req_da = (u32 *)(dev_req->data + dev_req->size_in +
> +                            dev_req->size_out);
> +       dev_req_buffer_size = (u32 *)(dev_req_da + dev_req->count);
> +       memcpy(dev_req->data, job->data_in, job->size_in);
> +
> +       apu_req = kzalloc(sizeof(*apu_req), GFP_KERNEL);
> +       if (!apu_req) {
> +               ret = -ENOMEM;
> +               goto err_free_memory;
> +       }
> +
> +       for (i = 0; i < job->bo_count; i++) {
> +               struct apu_gem_object *obj = to_apu_bo(job->bos[i]);
> +
> +               dev_req_da[i] = obj->iova + obj->offset;
> +               dev_req_buffer_size[i] = obj->size;
> +       }
> +
> +       ret = ida_simple_get(&req_ida, 0, 0xffff, GFP_KERNEL);
> +       if (ret < 0)
> +               goto err_free_memory;
> +
> +       dev_req->id = ret;
> +
> +       job->id = dev_req->id;
> +       apu_req->job = job;
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_add(&apu_req->node, &apu_core->requests);
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +       ret = apu_core->ops->send(apu_core, dev_req,
> +                                 size + dev_req->size_in + dev_req->size_out);
> +       if (ret < 0)
> +               goto err;
> +       kfree(dev_req);
> +
> +       return 0;
> +
> +err:
> +       list_del(&apu_req->node);
> +       ida_simple_remove(&req_ida, dev_req->id);
> +err_free_memory:
> +       kfree(apu_req);
> +       kfree(dev_req);
> +
> +       return ret;
> +}
> +
> +static struct dma_fence *apu_job_run(struct drm_sched_job *sched_job)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +       struct dma_fence *fence = NULL;
> +
> +       if (unlikely(job->base.s_fence->finished.error))
> +               return NULL;
> +
> +       fence = apu_fence_create(job->apu_core->sched);
> +       if (IS_ERR(fence))
> +               return NULL;
> +
> +       job->done_fence = dma_fence_get(fence);
> +
> +       apu_job_hw_submit(job);
> +
> +       return fence;
> +}
> +
> +static void apu_update_rproc_state(struct apu_core *core)
> +{
> +       if (core->rproc) {
> +               if (core->rproc->state == RPROC_CRASHED)
> +                       core->flags |= APU_CRASHED;
> +               if (core->rproc->state == RPROC_OFFLINE)
> +                       core->flags &= ~APU_ONLINE;
> +       }
> +}
> +
> +static enum drm_gpu_sched_stat apu_job_timedout(struct drm_sched_job *sched_job)
> +{
> +       struct apu_request *apu_req, *tmp;
> +       struct apu_job *job = to_apu_job(sched_job);
> +       unsigned long flags;
> +
> +       if (dma_fence_is_signaled(job->done_fence))
> +               return DRM_GPU_SCHED_STAT_NOMINAL;
> +
> +       /* The request list is also walked from the receive callback,
> +        * so take ctx_lock here as well.
> +        */
> +       spin_lock_irqsave(&job->apu_core->ctx_lock, flags);
> +       list_for_each_entry_safe(apu_req, tmp, &job->apu_core->requests, node) {
> +               /* Remove the request and notify user about timeout */
> +               if (apu_req->job == job) {
> +                       kref_get(&job->refcount);
> +                       job->apu_core->flags |= APU_TIMEDOUT;
> +                       apu_update_rproc_state(job->apu_core);
> +                       job->result = ETIMEDOUT;
> +                       list_add(&job->node, &complete_node);
> +                       list_del(&apu_req->node);
> +                       ida_simple_remove(&req_ida, job->id);
> +                       kfree(apu_req);
> +                       drm_send_event(job->apu_drm->drm,
> +                                      &job->event->pending_event);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +       }
> +       spin_unlock_irqrestore(&job->apu_core->ctx_lock, flags);
> +
> +static void apu_job_free(struct drm_sched_job *sched_job)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +
> +       drm_sched_job_cleanup(sched_job);
> +
> +       apu_job_put(job);
> +}
> +
> +static const struct drm_sched_backend_ops apu_sched_ops = {
> +       .dependency = apu_job_dependency,
> +       .run_job = apu_job_run,
> +       .timedout_job = apu_job_timedout,
> +       .free_job = apu_job_free
> +};
> +
> +int apu_drm_job_init(struct apu_core *core)
> +{
> +       int ret;
> +       struct apu_sched *apu_sched;
> +       struct drm_gpu_scheduler *sched;
> +
> +       apu_sched = devm_kzalloc(core->dev, sizeof(*apu_sched), GFP_KERNEL);
> +       if (!apu_sched)
> +               return -ENOMEM;
> +
> +       sched = &apu_sched->apu_queue.sched;
> +       apu_sched->apu_queue.fence_context = dma_fence_context_alloc(1);
> +       ret = drm_sched_init(sched, &apu_sched_ops,
> +                            1, 0, msecs_to_jiffies(500),
> +                            NULL, NULL, "apu_js");
> +       if (ret) {
> +               dev_err(core->dev, "Failed to create scheduler: %d.", ret);
> +               return ret;
> +       }
> +
> +       ret = drm_sched_entity_init(&apu_sched->sched_entity,
> +                                   DRM_SCHED_PRIORITY_NORMAL,
> +                                   &sched, 1, NULL);
> +       if (ret) {
> +               dev_err(core->dev, "Failed to init sched entity: %d.", ret);
> +               drm_sched_fini(sched);
> +               return ret;
> +       }
> +
> +       core->sched = apu_sched;
> +       core->flags = APU_ONLINE;
> +
> +       return 0;
> +}
> +
> +static struct apu_core *get_apu_core(struct apu_drm *apu_drm, int device_id)
> +{
> +       struct apu_core *apu_core;
> +
> +       list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +               if (apu_core->device_id == device_id)
> +                       return apu_core;
> +       }
> +
> +       return NULL;
> +}
> +
> +static int apu_core_is_running(struct apu_core *core)
> +{
> +       return core->ops && core->priv && core->sched;
> +}
> +
> +static int
> +apu_lookup_bos(struct drm_device *dev,
> +              struct drm_file *file_priv,
> +              struct drm_apu_gem_queue *args, struct apu_job *job)
> +{
> +       void __user *bo_handles;
> +       unsigned int i;
> +       int ret;
> +
> +       job->bo_count = args->bo_handle_count;
> +
> +       if (!job->bo_count)
> +               return 0;
> +
> +       job->implicit_fences = kvmalloc_array(job->bo_count,
> +                                             sizeof(struct dma_fence *),
> +                                             GFP_KERNEL | __GFP_ZERO);
> +       if (!job->implicit_fences)
> +               return -ENOMEM;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    job->bo_count, &job->bos);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < job->bo_count; i++) {
> +               ret = apu_bo_iommu_map(job->apu_drm, job->bos[i]);
> +               if (ret) {
> +                       /* TODO: handle error */
> +                       break;
> +               }
> +       }
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_queue(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_queue *args = data;
> +       struct apu_event *event;
> +       struct apu_core *core;
> +       struct drm_syncobj *sync_out = NULL;
> +       struct apu_job *job;
> +       int ret = 0;
> +
> +       core = get_apu_core(apu_drm, args->device);
> +       if (!core || !apu_core_is_running(core))
> +               return -ENODEV;
> +
> +       if (args->out_sync > 0) {
> +               sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +               if (!sync_out)
> +                       return -ENODEV;
> +       }
> +
> +       job = kzalloc(sizeof(*job), GFP_KERNEL);
> +       if (!job) {
> +               ret = -ENOMEM;
> +               goto fail_out_sync;
> +       }
> +
> +       kref_init(&job->refcount);
> +
> +       job->apu_drm = apu_drm;
> +       job->apu_core = core;
> +       job->cmd = args->cmd;
> +       job->size_in = args->size_in;
> +       job->size_out = args->size_out;
> +       job->sync_out = sync_out;
> +       if (job->size_in) {
> +               job->data_in = kmalloc(job->size_in, GFP_KERNEL);
> +               if (!job->data_in) {
> +                       ret = -ENOMEM;
> +                       goto fail_job;
> +               }
> +
> +               if (copy_from_user(job->data_in,
> +                                  (void __user *)(uintptr_t)args->data,
> +                                  job->size_in)) {
> +                       ret = -EFAULT;
> +                       goto fail_job;
> +               }
> +       }
> +
> +       if (job->size_out) {
> +               job->data_out = kmalloc(job->size_out, GFP_KERNEL);
> +               if (!job->data_out) {
> +                       ret = -ENOMEM;
> +                       goto fail_job;
> +               }
> +       }
> +
> +       ret = apu_lookup_bos(dev, file_priv, args, job);
> +       if (ret)
> +               goto fail_job;
> +
> +       event = kzalloc(sizeof(*event), GFP_KERNEL);
> +       if (!event) {
> +               ret = -ENOMEM;
> +               goto fail_job;
> +       }
> +
> +       event->base.length = sizeof(struct apu_job_event);
> +       event->base.type = APU_JOB_COMPLETED;
> +       event->job_event.out_sync = args->out_sync;
> +       job->event = event;
> +       ret = drm_event_reserve_init(dev, file_priv, &job->event->pending_event,
> +                                    &job->event->base);
> +       if (ret)
> +               goto fail_job;
> +
> +       ret = apu_job_push(job);
> +       if (ret) {
> +               drm_event_cancel_free(dev, &job->event->pending_event);
> +               goto fail_job;
> +       }
> +
> +       /* Update the return sync object for the job */
> +       if (sync_out)
> +               drm_syncobj_replace_fence(sync_out, job->render_done_fence);
> +
> +fail_job:
> +       apu_job_put(job);
> +fail_out_sync:
> +       if (sync_out)
> +               drm_syncobj_put(sync_out);
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_dequeue(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv)
> +{
> +       struct drm_apu_gem_dequeue *args = data;
> +       struct drm_syncobj *sync_out = NULL;
> +       struct apu_job *job;
> +       int ret = 0;
> +
> +       if (args->out_sync > 0) {
> +               sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +               if (!sync_out)
> +                       return -ENODEV;
> +       }
> +
> +       list_for_each_entry(job, &complete_node, node) {
> +               if (job->sync_out == sync_out) {
> +                       if (job->data_out) {
> +                               if (copy_to_user((void __user *)(uintptr_t)
> +                                                args->data, job->data_out,
> +                                                job->size_out))
> +                                       ret = -EFAULT;
> +                               args->size = job->size_out;
> +                       }
> +                       args->result = job->result;
> +                       list_del(&job->node);
> +                       apu_job_put(job);
> +                       drm_syncobj_put(sync_out);
> +
> +                       return ret;
> +               }
> +       }
> +
> +       if (sync_out)
> +               drm_syncobj_put(sync_out);
> +
> +       return 0;
> +}
> +
> +int ioctl_apu_state(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_state *args = data;
> +       struct apu_core *core;
> +
> +       args->flags = 0;
> +
> +       core = get_apu_core(apu_drm, args->device);
> +       if (!core)
> +               return -ENODEV;
> +       args->flags |= core->flags;
> +
> +       /* Reset APU flags */
> +       core->flags &= ~(APU_TIMEDOUT | APU_CRASHED);
> +
> +       return 0;
> +}
> diff --git a/include/drm/apu_drm.h b/include/drm/apu_drm.h
> new file mode 100644
> index 0000000000000..f044ed0427fdd
> --- /dev/null
> +++ b/include/drm/apu_drm.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __APU_DRM_H__
> +#define __APU_DRM_H__
> +
> +#include <linux/iova.h>
> +#include <linux/remoteproc.h>
> +
> +struct apu_core;
> +struct apu_drm;
> +
> +struct apu_drm_ops {
> +       int (*send)(struct apu_core *apu_core, void *data, int len);
> +       int (*callback)(struct apu_core *apu_core, void *data, int len);
> +};
> +
> +#ifdef CONFIG_DRM_APU
> +
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv);
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size);
> +int apu_drm_unregister_core(void *priv);
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len);
> +void *apu_drm_priv(struct apu_core *apu_core);
> +
> +#else /* CONFIG_DRM_APU */
> +
> +static inline
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv)
> +{
> +       return NULL;
> +}
> +
> +static inline
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
> +{
> +       return -ENOMEM;
> +}
> +
> +static inline
> +int apu_drm_unregister_core(void *priv)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline void *apu_drm_priv(struct apu_core *apu_core)
> +{
> +       return NULL;
> +}
> +#endif /* CONFIG_DRM_APU */
> +
> +#endif /* __APU_DRM_H__ */
> diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h
> new file mode 100644
> index 0000000000000..c52e187bb0599
> --- /dev/null
> +++ b/include/uapi/drm/apu_drm.h
> @@ -0,0 +1,106 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +
> +#ifndef __UAPI_APU_DRM_H__
> +#define __UAPI_APU_DRM_H__
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +#define APU_JOB_COMPLETED 0x80000000
> +
> +/*
> + * Please note that modifications to all structs defined here are
> + * subject to backwards-compatibility constraints.
> + */
> +
> +/*
> + * Firmware request, must match the definition used by the firmware.
> + * @id: Request id, used on reply to find the pending request
> + * @cmd: The command id to execute in the firmware
> + * @result: The result of the command executed by the firmware
> + * @size_in: The size of the input data attached to this request
> + * @size_out: The size of the output data attached to this request
> + * @count: The number of shared buffers
> + * @data: Contains the input data if size_in is greater than zero,
> + *        followed by the output data and the addresses and sizes of
> + *        the shared buffers if count is greater than zero. Both the
> + *        data and the shared buffers may be read and written by the APU.
> + */
> +struct apu_dev_request {
> +       __u16 id;
> +       __u16 cmd;
> +       __u16 result;
> +       __u16 size_in;
> +       __u16 size_out;
> +       __u16 count;
> +       __u8 data[];
> +} __attribute__((packed));
> +
> +struct drm_apu_gem_new {
> +       __u32 size;                     /* in */
> +       __u32 flags;                    /* in */
> +       __u32 handle;                   /* out */
> +       __u64 offset;                   /* out */
> +};
> +

Please refer to
https://www.kernel.org/doc/Documentation/ioctl/botching-up-ioctls.rst

here and below in many places.

There's a lot of missing padding/alignment here.

I'm trying to find the time to review this stack in full. Any writeups
on how this is used from userspace would be useful (not just the code
repo, but some notes on how to actually get at it). It reads as kinda
generic (calling it apu), but then has some specifics around device binding.

Dave.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
@ 2021-09-23  0:58     ` Dave Airlie
  0 siblings, 0 replies; 34+ messages in thread
From: Dave Airlie @ 2021-09-23  0:58 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: Dave Airlie, Daniel Vetter, Rob Herring, Matthias Brugger,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, ohad,
	bjorn.andersson, Mathieu Poirier, Sumit Semwal, Koenig,
	Christian, dri-devel,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-arm-kernel, moderated list:ARM/Mediatek SoC support, LKML,
	linux-remoteproc, Linux Media Mailing List,
	moderated list:DMA BUFFER SHARING FRAMEWORK, khilman, gpain

On Sat, 18 Sept 2021 at 07:57, Alexandre Bailon <abailon@baylibre.com> wrote:
>
> Some Mediatek SoCs provide a hardware accelerator for AI / ML.
> This driver provides the infrastructure to manage memory
> shared between the host CPU and the accelerator, and to submit
> jobs to the accelerator.
> The APU itself is managed by remoteproc, so this driver
> relies on remoteproc to find the APU and get some important data
> from it. But the driver is quite generic, and it should be possible
> to manage accelerators in other ways.
> This driver doesn't handle the data transmissions itself;
> it must be registered by another driver implementing the transmissions.
>
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  drivers/gpu/drm/Kconfig            |   2 +
>  drivers/gpu/drm/Makefile           |   1 +
>  drivers/gpu/drm/apu/Kconfig        |  10 +
>  drivers/gpu/drm/apu/Makefile       |   7 +
>  drivers/gpu/drm/apu/apu_drm_drv.c  | 238 +++++++++++
>  drivers/gpu/drm/apu/apu_gem.c      | 232 +++++++++++
>  drivers/gpu/drm/apu/apu_internal.h |  89 ++++
>  drivers/gpu/drm/apu/apu_sched.c    | 634 +++++++++++++++++++++++++++++
>  include/drm/apu_drm.h              |  59 +++
>  include/uapi/drm/apu_drm.h         | 106 +++++
>  10 files changed, 1378 insertions(+)
>  create mode 100644 drivers/gpu/drm/apu/Kconfig
>  create mode 100644 drivers/gpu/drm/apu/Makefile
>  create mode 100644 drivers/gpu/drm/apu/apu_drm_drv.c
>  create mode 100644 drivers/gpu/drm/apu/apu_gem.c
>  create mode 100644 drivers/gpu/drm/apu/apu_internal.h
>  create mode 100644 drivers/gpu/drm/apu/apu_sched.c
>  create mode 100644 include/drm/apu_drm.h
>  create mode 100644 include/uapi/drm/apu_drm.h
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 8fc40317f2b77..bcdca35c9eda5 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -382,6 +382,8 @@ source "drivers/gpu/drm/xlnx/Kconfig"
>
>  source "drivers/gpu/drm/gud/Kconfig"
>
> +source "drivers/gpu/drm/apu/Kconfig"
> +
>  config DRM_HYPERV
>         tristate "DRM Support for Hyper-V synthetic video device"
>         depends on DRM && PCI && MMU && HYPERV
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index ad11121548983..f3d8432976558 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -127,4 +127,5 @@ obj-$(CONFIG_DRM_MCDE) += mcde/
>  obj-$(CONFIG_DRM_TIDSS) += tidss/
>  obj-y                  += xlnx/
>  obj-y                  += gud/
> +obj-$(CONFIG_DRM_APU) += apu/
>  obj-$(CONFIG_DRM_HYPERV) += hyperv/
> diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig
> new file mode 100644
> index 0000000000000..c8471309a0351
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +
> +config DRM_APU
> +       tristate "APU (AI Processor Unit)"
> +       select REMOTEPROC
> +       select DRM_SCHED
> +       help
> +         This provides a DRM driver with facilities to
> +         communicate with an AI Processor Unit (APU).
> diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile
> new file mode 100644
> index 0000000000000..3e97846b091c9
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +apu_drm-y += apu_drm_drv.o
> +apu_drm-y += apu_sched.o
> +apu_drm-y += apu_gem.o
> +
> +obj-$(CONFIG_DRM_APU) += apu_drm.o
> diff --git a/drivers/gpu/drm/apu/apu_drm_drv.c b/drivers/gpu/drm/apu/apu_drm_drv.c
> new file mode 100644
> index 0000000000000..91d8c99e373c0
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_drm_drv.c
> @@ -0,0 +1,238 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/remoteproc.h>
> +
> +#include <drm/apu_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_probe_helper.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +static LIST_HEAD(apu_devices);
> +
> +static const struct drm_ioctl_desc ioctls[] = {
> +       DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_QUEUE, ioctl_gem_queue,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state,
> +                         DRM_RENDER_ALLOW),
> +};
> +
> +DEFINE_DRM_GEM_CMA_FOPS(apu_drm_ops);
> +
> +static struct drm_driver apu_drm_driver = {
> +       .driver_features = DRIVER_GEM | DRIVER_SYNCOBJ,
> +       .name = "drm_apu",
> +       .desc = "APU DRM driver",
> +       .date = "20210319",
> +       .major = 1,
> +       .minor = 0,
> +       .patchlevel = 0,
> +       .ioctls = ioctls,
> +       .num_ioctls = ARRAY_SIZE(ioctls),
> +       .fops = &apu_drm_ops,
> +       DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_cma_dumb_create),
> +};
> +
> +void *apu_drm_priv(struct apu_core *apu_core)
> +{
> +       return apu_core->priv;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_priv);
> +
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
> +{
> +       struct apu_drm *apu_drm = apu_core->apu_drm;
> +       struct iova *iova;
> +
> +       iova = reserve_iova(&apu_drm->iovad, PHYS_PFN(start),
> +                           PHYS_PFN(start + size));
> +       if (!iova)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_reserve_iova);
> +
> +static int apu_drm_init_first_core(struct apu_drm *apu_drm,
> +                                  struct apu_core *apu_core)
> +{
> +       struct drm_device *drm;
> +       struct device *parent;
> +       u64 mask;
> +
> +       drm = apu_drm->drm;
> +       parent = apu_core->rproc->dev.parent;
> +       drm->dev->iommu_group = parent->iommu_group;
> +       apu_drm->domain = iommu_get_domain_for_dev(parent);
> +       set_dma_ops(drm->dev, get_dma_ops(parent));
> +       mask = dma_get_mask(parent);
> +       return dma_coerce_mask_and_coherent(drm->dev, mask);
> +}
> +
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv)
> +{
> +       struct apu_drm *apu_drm;
> +       struct apu_core *apu_core;
> +       int ret;
> +
> +       list_for_each_entry(apu_drm, &apu_devices, node) {
> +               list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +                       if (apu_core->rproc == rproc) {
> +                               ret =
> +                                   apu_drm_init_first_core(apu_drm, apu_core);
> +                               if (ret)
> +                                       return NULL;
> +
> +                               apu_core->dev = &rproc->dev;
> +                               apu_core->priv = priv;
> +                               apu_core->ops = ops;
> +
> +                               ret = apu_drm_job_init(apu_core);
> +                               if (ret)
> +                                       return NULL;
> +
> +                               return apu_core;
> +                       }
> +               }
> +       }
> +
> +       return NULL;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_register_core);
> +
> +int apu_drm_unregister_core(void *priv)
> +{
> +       struct apu_drm *apu_drm;
> +       struct apu_core *apu_core;
> +
> +       list_for_each_entry(apu_drm, &apu_devices, node) {
> +               list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +                       if (apu_core->priv == priv) {
> +                               apu_sched_fini(apu_core);
> +                               apu_core->priv = NULL;
> +                               apu_core->ops = NULL;
> +                       }
> +               }
> +       }
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_unregister_core);
> +
> +#ifdef CONFIG_OF
> +static const struct of_device_id apu_platform_of_match[] = {
> +       { .compatible = "mediatek,apu-drm", },
> +       { },
> +};
> +
> +MODULE_DEVICE_TABLE(of, apu_platform_of_match);
> +#endif
> +
> +static int apu_platform_probe(struct platform_device *pdev)
> +{
> +       struct drm_device *drm;
> +       struct apu_drm *apu_drm;
> +       struct of_phandle_iterator it;
> +       int index = 0;
> +       u64 iova[2];
> +       int ret;
> +
> +       apu_drm = devm_kzalloc(&pdev->dev, sizeof(*apu_drm), GFP_KERNEL);
> +       if (!apu_drm)
> +               return -ENOMEM;
> +       INIT_LIST_HEAD(&apu_drm->apu_cores);
> +
> +       of_phandle_iterator_init(&it, pdev->dev.of_node, "remoteproc", NULL, 0);
> +       while (of_phandle_iterator_next(&it) == 0) {
> +               struct rproc *rproc = rproc_get_by_phandle(it.phandle);
> +               struct apu_core *apu_core;
> +
> +               if (!rproc)
> +                       return -EPROBE_DEFER;
> +
> +               apu_core = devm_kzalloc(&pdev->dev, sizeof(*apu_core),
> +                                       GFP_KERNEL);
> +               if (!apu_core)
> +                       return -ENOMEM;
> +
> +               apu_core->rproc = rproc;
> +               apu_core->device_id = index++;
> +               apu_core->apu_drm = apu_drm;
> +               spin_lock_init(&apu_core->ctx_lock);
> +               INIT_LIST_HEAD(&apu_core->requests);
> +               list_add(&apu_core->node, &apu_drm->apu_cores);
> +       }
> +
> +       if (of_property_read_variable_u64_array(pdev->dev.of_node, "iova",
> +                                               iova, ARRAY_SIZE(iova),
> +                                               ARRAY_SIZE(iova)) !=
> +           ARRAY_SIZE(iova))
> +               return -EINVAL;
> +
> +       init_iova_domain(&apu_drm->iovad, PAGE_SIZE, PHYS_PFN(iova[0]));
> +       apu_drm->iova_limit_pfn = PHYS_PFN(iova[0] + iova[1]) - 1;
> +
> +       drm = drm_dev_alloc(&apu_drm_driver, &pdev->dev);
> +       if (IS_ERR(drm))
> +               return PTR_ERR(drm);
> +
> +       drm->dev_private = apu_drm;
> +       apu_drm->drm = drm;
> +       apu_drm->dev = &pdev->dev;
> +
> +       ret = drm_dev_register(drm, 0);
> +       if (ret) {
> +               drm_dev_put(drm);
> +               return ret;
> +       }
> +
> +       platform_set_drvdata(pdev, drm);
> +
> +       list_add(&apu_drm->node, &apu_devices);
> +
> +       return 0;
> +}
> +
> +static int apu_platform_remove(struct platform_device *pdev)
> +{
> +       struct drm_device *drm;
> +
> +       drm = platform_get_drvdata(pdev);
> +
> +       drm_dev_unregister(drm);
> +       drm_dev_put(drm);
> +
> +       return 0;
> +}
> +
> +static struct platform_driver apu_platform_driver = {
> +       .probe = apu_platform_probe,
> +       .remove = apu_platform_remove,
> +       .driver = {
> +                  .name = "apu_drm",
> +                  .of_match_table = of_match_ptr(apu_platform_of_match),
> +       },
> +};
> +
> +module_platform_driver(apu_platform_driver);
> diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
> new file mode 100644
> index 0000000000000..c867143dab436
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_gem.c
> @@ -0,0 +1,232 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <asm/cacheflush.h>
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/highmem.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/mm.h>
> +#include <linux/swap.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
> +                                            size_t size)
> +{
> +       struct drm_gem_cma_object *cma_obj;
> +
> +       cma_obj = drm_gem_cma_create(dev, size);
> +       if (IS_ERR(cma_obj))
> +               return NULL;
> +
> +       return &cma_obj->base;
> +}
> +
> +int ioctl_gem_new(struct drm_device *dev, void *data,
> +                 struct drm_file *file_priv)
> +{
> +       struct drm_apu_gem_new *args = data;
> +       struct drm_gem_cma_object *cma_obj;
> +       struct apu_gem_object *apu_obj;
> +       struct drm_gem_object *gem_obj;
> +       int ret;
> +
> +       cma_obj = drm_gem_cma_create(dev, args->size);
> +       if (IS_ERR(cma_obj))
> +               return PTR_ERR(cma_obj);
> +
> +       gem_obj = &cma_obj->base;
> +       apu_obj = to_apu_bo(gem_obj);
> +
> +       /*
> +        * Save the size of buffer expected by application instead of the
> +        * aligned one.
> +        */
> +       apu_obj->size = args->size;
> +       apu_obj->offset = 0;
> +       apu_obj->iommu_refcount = 0;
> +       mutex_init(&apu_obj->mutex);
> +
> +       ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);
> +       drm_gem_object_put(gem_obj);
> +       if (ret) {
> +               drm_gem_cma_free_object(gem_obj);
> +               return ret;
> +       }
> +       args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
> +
> +       return 0;
> +}
> +
> +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
> +{
> +       int iova_pfn;
> +       int i;
> +
> +       if (!obj->iommu_sgt)
> +               return;
> +
> +       mutex_lock(&obj->mutex);
> +       obj->iommu_refcount--;
> +       if (obj->iommu_refcount) {
> +               mutex_unlock(&obj->mutex);
> +               return;
> +       }
> +
> +       iova_pfn = PHYS_PFN(obj->iova);
> +       for (i = 0; i < obj->iommu_sgt->nents; i++) {
> +               iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
> +                           PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
> +               iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
> +       }
> +
> +       sg_free_table(obj->iommu_sgt);
> +       kfree(obj->iommu_sgt);
> +
> +       free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
> +       mutex_unlock(&obj->mutex);
> +}
> +
> +static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
> +{
> +       if (obj->funcs)
> +               return obj->funcs->get_sg_table(obj);
> +       return NULL;
> +}
> +
> +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
> +{
> +       struct apu_gem_object *apu_obj = to_apu_bo(obj);
> +       struct scatterlist *sgl;
> +       phys_addr_t phys;
> +       int total_buf_space;
> +       int iova_pfn;
> +       int iova;
> +       int ret;
> +       int i;
> +
> +       mutex_lock(&apu_obj->mutex);
> +       apu_obj->iommu_refcount++;
> +       if (apu_obj->iommu_refcount != 1) {
> +               mutex_unlock(&apu_obj->mutex);
> +               return 0;
> +       }
> +
> +       apu_obj->iommu_sgt = apu_get_sg_table(obj);
> +       if (IS_ERR_OR_NULL(apu_obj->iommu_sgt)) {
> +               mutex_unlock(&apu_obj->mutex);
> +               return apu_obj->iommu_sgt ?
> +                       PTR_ERR(apu_obj->iommu_sgt) : -EINVAL;
> +       }
> +
> +       total_buf_space = obj->size;
> +       iova_pfn = alloc_iova_fast(&apu_drm->iovad,
> +                                  total_buf_space >> PAGE_SHIFT,
> +                                  apu_drm->iova_limit_pfn, true);
> +       if (!iova_pfn) {
> +               dev_err(apu_drm->dev, "Failed to allocate iova address\n");
> +               mutex_unlock(&apu_obj->mutex);
> +               return -ENOMEM;
> +       }
> +
> +       apu_obj->iova = PFN_PHYS(iova_pfn);
> +
> +       iova = apu_obj->iova;
> +       sgl = apu_obj->iommu_sgt->sgl;
> +       for (i = 0; i < apu_obj->iommu_sgt->nents; i++) {
> +               phys = page_to_phys(sg_page(&sgl[i]));
> +               ret =
> +                   iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys,
> +                             PAGE_ALIGN(sgl[i].length),
> +                             IOMMU_READ | IOMMU_WRITE);
> +               if (ret) {
> +                       dev_err(apu_drm->dev, "Failed to iommu map\n");
> +                       free_iova(&apu_drm->iovad, iova_pfn);
> +                       mutex_unlock(&apu_obj->mutex);
> +                       return ret;
> +               }
> +               iova += sgl[i].offset + sgl[i].length;
> +               iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length));
> +       }
> +       mutex_unlock(&apu_obj->mutex);
> +
> +       return 0;
> +}
> +
> +int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
> +                       struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_iommu_map *args = data;
> +       struct drm_gem_object **bos;
> +       void __user *bo_handles;
> +       int ret;
> +       int i;
> +
> +       u64 *das = kvmalloc_array(args->bo_handle_count,
> +                                 sizeof(u64), GFP_KERNEL);
> +       if (!das)
> +               return -ENOMEM;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    args->bo_handle_count, &bos);
> +       if (ret) {
> +               kvfree(das);
> +               return ret;
> +       }
> +
> +       for (i = 0; i < args->bo_handle_count; i++) {
> +               ret = apu_bo_iommu_map(apu_drm, bos[i]);
> +               if (ret) {
> +                       /* TODO: handle error */
> +                       break;
> +               }
> +               das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset;
> +       }
> +
> +       if (copy_to_user((void __user *)(uintptr_t)args->bo_device_addresses,
> +                        das, args->bo_handle_count * sizeof(u64))) {
> +               ret = -EFAULT;
> +               DRM_DEBUG("Failed to copy device addresses\n");
> +       }
> +
> +       kvfree(das);
> +       kvfree(bos);
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
> +                         struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_iommu_map *args = data;
> +       struct drm_gem_object **bos;
> +       void __user *bo_handles;
> +       int ret;
> +       int i;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    args->bo_handle_count, &bos);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < args->bo_handle_count; i++)
> +               apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i]));
> +
> +       kvfree(bos);
> +
> +       return 0;
> +}
> diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h
> new file mode 100644
> index 0000000000000..b789b2f3ad9c6
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_internal.h
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __APU_INTERNAL_H__
> +#define __APU_INTERNAL_H__
> +
> +#include <linux/iova.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/gpu_scheduler.h>
> +
> +struct apu_gem_object {
> +       struct drm_gem_cma_object base;
> +       struct mutex mutex;
> +       struct sg_table *iommu_sgt;
> +       int iommu_refcount;
> +       size_t size;
> +       u32 iova;
> +       u32 offset;
> +};
> +
> +struct apu_sched;
> +struct apu_core {
> +       int device_id;
> +       struct device *dev;
> +       struct rproc *rproc;
> +       struct apu_drm_ops *ops;
> +       struct apu_drm *apu_drm;
> +
> +       spinlock_t ctx_lock;
> +       struct list_head requests;
> +
> +       struct list_head node;
> +       void *priv;
> +
> +       struct apu_sched *sched;
> +       u32 flags;
> +};
> +
> +struct apu_drm {
> +       struct device *dev;
> +       struct drm_device *drm;
> +
> +       struct iommu_domain *domain;
> +       struct iova_domain iovad;
> +       int iova_limit_pfn;
> +
> +       struct list_head apu_cores;
> +       struct list_head node;
> +};
> +
> +static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj)
> +{
> +       return container_of(to_drm_gem_cma_obj(obj), struct apu_gem_object,
> +                           base);
> +}
> +
> +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
> +                                            size_t size);
> +
> +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj);
> +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj);
> +int ioctl_gem_new(struct drm_device *dev, void *data,
> +                 struct drm_file *file_priv);
> +int ioctl_gem_user_new(struct drm_device *dev, void *data,
> +                      struct drm_file *file_priv);
> +int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
> +                       struct drm_file *file_priv);
> +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
> +                         struct drm_file *file_priv);
> +int ioctl_gem_queue(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv);
> +int ioctl_gem_dequeue(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv);
> +int ioctl_apu_state(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv);
> +struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem,
> +                                    int flags);
> +
> +struct apu_job;
> +
> +int apu_drm_job_init(struct apu_core *core);
> +void apu_sched_fini(struct apu_core *core);
> +int apu_job_push(struct apu_job *job);
> +void apu_job_put(struct apu_job *job);
> +
> +#endif /* __APU_INTERNAL_H__ */
> diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c
> new file mode 100644
> index 0000000000000..cebb0155c7783
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_sched.c
> @@ -0,0 +1,634 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <drm/apu_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_syncobj.h>
> +#include <drm/gpu_scheduler.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +struct apu_queue_state {
> +       struct drm_gpu_scheduler sched;
> +
> +       u64 fence_context;
> +       u64 seqno;
> +};
> +
> +struct apu_request {
> +       struct list_head node;
> +       void *job;
> +};
> +
> +struct apu_sched {
> +       struct apu_queue_state apu_queue;
> +       spinlock_t job_lock;
> +       struct drm_sched_entity sched_entity;
> +};
> +
> +struct apu_event {
> +       struct drm_pending_event pending_event;
> +       union {
> +               struct drm_event base;
> +               struct apu_job_event job_event;
> +       };
> +};
> +
> +struct apu_job {
> +       struct drm_sched_job base;
> +
> +       struct kref refcount;
> +
> +       struct apu_core *apu_core;
> +       struct apu_drm *apu_drm;
> +
> +       /* Fence to be signaled by IRQ handler when the job is complete. */
> +       struct dma_fence *done_fence;
> +
> +       __u32 cmd;
> +
> +       /* Exclusive fences we have taken from the BOs to wait for */
> +       struct dma_fence **implicit_fences;
> +       struct drm_gem_object **bos;
> +       u32 bo_count;
> +
> +       /* Fence to be signaled by drm-sched once its done with the job */
> +       struct dma_fence *render_done_fence;
> +
> +       void *data_in;
> +       uint16_t size_in;
> +       void *data_out;
> +       uint16_t size_out;
> +       uint16_t result;
> +       uint16_t id;
> +
> +       struct list_head node;
> +       struct drm_syncobj *sync_out;
> +
> +       struct apu_event *event;
> +};
> +
> +static DEFINE_IDA(req_ida);
> +static LIST_HEAD(complete_node);
> +
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +       struct apu_request *apu_req, *tmp;
> +       struct apu_dev_request *hdr = data;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_for_each_entry_safe(apu_req, tmp, &apu_core->requests, node) {
> +               struct apu_job *job = apu_req->job;
> +
> +               if (job && hdr->id == job->id) {
> +                       kref_get(&job->refcount);
> +                       job->result = hdr->result;
> +                       if (job->size_out)
> +                               memcpy(job->data_out, hdr->data + job->size_in,
> +                                      min(job->size_out, hdr->size_out));
> +                       job->size_out = hdr->size_out;
> +                       list_add(&job->node, &complete_node);
> +                       list_del(&apu_req->node);
> +                       ida_simple_remove(&req_ida, hdr->id);
> +                       kfree(apu_req);
> +                       drm_send_event(job->apu_drm->drm,
> +                                      &job->event->pending_event);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +       }
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +
> +       return 0;
> +}
> +
> +void apu_sched_fini(struct apu_core *core)
> +{
> +       drm_sched_fini(&core->sched->apu_queue.sched);
> +       devm_kfree(core->dev, core->sched);
> +       core->flags &= ~APU_ONLINE;
> +       core->sched = NULL;
> +}
> +
> +static void apu_job_cleanup(struct kref *ref)
> +{
> +       struct apu_job *job = container_of(ref, struct apu_job,
> +                                          refcount);
> +       unsigned int i;
> +
> +       if (job->implicit_fences) {
> +               for (i = 0; i < job->bo_count; i++)
> +                       dma_fence_put(job->implicit_fences[i]);
> +               kvfree(job->implicit_fences);
> +       }
> +       dma_fence_put(job->done_fence);
> +       dma_fence_put(job->render_done_fence);
> +
> +       if (job->bos) {
> +               for (i = 0; i < job->bo_count; i++) {
> +                       struct apu_gem_object *apu_obj;
> +
> +                       apu_obj = to_apu_bo(job->bos[i]);
> +                       apu_bo_iommu_unmap(job->apu_drm, apu_obj);
> +                       drm_gem_object_put(job->bos[i]);
> +               }
> +
> +               kvfree(job->bos);
> +       }
> +
> +       kfree(job->data_out);
> +       kfree(job->data_in);
> +       kfree(job);
> +}
> +
> +void apu_job_put(struct apu_job *job)
> +{
> +       kref_put(&job->refcount, apu_job_cleanup);
> +}
> +
> +static void apu_acquire_object_fences(struct drm_gem_object **bos,
> +                                     int bo_count,
> +                                     struct dma_fence **implicit_fences)
> +{
> +       int i;
> +
> +       for (i = 0; i < bo_count; i++)
> +               implicit_fences[i] = dma_resv_get_excl_unlocked(bos[i]->resv);
> +}
> +
> +static void apu_attach_object_fences(struct drm_gem_object **bos,
> +                                    int bo_count, struct dma_fence *fence)
> +{
> +       int i;
> +
> +       for (i = 0; i < bo_count; i++)
> +               dma_resv_add_excl_fence(bos[i]->resv, fence);
> +}
> +
> +int apu_job_push(struct apu_job *job)
> +{
> +       struct drm_sched_entity *entity = &job->apu_core->sched->sched_entity;
> +       struct ww_acquire_ctx acquire_ctx;
> +       int ret = 0;
> +
> +       ret = drm_gem_lock_reservations(job->bos, job->bo_count, &acquire_ctx);
> +       if (ret)
> +               return ret;
> +
> +       ret = drm_sched_job_init(&job->base, entity, NULL);
> +       if (ret)
> +               goto unlock;
> +
> +       job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> +
> +       kref_get(&job->refcount);       /* put by scheduler job completion */
> +
> +       apu_acquire_object_fences(job->bos, job->bo_count,
> +                                 job->implicit_fences);
> +
> +       drm_sched_entity_push_job(&job->base, entity);
> +
> +       apu_attach_object_fences(job->bos, job->bo_count,
> +                                job->render_done_fence);
> +
> +unlock:
> +       drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
> +
> +       return ret;
> +}
> +
> +static const char *apu_fence_get_driver_name(struct dma_fence *fence)
> +{
> +       return "apu";
> +}
> +
> +static const char *apu_fence_get_timeline_name(struct dma_fence *fence)
> +{
> +       return "apu-0";
> +}
> +
> +static void apu_fence_release(struct dma_fence *f)
> +{
> +       kfree(f);
> +}
> +
> +static const struct dma_fence_ops apu_fence_ops = {
> +       .get_driver_name = apu_fence_get_driver_name,
> +       .get_timeline_name = apu_fence_get_timeline_name,
> +       .release = apu_fence_release,
> +};
> +
> +static struct dma_fence *apu_fence_create(struct apu_sched *sched)
> +{
> +       struct dma_fence *fence;
> +       struct apu_queue_state *apu_queue = &sched->apu_queue;
> +
> +       fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +       if (!fence)
> +               return ERR_PTR(-ENOMEM);
> +
> +       dma_fence_init(fence, &apu_fence_ops, &sched->job_lock,
> +                      apu_queue->fence_context, apu_queue->seqno++);
> +
> +       return fence;
> +}
> +
> +static struct apu_job *to_apu_job(struct drm_sched_job *sched_job)
> +{
> +       return container_of(sched_job, struct apu_job, base);
> +}
> +
> +static struct dma_fence *apu_job_dependency(struct drm_sched_job *sched_job,
> +                                           struct drm_sched_entity *s_entity)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +       struct dma_fence *fence;
> +       unsigned int i;
> +
> +       /* Implicit fences, max. one per BO */
> +       for (i = 0; i < job->bo_count; i++) {
> +               if (job->implicit_fences[i]) {
> +                       fence = job->implicit_fences[i];
> +                       job->implicit_fences[i] = NULL;
> +                       return fence;
> +               }
> +       }
> +
> +       return NULL;
> +}
> +
> +static int apu_job_hw_submit(struct apu_job *job)
> +{
> +       int ret;
> +       struct apu_core *apu_core = job->apu_core;
> +       struct apu_dev_request *dev_req;
> +       struct apu_request *apu_req;
> +       unsigned long flags;
> +
> +       int size = sizeof(*dev_req) + sizeof(u32) * job->bo_count * 2;
> +       u32 *dev_req_da;
> +       u32 *dev_req_buffer_size;
> +       int i;
> +
> +       dev_req = kmalloc(size + job->size_in + job->size_out, GFP_KERNEL);
> +       if (!dev_req)
> +               return -ENOMEM;
> +
> +       dev_req->cmd = job->cmd;
> +       dev_req->size_in = job->size_in;
> +       dev_req->size_out = job->size_out;
> +       dev_req->count = job->bo_count;
> +       dev_req_da =
> +           (u32 *) (dev_req->data + dev_req->size_in + dev_req->size_out);
> +       dev_req_buffer_size = (u32 *) (dev_req_da + dev_req->count);
> +       memcpy(dev_req->data, job->data_in, job->size_in);
> +
> +       apu_req = kzalloc(sizeof(*apu_req), GFP_KERNEL);
> +       if (!apu_req) {
> +               kfree(dev_req);
> +               return -ENOMEM;
> +       }
> +
> +       for (i = 0; i < job->bo_count; i++) {
> +               struct apu_gem_object *obj = to_apu_bo(job->bos[i]);
> +
> +               dev_req_da[i] = obj->iova + obj->offset;
> +               dev_req_buffer_size[i] = obj->size;
> +       }
> +
> +       ret = ida_simple_get(&req_ida, 0, 0xffff, GFP_KERNEL);
> +       if (ret < 0)
> +               goto err_free_memory;
> +
> +       dev_req->id = ret;
> +
> +       job->id = dev_req->id;
> +       apu_req->job = job;
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_add(&apu_req->node, &apu_core->requests);
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +       ret =
> +           apu_core->ops->send(apu_core, dev_req,
> +                               size + dev_req->size_in + dev_req->size_out);
> +       if (ret < 0)
> +               goto err;
> +       kfree(dev_req);
> +
> +       return 0;
> +
> +err:
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_del(&apu_req->node);
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +       ida_simple_remove(&req_ida, dev_req->id);
> +err_free_memory:
> +       kfree(apu_req);
> +       kfree(dev_req);
> +
> +       return ret;
> +}
> +
> +static struct dma_fence *apu_job_run(struct drm_sched_job *sched_job)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +       struct dma_fence *fence = NULL;
> +
> +       if (unlikely(job->base.s_fence->finished.error))
> +               return NULL;
> +
> +       fence = apu_fence_create(job->apu_core->sched);
> +       if (IS_ERR(fence))
> +               return NULL;
> +
> +       job->done_fence = dma_fence_get(fence);
> +
> +       apu_job_hw_submit(job);
> +
> +       return fence;
> +}
> +
> +static void apu_update_rproc_state(struct apu_core *core)
> +{
> +       if (core->rproc) {
> +               if (core->rproc->state == RPROC_CRASHED)
> +                       core->flags |= APU_CRASHED;
> +               if (core->rproc->state == RPROC_OFFLINE)
> +                       core->flags &= ~APU_ONLINE;
> +       }
> +}
> +
> +static enum drm_gpu_sched_stat apu_job_timedout(struct drm_sched_job *sched_job)
> +{
> +       struct apu_request *apu_req, *tmp;
> +       struct apu_job *job = to_apu_job(sched_job);
> +       unsigned long flags;
> +
> +       if (dma_fence_is_signaled(job->done_fence))
> +               return DRM_GPU_SCHED_STAT_NOMINAL;
> +
> +       spin_lock_irqsave(&job->apu_core->ctx_lock, flags);
> +       list_for_each_entry_safe(apu_req, tmp, &job->apu_core->requests, node) {
> +               /* Remove the request and notify user about timeout */
> +               if (apu_req->job == job) {
> +                       kref_get(&job->refcount);
> +                       job->apu_core->flags |= APU_TIMEDOUT;
> +                       apu_update_rproc_state(job->apu_core);
> +                       job->result = ETIMEDOUT;
> +                       list_add(&job->node, &complete_node);
> +                       list_del(&apu_req->node);
> +                       ida_simple_remove(&req_ida, job->id);
> +                       kfree(apu_req);
> +                       drm_send_event(job->apu_drm->drm,
> +                                      &job->event->pending_event);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +       }
> +       spin_unlock_irqrestore(&job->apu_core->ctx_lock, flags);
> +
> +       return DRM_GPU_SCHED_STAT_NOMINAL;
> +}
> +
> +static void apu_job_free(struct drm_sched_job *sched_job)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +
> +       drm_sched_job_cleanup(sched_job);
> +
> +       apu_job_put(job);
> +}
> +
> +static const struct drm_sched_backend_ops apu_sched_ops = {
> +       .dependency = apu_job_dependency,
> +       .run_job = apu_job_run,
> +       .timedout_job = apu_job_timedout,
> +       .free_job = apu_job_free
> +};
> +
> +int apu_drm_job_init(struct apu_core *core)
> +{
> +       int ret;
> +       struct apu_sched *apu_sched;
> +       struct drm_gpu_scheduler *sched;
> +
> +       apu_sched = devm_kzalloc(core->dev, sizeof(*apu_sched), GFP_KERNEL);
> +       if (!apu_sched)
> +               return -ENOMEM;
> +
> +       sched = &apu_sched->apu_queue.sched;
> +       apu_sched->apu_queue.fence_context = dma_fence_context_alloc(1);
> +       ret = drm_sched_init(sched, &apu_sched_ops,
> +                            1, 0, msecs_to_jiffies(500),
> +                            NULL, NULL, "apu_js");
> +       if (ret) {
> +               dev_err(core->dev, "Failed to create scheduler: %d.", ret);
> +               return ret;
> +       }
> +
> +       ret = drm_sched_entity_init(&apu_sched->sched_entity,
> +                                   DRM_SCHED_PRIORITY_NORMAL,
> +                                   &sched, 1, NULL);
> +
> +       core->sched = apu_sched;
> +       core->flags = APU_ONLINE;
> +
> +       return ret;
> +}
> +
> +static struct apu_core *get_apu_core(struct apu_drm *apu_drm, int device_id)
> +{
> +       struct apu_core *apu_core;
> +
> +       list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +               if (apu_core->device_id == device_id)
> +                       return apu_core;
> +       }
> +
> +       return NULL;
> +}
> +
> +static int apu_core_is_running(struct apu_core *core)
> +{
> +       return core && core->ops && core->priv && core->sched;
> +}
> +
> +static int
> +apu_lookup_bos(struct drm_device *dev,
> +              struct drm_file *file_priv,
> +              struct drm_apu_gem_queue *args, struct apu_job *job)
> +{
> +       void __user *bo_handles;
> +       unsigned int i;
> +       int ret;
> +
> +       job->bo_count = args->bo_handle_count;
> +
> +       if (!job->bo_count)
> +               return 0;
> +
> +       job->implicit_fences = kvmalloc_array(job->bo_count,
> +                                             sizeof(struct dma_fence *),
> +                                             GFP_KERNEL | __GFP_ZERO);
> +       if (!job->implicit_fences)
> +               return -ENOMEM;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    job->bo_count, &job->bos);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < job->bo_count; i++) {
> +               ret = apu_bo_iommu_map(job->apu_drm, job->bos[i]);
> +               if (ret) {
> +                       /* TODO: handle error */
> +                       break;
> +               }
> +       }
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_queue(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_queue *args = data;
> +       struct apu_event *event;
> +       struct apu_core *core;
> +       struct drm_syncobj *sync_out = NULL;
> +       struct apu_job *job;
> +       int ret = 0;
> +
> +       core = get_apu_core(apu_drm, args->device);
> +       if (!apu_core_is_running(core))
> +               return -ENODEV;
> +
> +       if (args->out_sync > 0) {
> +               sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +               if (!sync_out)
> +                       return -ENODEV;
> +       }
> +
> +       job = kzalloc(sizeof(*job), GFP_KERNEL);
> +       if (!job) {
> +               ret = -ENOMEM;
> +               goto fail_out_sync;
> +       }
> +
> +       kref_init(&job->refcount);
> +
> +       job->apu_drm = apu_drm;
> +       job->apu_core = core;
> +       job->cmd = args->cmd;
> +       job->size_in = args->size_in;
> +       job->size_out = args->size_out;
> +       job->sync_out = sync_out;
> +       if (job->size_in) {
> +               job->data_in = kmalloc(job->size_in, GFP_KERNEL);
> +               if (!job->data_in) {
> +                       ret = -ENOMEM;
> +                       goto fail_job;
> +               }
> +
> +               if (copy_from_user(job->data_in,
> +                                  (void __user *)(uintptr_t) args->data,
> +                                  job->size_in)) {
> +                       ret = -EFAULT;
> +                       goto fail_job;
> +               }
> +       }
> +
> +       if (job->size_out) {
> +               job->data_out = kmalloc(job->size_out, GFP_KERNEL);
> +               if (!job->data_out) {
> +                       ret = -ENOMEM;
> +                       goto fail_job;
> +               }
> +       }
> +
> +       ret = apu_lookup_bos(dev, file_priv, args, job);
> +       if (ret)
> +               goto fail_job;
> +
> +       event = kzalloc(sizeof(*event), GFP_KERNEL);
> +       if (!event) {
> +               ret = -ENOMEM;
> +               goto fail_job;
> +       }
> +       event->base.length = sizeof(struct apu_job_event);
> +       event->base.type = APU_JOB_COMPLETED;
> +       event->job_event.out_sync = args->out_sync;
> +       job->event = event;
> +       ret = drm_event_reserve_init(dev, file_priv, &job->event->pending_event,
> +                                    &job->event->base);
> +       if (ret)
> +               goto fail_job;
> +
> +       ret = apu_job_push(job);
> +       if (ret) {
> +               drm_event_cancel_free(dev, &job->event->pending_event);
> +               goto fail_job;
> +       }
> +
> +       /* Update the return sync object for the job */
> +       if (sync_out)
> +               drm_syncobj_replace_fence(sync_out, job->render_done_fence);
> +
> +fail_job:
> +       apu_job_put(job);
> +fail_out_sync:
> +       if (sync_out)
> +               drm_syncobj_put(sync_out);
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_dequeue(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv)
> +{
> +       struct drm_apu_gem_dequeue *args = data;
> +       struct drm_syncobj *sync_out = NULL;
> +       struct apu_job *job;
> +       int ret = 0;
> +
> +       if (args->out_sync > 0) {
> +               sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +               if (!sync_out)
> +                       return -ENODEV;
> +       }
> +
> +       list_for_each_entry(job, &complete_node, node) {
> +               if (job->sync_out == sync_out) {
> +                       if (job->data_out) {
> +                               if (copy_to_user((void __user *)(uintptr_t)
> +                                                args->data, job->data_out,
> +                                                job->size_out))
> +                                       ret = -EFAULT;
> +                               args->size = job->size_out;
> +                       }
> +                       args->result = job->result;
> +                       list_del(&job->node);
> +                       apu_job_put(job);
> +                       drm_syncobj_put(sync_out);
> +
> +                       return ret;
> +               }
> +       }
> +
> +       if (sync_out)
> +               drm_syncobj_put(sync_out);
> +
> +       return 0;
> +}
> +
> +int ioctl_apu_state(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_state *args = data;
> +       struct apu_core *core;
> +
> +       args->flags = 0;
> +
> +       core = get_apu_core(apu_drm, args->device);
> +       if (!core)
> +               return -ENODEV;
> +       args->flags |= core->flags;
> +
> +       /* Reset APU flags */
> +       core->flags &= ~(APU_TIMEDOUT | APU_CRASHED);
> +
> +       return 0;
> +}
> diff --git a/include/drm/apu_drm.h b/include/drm/apu_drm.h
> new file mode 100644
> index 0000000000000..f044ed0427fdd
> --- /dev/null
> +++ b/include/drm/apu_drm.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __APU_DRM_H__
> +#define __APU_DRM_H__
> +
> +#include <linux/iova.h>
> +#include <linux/remoteproc.h>
> +
> +struct apu_core;
> +struct apu_drm;
> +
> +struct apu_drm_ops {
> +       int (*send)(struct apu_core *apu_core, void *data, int len);
> +       int (*callback)(struct apu_core *apu_core, void *data, int len);
> +};
> +
> +#ifdef CONFIG_DRM_APU
> +
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv);
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size);
> +int apu_drm_unregister_core(void *priv);
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len);
> +void *apu_drm_priv(struct apu_core *apu_core);
> +
> +#else /* CONFIG_DRM_APU */
> +
> +static inline
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv)
> +{
> +       return NULL;
> +}
> +
> +static inline
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
> +{
> +       return -ENOMEM;
> +}
> +
> +static inline
> +int apu_drm_unregister_core(void *priv)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline void *apu_drm_priv(struct apu_core *apu_core)
> +{
> +       return NULL;
> +}
> +#endif /* CONFIG_DRM_APU */
> +
> +
> +#endif /* __APU_DRM_H__ */
> diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h
> new file mode 100644
> index 0000000000000..c52e187bb0599
> --- /dev/null
> +++ b/include/uapi/drm/apu_drm.h
> @@ -0,0 +1,106 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +
> +#ifndef __UAPI_APU_DRM_H__
> +#define __UAPI_APU_DRM_H__
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +#define APU_JOB_COMPLETED 0x80000000
> +
> +/*
> + * Please note that modifications to all structs defined here are
> + * subject to backwards-compatibility constraints.
> + */
> +
> +/*
> + * Firmware request; must match the layout defined in the firmware.
> + * @id: Request id, used on reply to find the pending request
> + * @cmd: The command id to execute in the firmware
> + * @result: The result of the command executed by the firmware
> + * @size_in: The size of the input data attached to this request
> + * @size_out: The size of the output data attached to this request
> + * @count: The number of shared buffers
> + * @data: Contains the data attached to the request if size_in or size_out
> + *        is greater than zero, and the addresses of the shared buffers if
> + *        count is greater than zero. Both the data and the shared buffers
> + *        may be read and written by the APU.
> + */
> +struct apu_dev_request {
> +       __u16 id;
> +       __u16 cmd;
> +       __u16 result;
> +       __u16 size_in;
> +       __u16 size_out;
> +       __u16 count;
> +       __u8 data[];
> +} __attribute__((packed));
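As a side note on this wire format: the total message size handed to ->send() in apu_job_hw_submit() is the packed header plus both payloads. A user-space mock (types and names invented here purely for illustration, mirroring the six 16-bit fields above) shows the accounting:

```c
#include <stdint.h>
#include <stddef.h>

/* User-space mock of the packed wire header from the patch. */
struct apu_dev_request_mock {
	uint16_t id;
	uint16_t cmd;
	uint16_t result;
	uint16_t size_in;
	uint16_t size_out;
	uint16_t count;
	uint8_t data[];		/* in-payload, then out-payload */
} __attribute__((packed));

/* Bytes on the wire = header + in-payload + out-payload, matching the
 * size passed to apu_core->ops->send() in apu_job_hw_submit(). */
static inline size_t apu_msg_size(uint16_t size_in, uint16_t size_out)
{
	return sizeof(struct apu_dev_request_mock) + size_in + size_out;
}
```

Six 16-bit fields make a 12-byte header, so a request with 8 bytes in and 4 bytes out occupies 24 bytes.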
> +
> +struct drm_apu_gem_new {
> +       __u32 size;                     /* in */
> +       __u32 flags;                    /* in */
> +       __u32 handle;                   /* out */
> +       __u64 offset;                   /* out */
> +};
> +

Please refer to
https://www.kernel.org/doc/Documentation/ioctl/botching-up-ioctls.rst

here and below in many places.

There's a lot of missing padding/alignment here.
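For instance, the advice in that doc boils down to passing userspace pointers as __u64 and keeping every field naturally aligned so the layout is identical for 32-bit and 64-bit userspace. A hypothetical 64-bit-clean layout for the queue args (names invented for illustration, not a proposed patch) would be:

```c
#include <stdint.h>

/* Hypothetical 64-bit-clean ioctl argument struct: u64 fields first,
 * u32 fields grouped in pairs, so there are no implicit padding holes
 * and no compat handler is needed. */
struct apu_gem_queue_padded {
	uint64_t bo_handles;	/* userspace pointer carried as u64 */
	uint64_t data;		/* userspace pointer carried as u64 */
	uint32_t bo_handle_count;
	uint32_t device;
	uint32_t size_in;
	uint32_t size_out;
};

/* Same size and layout on every ABI. */
_Static_assert(sizeof(struct apu_gem_queue_padded) == 32,
	       "no implicit padding");
```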

I'm trying to find the time to review this stack in full. Any writeups
on how this is used from userspace would be useful (not just the code
repo, but some sort of "how do I get at it"). It reads as kinda generic
(calling it apu), but then has some specifics around device binding.

Dave.


* Re: [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
@ 2021-09-23  0:58     ` Dave Airlie
  0 siblings, 0 replies; 34+ messages in thread
From: Dave Airlie @ 2021-09-23  0:58 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: Dave Airlie, Daniel Vetter, Rob Herring, Matthias Brugger,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, ohad,
	bjorn.andersson, Mathieu Poirier, Sumit Semwal, Koenig,
	Christian, dri-devel,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-arm-kernel, moderated list:ARM/Mediatek SoC support, LKML,
	linux-remoteproc, Linux Media Mailing List,
	moderated list:DMA BUFFER SHARING FRAMEWORK, khilman, gpain

On Sat, 18 Sept 2021 at 07:57, Alexandre Bailon <abailon@baylibre.com> wrote:
>
> Some MediaTek SoCs provide a hardware accelerator for AI / ML.
> This driver provides the infrastructure to manage memory
> shared between the host CPU and the accelerator, and to submit
> jobs to the accelerator.
> The APU itself is managed by remoteproc, so this driver
> relies on remoteproc to find the APU and get some important data
> from it. But the driver is quite generic, and it should be possible
> to manage accelerators in other ways.
> This driver doesn't handle the data transmissions itself.
> It must be registered by another driver implementing the transmissions.
>
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  drivers/gpu/drm/Kconfig            |   2 +
>  drivers/gpu/drm/Makefile           |   1 +
>  drivers/gpu/drm/apu/Kconfig        |  10 +
>  drivers/gpu/drm/apu/Makefile       |   7 +
>  drivers/gpu/drm/apu/apu_drm_drv.c  | 238 +++++++++++
>  drivers/gpu/drm/apu/apu_gem.c      | 232 +++++++++++
>  drivers/gpu/drm/apu/apu_internal.h |  89 ++++
>  drivers/gpu/drm/apu/apu_sched.c    | 634 +++++++++++++++++++++++++++++
>  include/drm/apu_drm.h              |  59 +++
>  include/uapi/drm/apu_drm.h         | 106 +++++
>  10 files changed, 1378 insertions(+)
>  create mode 100644 drivers/gpu/drm/apu/Kconfig
>  create mode 100644 drivers/gpu/drm/apu/Makefile
>  create mode 100644 drivers/gpu/drm/apu/apu_drm_drv.c
>  create mode 100644 drivers/gpu/drm/apu/apu_gem.c
>  create mode 100644 drivers/gpu/drm/apu/apu_internal.h
>  create mode 100644 drivers/gpu/drm/apu/apu_sched.c
>  create mode 100644 include/drm/apu_drm.h
>  create mode 100644 include/uapi/drm/apu_drm.h
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 8fc40317f2b77..bcdca35c9eda5 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -382,6 +382,8 @@ source "drivers/gpu/drm/xlnx/Kconfig"
>
>  source "drivers/gpu/drm/gud/Kconfig"
>
> +source "drivers/gpu/drm/apu/Kconfig"
> +
>  config DRM_HYPERV
>         tristate "DRM Support for Hyper-V synthetic video device"
>         depends on DRM && PCI && MMU && HYPERV
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index ad11121548983..f3d8432976558 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -127,4 +127,5 @@ obj-$(CONFIG_DRM_MCDE) += mcde/
>  obj-$(CONFIG_DRM_TIDSS) += tidss/
>  obj-y                  += xlnx/
>  obj-y                  += gud/
> +obj-$(CONFIG_DRM_APU) += apu/
>  obj-$(CONFIG_DRM_HYPERV) += hyperv/
> diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig
> new file mode 100644
> index 0000000000000..c8471309a0351
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +
> +config DRM_APU
> +       tristate "APU (AI Processor Unit)"
> +       select REMOTEPROC
> +       select DRM_SCHED
> +       help
> +         This option enables a DRM driver that provides facilities to
> +         communicate with an AI processing unit (APU).
> diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile
> new file mode 100644
> index 0000000000000..3e97846b091c9
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +apu_drm-y += apu_drm_drv.o
> +apu_drm-y += apu_sched.o
> +apu_drm-y += apu_gem.o
> +
> +obj-$(CONFIG_DRM_APU) += apu_drm.o
> diff --git a/drivers/gpu/drm/apu/apu_drm_drv.c b/drivers/gpu/drm/apu/apu_drm_drv.c
> new file mode 100644
> index 0000000000000..91d8c99e373c0
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_drm_drv.c
> @@ -0,0 +1,238 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/remoteproc.h>
> +
> +#include <drm/apu_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_probe_helper.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +static LIST_HEAD(apu_devices);
> +
> +static const struct drm_ioctl_desc ioctls[] = {
> +       DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_QUEUE, ioctl_gem_queue,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state,
> +                         DRM_RENDER_ALLOW),
> +};
> +
> +DEFINE_DRM_GEM_CMA_FOPS(apu_drm_ops);
> +
> +static struct drm_driver apu_drm_driver = {
> +       .driver_features = DRIVER_GEM | DRIVER_SYNCOBJ,
> +       .name = "drm_apu",
> +       .desc = "APU DRM driver",
> +       .date = "20210319",
> +       .major = 1,
> +       .minor = 0,
> +       .patchlevel = 0,
> +       .ioctls = ioctls,
> +       .num_ioctls = ARRAY_SIZE(ioctls),
> +       .fops = &apu_drm_ops,
> +       DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_cma_dumb_create),
> +};
> +
> +void *apu_drm_priv(struct apu_core *apu_core)
> +{
> +       return apu_core->priv;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_priv);
> +
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
> +{
> +       struct apu_drm *apu_drm = apu_core->apu_drm;
> +       struct iova *iova;
> +
> +       iova = reserve_iova(&apu_drm->iovad, PHYS_PFN(start),
> +                           PHYS_PFN(start + size));
> +       if (!iova)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_reserve_iova);
> +
> +static int apu_drm_init_first_core(struct apu_drm *apu_drm,
> +                                  struct apu_core *apu_core)
> +{
> +       struct drm_device *drm;
> +       struct device *parent;
> +       u64 mask;
> +
> +       drm = apu_drm->drm;
> +       parent = apu_core->rproc->dev.parent;
> +       drm->dev->iommu_group = parent->iommu_group;
> +       apu_drm->domain = iommu_get_domain_for_dev(parent);
> +       set_dma_ops(drm->dev, get_dma_ops(parent));
> +       mask = dma_get_mask(parent);
> +       return dma_coerce_mask_and_coherent(drm->dev, mask);
> +}
> +
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv)
> +{
> +       struct apu_drm *apu_drm;
> +       struct apu_core *apu_core;
> +       int ret;
> +
> +       list_for_each_entry(apu_drm, &apu_devices, node) {
> +               list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +                       if (apu_core->rproc == rproc) {
> +                               ret = apu_drm_init_first_core(apu_drm,
> +                                                             apu_core);
> +                               if (ret)
> +                                       return NULL;
> +
> +                               apu_core->dev = &rproc->dev;
> +                               apu_core->priv = priv;
> +                               apu_core->ops = ops;
> +
> +                               ret = apu_drm_job_init(apu_core);
> +                               if (ret)
> +                                       return NULL;
> +
> +                               return apu_core;
> +                       }
> +               }
> +       }
> +
> +       return NULL;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_register_core);
> +
> +int apu_drm_unregister_core(void *priv)
> +{
> +       struct apu_drm *apu_drm;
> +       struct apu_core *apu_core;
> +
> +       list_for_each_entry(apu_drm, &apu_devices, node) {
> +               list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +                       if (apu_core->priv == priv) {
> +                               apu_sched_fini(apu_core);
> +                               apu_core->priv = NULL;
> +                               apu_core->ops = NULL;
> +                       }
> +               }
> +       }
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_unregister_core);
> +
> +#ifdef CONFIG_OF
> +static const struct of_device_id apu_platform_of_match[] = {
> +       { .compatible = "mediatek,apu-drm", },
> +       { },
> +};
> +
> +MODULE_DEVICE_TABLE(of, apu_platform_of_match);
> +#endif
> +
> +static int apu_platform_probe(struct platform_device *pdev)
> +{
> +       struct drm_device *drm;
> +       struct apu_drm *apu_drm;
> +       struct of_phandle_iterator it;
> +       int index = 0;
> +       u64 iova[2];
> +       int ret;
> +
> +       apu_drm = devm_kzalloc(&pdev->dev, sizeof(*apu_drm), GFP_KERNEL);
> +       if (!apu_drm)
> +               return -ENOMEM;
> +       INIT_LIST_HEAD(&apu_drm->apu_cores);
> +
> +       of_phandle_iterator_init(&it, pdev->dev.of_node, "remoteproc", NULL, 0);
> +       while (of_phandle_iterator_next(&it) == 0) {
> +               struct rproc *rproc = rproc_get_by_phandle(it.phandle);
> +               struct apu_core *apu_core;
> +
> +               if (!rproc)
> +                       return -EPROBE_DEFER;
> +
> +               apu_core = devm_kzalloc(&pdev->dev, sizeof(*apu_core),
> +                                       GFP_KERNEL);
> +               if (!apu_core)
> +                       return -ENOMEM;
> +
> +               apu_core->rproc = rproc;
> +               apu_core->device_id = index++;
> +               apu_core->apu_drm = apu_drm;
> +               spin_lock_init(&apu_core->ctx_lock);
> +               INIT_LIST_HEAD(&apu_core->requests);
> +               list_add(&apu_core->node, &apu_drm->apu_cores);
> +       }
> +
> +       if (of_property_read_variable_u64_array(pdev->dev.of_node, "iova",
> +                                               iova, ARRAY_SIZE(iova),
> +                                               ARRAY_SIZE(iova)) !=
> +           ARRAY_SIZE(iova))
> +               return -EINVAL;
> +
> +       init_iova_domain(&apu_drm->iovad, PAGE_SIZE, PHYS_PFN(iova[0]));
> +       apu_drm->iova_limit_pfn = PHYS_PFN(iova[0] + iova[1]) - 1;
> +
> +       drm = drm_dev_alloc(&apu_drm_driver, &pdev->dev);
> +       if (IS_ERR(drm))
> +               return PTR_ERR(drm);
> +
> +       drm->dev_private = apu_drm;
> +       apu_drm->drm = drm;
> +       apu_drm->dev = &pdev->dev;
> +
> +       /* Set dev_private before registering so ioctls can't race it */
> +       ret = drm_dev_register(drm, 0);
> +       if (ret) {
> +               drm_dev_put(drm);
> +               return ret;
> +       }
> +
> +       platform_set_drvdata(pdev, drm);
> +
> +       list_add(&apu_drm->node, &apu_devices);
> +
> +       return 0;
> +}
> +
> +static int apu_platform_remove(struct platform_device *pdev)
> +{
> +       struct drm_device *drm;
> +
> +       drm = platform_get_drvdata(pdev);
> +
> +       drm_dev_unregister(drm);
> +       drm_dev_put(drm);
> +
> +       return 0;
> +}
> +
> +static struct platform_driver apu_platform_driver = {
> +       .probe = apu_platform_probe,
> +       .remove = apu_platform_remove,
> +       .driver = {
> +                  .name = "apu_drm",
> +                  .of_match_table = of_match_ptr(apu_platform_of_match),
> +       },
> +};
> +
> +module_platform_driver(apu_platform_driver);
> diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
> new file mode 100644
> index 0000000000000..c867143dab436
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_gem.c
> @@ -0,0 +1,232 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <asm/cacheflush.h>
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/highmem.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/mm.h>
> +#include <linux/swap.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
> +                                            size_t size)
> +{
> +       struct drm_gem_cma_object *cma_obj;
> +
> +       cma_obj = drm_gem_cma_create(dev, size);
> +       if (IS_ERR(cma_obj))
> +               return NULL;
> +
> +       return &cma_obj->base;
> +}
> +
> +int ioctl_gem_new(struct drm_device *dev, void *data,
> +                 struct drm_file *file_priv)
> +{
> +       struct drm_apu_gem_new *args = data;
> +       struct drm_gem_cma_object *cma_obj;
> +       struct apu_gem_object *apu_obj;
> +       struct drm_gem_object *gem_obj;
> +       int ret;
> +
> +       cma_obj = drm_gem_cma_create(dev, args->size);
> +       if (IS_ERR(cma_obj))
> +               return PTR_ERR(cma_obj);
> +
> +       gem_obj = &cma_obj->base;
> +       apu_obj = to_apu_bo(gem_obj);
> +
> +       /*
> +        * Save the size of buffer expected by application instead of the
> +        * aligned one.
> +        */
> +       apu_obj->size = args->size;
> +       apu_obj->offset = 0;
> +       apu_obj->iommu_refcount = 0;
> +       mutex_init(&apu_obj->mutex);
> +
> +       ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);
> +       /* Drop the allocation reference; the handle now holds its own */
> +       drm_gem_object_put(gem_obj);
> +       if (ret)
> +               return ret;
> +       args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
> +
> +       return 0;
> +}
> +
> +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
> +{
> +       int iova_pfn;
> +       int i;
> +
> +       if (!obj->iommu_sgt)
> +               return;
> +
> +       mutex_lock(&obj->mutex);
> +       obj->iommu_refcount--;
> +       if (obj->iommu_refcount) {
> +               mutex_unlock(&obj->mutex);
> +               return;
> +       }
> +
> +       iova_pfn = PHYS_PFN(obj->iova);
> +       for (i = 0; i < obj->iommu_sgt->nents; i++) {
> +               iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
> +                           PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
> +               iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
> +       }
> +
> +       sg_free_table(obj->iommu_sgt);
> +       kfree(obj->iommu_sgt);
> +
> +       free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
> +       mutex_unlock(&obj->mutex);
> +}
> +
> +static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
> +{
> +       if (obj->funcs)
> +               return obj->funcs->get_sg_table(obj);
> +       return NULL;
> +}
> +
> +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
> +{
> +       struct apu_gem_object *apu_obj = to_apu_bo(obj);
> +       struct scatterlist *sgl;
> +       phys_addr_t phys;
> +       int total_buf_space;
> +       int iova_pfn;
> +       int iova;
> +       int ret;
> +       int i;
> +
> +       mutex_lock(&apu_obj->mutex);
> +       apu_obj->iommu_refcount++;
> +       if (apu_obj->iommu_refcount != 1) {
> +               mutex_unlock(&apu_obj->mutex);
> +               return 0;
> +       }
> +
> +       apu_obj->iommu_sgt = apu_get_sg_table(obj);
> +       if (IS_ERR(apu_obj->iommu_sgt)) {
> +               mutex_unlock(&apu_obj->mutex);
> +               return PTR_ERR(apu_obj->iommu_sgt);
> +       }
> +
> +       total_buf_space = obj->size;
> +       iova_pfn = alloc_iova_fast(&apu_drm->iovad,
> +                                  total_buf_space >> PAGE_SHIFT,
> +                                  apu_drm->iova_limit_pfn, true);
> +       apu_obj->iova = PFN_PHYS(iova_pfn);
> +
> +       if (!iova_pfn) {
> +               dev_err(apu_drm->dev, "Failed to allocate iova address\n");
> +               mutex_unlock(&apu_obj->mutex);
> +               return -ENOMEM;
> +       }
> +
> +       iova = apu_obj->iova;
> +       sgl = apu_obj->iommu_sgt->sgl;
> +       for (i = 0; i < apu_obj->iommu_sgt->nents; i++) {
> +               phys = page_to_phys(sg_page(&sgl[i]));
> +               ret = iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys,
> +                               PAGE_ALIGN(sgl[i].length),
> +                               IOMMU_READ | IOMMU_WRITE);
> +               if (ret) {
> +                       dev_err(apu_drm->dev, "Failed to iommu map\n");
> +                       free_iova(&apu_drm->iovad, iova_pfn);
> +                       mutex_unlock(&apu_obj->mutex);
> +                       return ret;
> +               }
> +               iova += sgl[i].offset + sgl[i].length;
> +               iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length));
> +       }
> +       mutex_unlock(&apu_obj->mutex);
> +
> +       return 0;
> +}
> +
> +int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
> +                       struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_iommu_map *args = data;
> +       struct drm_gem_object **bos;
> +       void __user *bo_handles;
> +       int ret;
> +       int i;
> +
> +       u64 *das = kvmalloc_array(args->bo_handle_count,
> +                                 sizeof(u64), GFP_KERNEL);
> +       if (!das)
> +               return -ENOMEM;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    args->bo_handle_count, &bos);
> +       if (ret) {
> +               kvfree(das);
> +               return ret;
> +       }
> +
> +       for (i = 0; i < args->bo_handle_count; i++) {
> +               ret = apu_bo_iommu_map(apu_drm, bos[i]);
> +               if (ret) {
> +                       /* TODO: handle error */
> +                       break;
> +               }
> +               das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset;
> +       }
> +
> +       if (copy_to_user((void __user *)(uintptr_t)args->bo_device_addresses,
> +                        das, args->bo_handle_count * sizeof(u64))) {
> +               ret = -EFAULT;
> +               DRM_DEBUG("Failed to copy device addresses\n");
> +       }
> +
> +       kvfree(das);
> +       kvfree(bos);
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
> +                         struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_iommu_map *args = data;
> +       struct drm_gem_object **bos;
> +       void __user *bo_handles;
> +       int ret;
> +       int i;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    args->bo_handle_count, &bos);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < args->bo_handle_count; i++)
> +               apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i]));
> +
> +       kvfree(bos);
> +
> +       return 0;
> +}
> diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h
> new file mode 100644
> index 0000000000000..b789b2f3ad9c6
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_internal.h
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __APU_INTERNAL_H__
> +#define __APU_INTERNAL_H__
> +
> +#include <linux/iova.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/gpu_scheduler.h>
> +
> +struct apu_gem_object {
> +       struct drm_gem_cma_object base;
> +       struct mutex mutex;
> +       struct sg_table *iommu_sgt;
> +       int iommu_refcount;
> +       size_t size;
> +       u32 iova;
> +       u32 offset;
> +};
> +
> +struct apu_sched;
> +struct apu_core {
> +       int device_id;
> +       struct device *dev;
> +       struct rproc *rproc;
> +       struct apu_drm_ops *ops;
> +       struct apu_drm *apu_drm;
> +
> +       spinlock_t ctx_lock;
> +       struct list_head requests;
> +
> +       struct list_head node;
> +       void *priv;
> +
> +       struct apu_sched *sched;
> +       u32 flags;
> +};
> +
> +struct apu_drm {
> +       struct device *dev;
> +       struct drm_device *drm;
> +
> +       struct iommu_domain *domain;
> +       struct iova_domain iovad;
> +       int iova_limit_pfn;
> +
> +       struct list_head apu_cores;
> +       struct list_head node;
> +};
> +
> +static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj)
> +{
> +       return container_of(to_drm_gem_cma_obj(obj), struct apu_gem_object,
> +                           base);
> +}
> +
> +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
> +                                            size_t size);
> +
> +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj);
> +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj);
> +int ioctl_gem_new(struct drm_device *dev, void *data,
> +                 struct drm_file *file_priv);
> +int ioctl_gem_user_new(struct drm_device *dev, void *data,
> +                      struct drm_file *file_priv);
> +int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
> +                       struct drm_file *file_priv);
> +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
> +                         struct drm_file *file_priv);
> +int ioctl_gem_queue(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv);
> +int ioctl_gem_dequeue(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv);
> +int ioctl_apu_state(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv);
> +struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem,
> +                                    int flags);
> +
> +struct apu_job;
> +
> +int apu_drm_job_init(struct apu_core *core);
> +void apu_sched_fini(struct apu_core *core);
> +int apu_job_push(struct apu_job *job);
> +void apu_job_put(struct apu_job *job);
> +
> +#endif /* __APU_INTERNAL_H__ */
> diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c
> new file mode 100644
> index 0000000000000..cebb0155c7783
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_sched.c
> @@ -0,0 +1,634 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <drm/apu_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_syncobj.h>
> +#include <drm/gpu_scheduler.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +struct apu_queue_state {
> +       struct drm_gpu_scheduler sched;
> +
> +       u64 fence_context;
> +       u64 seqno;
> +};
> +
> +struct apu_request {
> +       struct list_head node;
> +       void *job;
> +};
> +
> +struct apu_sched {
> +       struct apu_queue_state apu_queue;
> +       spinlock_t job_lock;
> +       struct drm_sched_entity sched_entity;
> +};
> +
> +struct apu_event {
> +       struct drm_pending_event pending_event;
> +       union {
> +               struct drm_event base;
> +               struct apu_job_event job_event;
> +       };
> +};
> +
> +struct apu_job {
> +       struct drm_sched_job base;
> +
> +       struct kref refcount;
> +
> +       struct apu_core *apu_core;
> +       struct apu_drm *apu_drm;
> +
> +       /* Fence to be signaled by IRQ handler when the job is complete. */
> +       struct dma_fence *done_fence;
> +
> +       __u32 cmd;
> +
> +       /* Exclusive fences we have taken from the BOs to wait for */
> +       struct dma_fence **implicit_fences;
> +       struct drm_gem_object **bos;
> +       u32 bo_count;
> +
> +       /* Fence to be signaled by drm-sched once its done with the job */
> +       struct dma_fence *render_done_fence;
> +
> +       void *data_in;
> +       uint16_t size_in;
> +       void *data_out;
> +       uint16_t size_out;
> +       uint16_t result;
> +       uint16_t id;
> +
> +       struct list_head node;
> +       struct drm_syncobj *sync_out;
> +
> +       struct apu_event *event;
> +};
> +
> +static DEFINE_IDA(req_ida);
> +static LIST_HEAD(complete_node);
> +
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +       struct apu_request *apu_req, *tmp;
> +       struct apu_dev_request *hdr = data;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_for_each_entry_safe(apu_req, tmp, &apu_core->requests, node) {
> +               struct apu_job *job = apu_req->job;
> +
> +               if (job && hdr->id == job->id) {
> +                       kref_get(&job->refcount);
> +                       job->result = hdr->result;
> +                       if (job->size_out)
> +                               memcpy(job->data_out, hdr->data + job->size_in,
> +                                      min(job->size_out, hdr->size_out));
> +                       /* Clamp to the buffer we actually allocated */
> +                       job->size_out = min(job->size_out, hdr->size_out);
> +                       list_add(&job->node, &complete_node);
> +                       list_del(&apu_req->node);
> +                       ida_simple_remove(&req_ida, hdr->id);
> +                       kfree(apu_req);
> +                       drm_send_event(job->apu_drm->drm,
> +                                      &job->event->pending_event);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +       }
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +
> +       return 0;
> +}
> +
> +void apu_sched_fini(struct apu_core *core)
> +{
> +       drm_sched_fini(&core->sched->apu_queue.sched);
> +       devm_kfree(core->dev, core->sched);
> +       core->flags &= ~APU_ONLINE;
> +       core->sched = NULL;
> +}
> +
> +static void apu_job_cleanup(struct kref *ref)
> +{
> +       struct apu_job *job = container_of(ref, struct apu_job,
> +                                          refcount);
> +       unsigned int i;
> +
> +       if (job->implicit_fences) {
> +               for (i = 0; i < job->bo_count; i++)
> +                       dma_fence_put(job->implicit_fences[i]);
> +               kvfree(job->implicit_fences);
> +       }
> +       dma_fence_put(job->done_fence);
> +       dma_fence_put(job->render_done_fence);
> +
> +       if (job->bos) {
> +               for (i = 0; i < job->bo_count; i++) {
> +                       struct apu_gem_object *apu_obj;
> +
> +                       apu_obj = to_apu_bo(job->bos[i]);
> +                       apu_bo_iommu_unmap(job->apu_drm, apu_obj);
> +                       drm_gem_object_put(job->bos[i]);
> +               }
> +
> +               kvfree(job->bos);
> +       }
> +
> +       kfree(job->data_out);
> +       kfree(job->data_in);
> +       kfree(job);
> +}
> +
> +void apu_job_put(struct apu_job *job)
> +{
> +       kref_put(&job->refcount, apu_job_cleanup);
> +}
> +
> +static void apu_acquire_object_fences(struct drm_gem_object **bos,
> +                                     int bo_count,
> +                                     struct dma_fence **implicit_fences)
> +{
> +       int i;
> +
> +       for (i = 0; i < bo_count; i++)
> +               implicit_fences[i] = dma_resv_get_excl_unlocked(bos[i]->resv);
> +}
> +
> +static void apu_attach_object_fences(struct drm_gem_object **bos,
> +                                    int bo_count, struct dma_fence *fence)
> +{
> +       int i;
> +
> +       for (i = 0; i < bo_count; i++)
> +               dma_resv_add_excl_fence(bos[i]->resv, fence);
> +}
> +
> +int apu_job_push(struct apu_job *job)
> +{
> +       struct drm_sched_entity *entity = &job->apu_core->sched->sched_entity;
> +       struct ww_acquire_ctx acquire_ctx;
> +       int ret = 0;
> +
> +       ret = drm_gem_lock_reservations(job->bos, job->bo_count, &acquire_ctx);
> +       if (ret)
> +               return ret;
> +
> +       ret = drm_sched_job_init(&job->base, entity, NULL);
> +       if (ret)
> +               goto unlock;
> +
> +       job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> +
> +       kref_get(&job->refcount);       /* put by scheduler job completion */
> +
> +       apu_acquire_object_fences(job->bos, job->bo_count,
> +                                 job->implicit_fences);
> +
> +       drm_sched_entity_push_job(&job->base, entity);
> +
> +       apu_attach_object_fences(job->bos, job->bo_count,
> +                                job->render_done_fence);
> +
> +unlock:
> +       drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
> +
> +       return ret;
> +}
> +
> +static const char *apu_fence_get_driver_name(struct dma_fence *fence)
> +{
> +       return "apu";
> +}
> +
> +static const char *apu_fence_get_timeline_name(struct dma_fence *fence)
> +{
> +       return "apu-0";
> +}
> +
> +static void apu_fence_release(struct dma_fence *f)
> +{
> +       kfree(f);
> +}
> +
> +static const struct dma_fence_ops apu_fence_ops = {
> +       .get_driver_name = apu_fence_get_driver_name,
> +       .get_timeline_name = apu_fence_get_timeline_name,
> +       .release = apu_fence_release,
> +};
> +
> +static struct dma_fence *apu_fence_create(struct apu_sched *sched)
> +{
> +       struct dma_fence *fence;
> +       struct apu_queue_state *apu_queue = &sched->apu_queue;
> +
> +       fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +       if (!fence)
> +               return ERR_PTR(-ENOMEM);
> +
> +       dma_fence_init(fence, &apu_fence_ops, &sched->job_lock,
> +                      apu_queue->fence_context, apu_queue->seqno++);
> +
> +       return fence;
> +}
> +
> +static struct apu_job *to_apu_job(struct drm_sched_job *sched_job)
> +{
> +       return container_of(sched_job, struct apu_job, base);
> +}
> +
> +static struct dma_fence *apu_job_dependency(struct drm_sched_job *sched_job,
> +                                           struct drm_sched_entity *s_entity)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +       struct dma_fence *fence;
> +       unsigned int i;
> +
> +       /* Implicit fences, max. one per BO */
> +       for (i = 0; i < job->bo_count; i++) {
> +               if (job->implicit_fences[i]) {
> +                       fence = job->implicit_fences[i];
> +                       job->implicit_fences[i] = NULL;
> +                       return fence;
> +               }
> +       }
> +
> +       return NULL;
> +}
> +
> +static int apu_job_hw_submit(struct apu_job *job)
> +{
> +       int ret;
> +       struct apu_core *apu_core = job->apu_core;
> +       struct apu_dev_request *dev_req;
> +       struct apu_request *apu_req;
> +       unsigned long flags;
> +
> +       int size = sizeof(*dev_req) + sizeof(u32) * job->bo_count * 2;
> +       u32 *dev_req_da;
> +       u32 *dev_req_buffer_size;
> +       int i;
> +
> +       dev_req = kmalloc(size + job->size_in + job->size_out, GFP_KERNEL);
> +       if (!dev_req)
> +               return -ENOMEM;
> +
> +       dev_req->cmd = job->cmd;
> +       dev_req->size_in = job->size_in;
> +       dev_req->size_out = job->size_out;
> +       dev_req->count = job->bo_count;
> +       dev_req_da =
> +           (u32 *) (dev_req->data + dev_req->size_in + dev_req->size_out);
> +       dev_req_buffer_size = (u32 *) (dev_req_da + dev_req->count);
> +       memcpy(dev_req->data, job->data_in, job->size_in);
> +
> +       apu_req = kzalloc(sizeof(*apu_req), GFP_KERNEL);
> +       if (!apu_req) {
> +               ret = -ENOMEM;
> +               goto err_free_memory;
> +       }
> +
> +       for (i = 0; i < job->bo_count; i++) {
> +               struct apu_gem_object *obj = to_apu_bo(job->bos[i]);
> +
> +               dev_req_da[i] = obj->iova + obj->offset;
> +               dev_req_buffer_size[i] = obj->size;
> +       }
> +
> +       ret = ida_simple_get(&req_ida, 0, 0xffff, GFP_KERNEL);
> +       if (ret < 0)
> +               goto err_free_memory;
> +
> +       dev_req->id = ret;
> +
> +       job->id = dev_req->id;
> +       apu_req->job = job;
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_add(&apu_req->node, &apu_core->requests);
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +       ret = apu_core->ops->send(apu_core, dev_req,
> +                                 size + dev_req->size_in + dev_req->size_out);
> +       if (ret < 0)
> +               goto err;
> +       kfree(dev_req);
> +
> +       return 0;
> +
> +err:
> +       list_del(&apu_req->node);
> +       ida_simple_remove(&req_ida, dev_req->id);
> +err_free_memory:
> +       kfree(apu_req);
> +       kfree(dev_req);
> +
> +       return ret;
> +}
> +
> +static struct dma_fence *apu_job_run(struct drm_sched_job *sched_job)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +       struct dma_fence *fence = NULL;
> +
> +       if (unlikely(job->base.s_fence->finished.error))
> +               return NULL;
> +
> +       fence = apu_fence_create(job->apu_core->sched);
> +       if (IS_ERR(fence))
> +               return NULL;
> +
> +       job->done_fence = dma_fence_get(fence);
> +
> +       apu_job_hw_submit(job);
> +
> +       return fence;
> +}
> +
> +static void apu_update_rproc_state(struct apu_core *core)
> +{
> +       if (core->rproc) {
> +               if (core->rproc->state == RPROC_CRASHED)
> +                       core->flags |= APU_CRASHED;
> +               if (core->rproc->state == RPROC_OFFLINE)
> +                       core->flags &= ~APU_ONLINE;
> +       }
> +}
> +
> +static enum drm_gpu_sched_stat apu_job_timedout(struct drm_sched_job *sched_job)
> +{
> +       struct apu_request *apu_req, *tmp;
> +       struct apu_job *job = to_apu_job(sched_job);
> +       unsigned long flags;
> +
> +       if (dma_fence_is_signaled(job->done_fence))
> +               return DRM_GPU_SCHED_STAT_NOMINAL;
> +
> +       spin_lock_irqsave(&job->apu_core->ctx_lock, flags);
> +       list_for_each_entry_safe(apu_req, tmp, &job->apu_core->requests, node) {
> +               /* Remove the request and notify user about timeout */
> +               if (apu_req->job == job) {
> +                       kref_get(&job->refcount);
> +                       job->apu_core->flags |= APU_TIMEDOUT;
> +                       apu_update_rproc_state(job->apu_core);
> +                       job->result = ETIMEDOUT;
> +                       list_add(&job->node, &complete_node);
> +                       list_del(&apu_req->node);
> +                       ida_simple_remove(&req_ida, job->id);
> +                       kfree(apu_req);
> +                       drm_send_event(job->apu_drm->drm,
> +                                      &job->event->pending_event);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +       }
> +       spin_unlock_irqrestore(&job->apu_core->ctx_lock, flags);
> +
> +       return DRM_GPU_SCHED_STAT_NOMINAL;
> +}
> +
> +static void apu_job_free(struct drm_sched_job *sched_job)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +
> +       drm_sched_job_cleanup(sched_job);
> +
> +       apu_job_put(job);
> +}
> +
> +static const struct drm_sched_backend_ops apu_sched_ops = {
> +       .dependency = apu_job_dependency,
> +       .run_job = apu_job_run,
> +       .timedout_job = apu_job_timedout,
> +       .free_job = apu_job_free
> +};
> +
> +int apu_drm_job_init(struct apu_core *core)
> +{
> +       int ret;
> +       struct apu_sched *apu_sched;
> +       struct drm_gpu_scheduler *sched;
> +
> +       apu_sched = devm_kzalloc(core->dev, sizeof(*apu_sched), GFP_KERNEL);
> +       if (!apu_sched)
> +               return -ENOMEM;
> +
> +       sched = &apu_sched->apu_queue.sched;
> +       apu_sched->apu_queue.fence_context = dma_fence_context_alloc(1);
> +       ret = drm_sched_init(sched, &apu_sched_ops,
> +                            1, 0, msecs_to_jiffies(500),
> +                            NULL, NULL, "apu_js");
> +       if (ret) {
> +               dev_err(core->dev, "Failed to create scheduler: %d.", ret);
> +               return ret;
> +       }
> +
> +       ret = drm_sched_entity_init(&apu_sched->sched_entity,
> +                                   DRM_SCHED_PRIORITY_NORMAL,
> +                                   &sched, 1, NULL);
> +       if (ret) {
> +               dev_err(core->dev, "Failed to init sched entity: %d.", ret);
> +               drm_sched_fini(sched);
> +               return ret;
> +       }
> +
> +       core->sched = apu_sched;
> +       core->flags = APU_ONLINE;
> +
> +       return 0;
> +}
> +
> +static struct apu_core *get_apu_core(struct apu_drm *apu_drm, int device_id)
> +{
> +       struct apu_core *apu_core;
> +
> +       list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +               if (apu_core->device_id == device_id)
> +                       return apu_core;
> +       }
> +
> +       return NULL;
> +}
> +
> +static int apu_core_is_running(struct apu_core *core)
> +{
> +       return core->ops && core->priv && core->sched;
> +}
> +
> +static int
> +apu_lookup_bos(struct drm_device *dev,
> +              struct drm_file *file_priv,
> +              struct drm_apu_gem_queue *args, struct apu_job *job)
> +{
> +       void __user *bo_handles;
> +       unsigned int i;
> +       int ret;
> +
> +       job->bo_count = args->bo_handle_count;
> +
> +       if (!job->bo_count)
> +               return 0;
> +
> +       job->implicit_fences = kvmalloc_array(job->bo_count,
> +                                             sizeof(struct dma_fence *),
> +                                             GFP_KERNEL | __GFP_ZERO);
> +       if (!job->implicit_fences)
> +               return -ENOMEM;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    job->bo_count, &job->bos);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < job->bo_count; i++) {
> +               ret = apu_bo_iommu_map(job->apu_drm, job->bos[i]);
> +               if (ret) {
> +                       /* TODO: handle error */
> +                       break;
> +               }
> +       }
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_queue(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_queue *args = data;
> +       struct apu_event *event;
> +       struct apu_core *core;
> +       struct drm_syncobj *sync_out = NULL;
> +       struct apu_job *job;
> +       int ret = 0;
> +
> +       core = get_apu_core(apu_drm, args->device);
> +       if (!core || !apu_core_is_running(core))
> +               return -ENODEV;
> +
> +       if (args->out_sync > 0) {
> +               sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +               if (!sync_out)
> +                       return -ENODEV;
> +       }
> +
> +       job = kzalloc(sizeof(*job), GFP_KERNEL);
> +       if (!job) {
> +               ret = -ENOMEM;
> +               goto fail_out_sync;
> +       }
> +
> +       kref_init(&job->refcount);
> +
> +       job->apu_drm = apu_drm;
> +       job->apu_core = core;
> +       job->cmd = args->cmd;
> +       job->size_in = args->size_in;
> +       job->size_out = args->size_out;
> +       job->sync_out = sync_out;
> +       if (job->size_in) {
> +               job->data_in = kmalloc(job->size_in, GFP_KERNEL);
> +               if (!job->data_in) {
> +                       ret = -ENOMEM;
> +                       goto fail_job;
> +               }
> +
> +               if (copy_from_user(job->data_in,
> +                                  u64_to_user_ptr(args->data),
> +                                  job->size_in)) {
> +                       ret = -EFAULT;
> +                       goto fail_job;
> +               }
> +       }
> +
> +       if (job->size_out) {
> +               job->data_out = kmalloc(job->size_out, GFP_KERNEL);
> +               if (!job->data_out) {
> +                       ret = -ENOMEM;
> +                       goto fail_job;
> +               }
> +       }
> +
> +       ret = apu_lookup_bos(dev, file_priv, args, job);
> +       if (ret)
> +               goto fail_job;
> +
> +       event = kzalloc(sizeof(*event), GFP_KERNEL);
> +       if (!event) {
> +               ret = -ENOMEM;
> +               goto fail_job;
> +       }
> +       event->base.length = sizeof(struct apu_job_event);
> +       event->base.type = APU_JOB_COMPLETED;
> +       event->job_event.out_sync = args->out_sync;
> +       job->event = event;
> +       ret = drm_event_reserve_init(dev, file_priv, &job->event->pending_event,
> +                                    &job->event->base);
> +       if (ret)
> +               goto fail_job;
> +
> +       ret = apu_job_push(job);
> +       if (ret) {
> +               drm_event_cancel_free(dev, &job->event->pending_event);
> +               goto fail_job;
> +       }
> +
> +       /* Update the return sync object for the job */
> +       if (sync_out)
> +               drm_syncobj_replace_fence(sync_out, job->render_done_fence);
> +
> +fail_job:
> +       apu_job_put(job);
> +fail_out_sync:
> +       if (sync_out)
> +               drm_syncobj_put(sync_out);
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_dequeue(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv)
> +{
> +       struct drm_apu_gem_dequeue *args = data;
> +       struct drm_syncobj *sync_out = NULL;
> +       struct apu_job *job;
> +       int ret = 0;
> +
> +       if (args->out_sync > 0) {
> +               sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +               if (!sync_out)
> +                       return -ENODEV;
> +       }
> +
> +       list_for_each_entry(job, &complete_node, node) {
> +               if (job->sync_out == sync_out) {
> +                       if (job->data_out) {
> +                               if (copy_to_user(u64_to_user_ptr(args->data),
> +                                                job->data_out,
> +                                                job->size_out))
> +                                       ret = -EFAULT;
> +                               args->size = job->size_out;
> +                       }
> +                       args->result = job->result;
> +                       list_del(&job->node);
> +                       apu_job_put(job);
> +                       drm_syncobj_put(sync_out);
> +
> +                       return ret;
> +               }
> +       }
> +
> +       if (sync_out)
> +               drm_syncobj_put(sync_out);
> +
> +       return 0;
> +}
> +
> +int ioctl_apu_state(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_state *args = data;
> +       struct apu_core *core;
> +
> +       args->flags = 0;
> +
> +       core = get_apu_core(apu_drm, args->device);
> +       if (!core)
> +               return -ENODEV;
> +       args->flags |= core->flags;
> +
> +       /* Reset APU flags */
> +       core->flags &= ~(APU_TIMEDOUT | APU_CRASHED);
> +
> +       return 0;
> +}
> diff --git a/include/drm/apu_drm.h b/include/drm/apu_drm.h
> new file mode 100644
> index 0000000000000..f044ed0427fdd
> --- /dev/null
> +++ b/include/drm/apu_drm.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __APU_DRM_H__
> +#define __APU_DRM_H__
> +
> +#include <linux/iova.h>
> +#include <linux/remoteproc.h>
> +
> +struct apu_core;
> +struct apu_drm;
> +
> +struct apu_drm_ops {
> +       int (*send)(struct apu_core *apu_core, void *data, int len);
> +       int (*callback)(struct apu_core *apu_core, void *data, int len);
> +};
> +
> +#ifdef CONFIG_DRM_APU
> +
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv);
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size);
> +int apu_drm_unregister_core(void *priv);
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len);
> +void *apu_drm_priv(struct apu_core *apu_core);
> +
> +#else /* CONFIG_DRM_APU */
> +
> +static inline
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv)
> +{
> +       return NULL;
> +}
> +
> +static inline
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
> +{
> +       return -ENOMEM;
> +}
> +
> +static inline
> +int apu_drm_unregister_core(void *priv)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline void *apu_drm_priv(struct apu_core *apu_core)
> +{
> +       return NULL;
> +}
> +#endif /* CONFIG_DRM_APU */
> +
> +
> +#endif /* __APU_DRM_H__ */
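
The commit message says the driver "must be registered by another driver implementing the transmissions"; `apu_drm_ops` above is that hook. A hedged, illustrative sketch of what a transport driver might look like — not buildable standalone, and `my_ept`/`my_priv` and the rpmsg glue are assumptions (patch 3/4 carries the real implementation):

```
/* Illustrative transport-driver sketch: implement ->send() to push
 * requests to the firmware, and feed replies back into the DRM side
 * through apu_drm_callback(). */
static int my_apu_send(struct apu_core *core, void *data, int len)
{
	/* my_ept is a hypothetical rpmsg endpoint owned by this driver */
	return rpmsg_send(my_ept, data, len);
}

static struct apu_drm_ops my_apu_ops = {
	.send = my_apu_send,
};

static int my_apu_probe(struct rproc *rproc)
{
	struct apu_core *core;

	core = apu_drm_register_core(rproc, &my_apu_ops, my_priv);
	if (!core)
		return -ENODEV;

	/* Carve out device addresses the firmware can actually reach */
	return apu_drm_reserve_iova(core, 0x60000000, SZ_64M);
}
```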
> diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h
> new file mode 100644
> index 0000000000000..c52e187bb0599
> --- /dev/null
> +++ b/include/uapi/drm/apu_drm.h
> @@ -0,0 +1,106 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +
> +#ifndef __UAPI_APU_DRM_H__
> +#define __UAPI_APU_DRM_H__
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +#define APU_JOB_COMPLETED 0x80000000
> +
> +/*
> + * Please note that modifications to all structs defined here are
> + * subject to backwards-compatibility constraints.
> + */
> +
> +/*
> + * Firmware request, must match the one defined in the firmware.
> + * @id: Request id, used to match a reply with its pending request
> + * @cmd: The command id to execute in the firmware
> + * @result: The result of the command executed by the firmware
> + * @size_in: The size of the input data attached to this request
> + * @size_out: The size of the output data expected in the reply
> + * @count: The number of shared buffers
> + * @data: Contains the input data if size_in is greater than zero,
> + *        room for the output data, and the addresses and sizes of the
> + *        shared buffers if count is greater than zero. Both the data
> + *        and the shared buffers can be read and written by the APU.
> + */
> +struct apu_dev_request {
> +       __u16 id;
> +       __u16 cmd;
> +       __u16 result;
> +       __u16 size_in;
> +       __u16 size_out;
> +       __u16 count;
> +       __u8 data[];
> +} __packed;
> +
> +struct drm_apu_gem_new {
> +       __u32 size;                     /* in */
> +       __u32 flags;                    /* in */
> +       __u32 handle;                   /* out */
> +       __u64 offset;                   /* out */
> +};
> +

Please refer to
https://www.kernel.org/doc/Documentation/ioctl/botching-up-ioctls.rst

here and below in many places.

There's a lot of missing padding/alignment here.

I'm trying to find the time to review this stack in full. Any writeups
on how this is used from userspace would be useful (not just the code
repo, but some sort of "how do I get at it"). It reads as kinda generic
(calling it apu), but then has some specifics around device binding.

Dave.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
@ 2021-09-23  0:58     ` Dave Airlie
  0 siblings, 0 replies; 34+ messages in thread
From: Dave Airlie @ 2021-09-23  0:58 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: Dave Airlie, Daniel Vetter, Rob Herring, Matthias Brugger,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, ohad,
	bjorn.andersson, Mathieu Poirier, Sumit Semwal, Koenig,
	Christian, dri-devel,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-arm-kernel, moderated list:ARM/Mediatek SoC support, LKML,
	linux-remoteproc, Linux Media Mailing List,
	moderated list:DMA BUFFER SHARING FRAMEWORK, khilman, gpain

On Sat, 18 Sept 2021 at 07:57, Alexandre Bailon <abailon@baylibre.com> wrote:
>
> Some MediaTek SoCs provide a hardware accelerator for AI / ML.
> This driver provides the infrastructure to manage memory
> shared between the host CPU and the accelerator, and to submit
> jobs to the accelerator.
> The APU itself is managed by remoteproc, so this driver
> relies on remoteproc to find the APU and get some important data
> from it. But the driver is quite generic, and it should be possible
> to manage accelerators in other ways.
> This driver doesn't handle the data transmissions itself.
> It must be registered by another driver implementing the transmissions.
>
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  drivers/gpu/drm/Kconfig            |   2 +
>  drivers/gpu/drm/Makefile           |   1 +
>  drivers/gpu/drm/apu/Kconfig        |  10 +
>  drivers/gpu/drm/apu/Makefile       |   7 +
>  drivers/gpu/drm/apu/apu_drm_drv.c  | 238 +++++++++++
>  drivers/gpu/drm/apu/apu_gem.c      | 232 +++++++++++
>  drivers/gpu/drm/apu/apu_internal.h |  89 ++++
>  drivers/gpu/drm/apu/apu_sched.c    | 634 +++++++++++++++++++++++++++++
>  include/drm/apu_drm.h              |  59 +++
>  include/uapi/drm/apu_drm.h         | 106 +++++
>  10 files changed, 1378 insertions(+)
>  create mode 100644 drivers/gpu/drm/apu/Kconfig
>  create mode 100644 drivers/gpu/drm/apu/Makefile
>  create mode 100644 drivers/gpu/drm/apu/apu_drm_drv.c
>  create mode 100644 drivers/gpu/drm/apu/apu_gem.c
>  create mode 100644 drivers/gpu/drm/apu/apu_internal.h
>  create mode 100644 drivers/gpu/drm/apu/apu_sched.c
>  create mode 100644 include/drm/apu_drm.h
>  create mode 100644 include/uapi/drm/apu_drm.h
>
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 8fc40317f2b77..bcdca35c9eda5 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -382,6 +382,8 @@ source "drivers/gpu/drm/xlnx/Kconfig"
>
>  source "drivers/gpu/drm/gud/Kconfig"
>
> +source "drivers/gpu/drm/apu/Kconfig"
> +
>  config DRM_HYPERV
>         tristate "DRM Support for Hyper-V synthetic video device"
>         depends on DRM && PCI && MMU && HYPERV
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index ad11121548983..f3d8432976558 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -127,4 +127,5 @@ obj-$(CONFIG_DRM_MCDE) += mcde/
>  obj-$(CONFIG_DRM_TIDSS) += tidss/
>  obj-y                  += xlnx/
>  obj-y                  += gud/
> +obj-$(CONFIG_DRM_APU) += apu/
>  obj-$(CONFIG_DRM_HYPERV) += hyperv/
> diff --git a/drivers/gpu/drm/apu/Kconfig b/drivers/gpu/drm/apu/Kconfig
> new file mode 100644
> index 0000000000000..c8471309a0351
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +
> +config DRM_APU
> +       tristate "APU (AI Processor Unit)"
> +       select REMOTEPROC
> +       select DRM_SCHED
> +       help
> +         This provides a DRM driver with facilities to communicate
> +         with an AI Processor Unit (APU).
> diff --git a/drivers/gpu/drm/apu/Makefile b/drivers/gpu/drm/apu/Makefile
> new file mode 100644
> index 0000000000000..3e97846b091c9
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/Makefile
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +apu_drm-y += apu_drm_drv.o
> +apu_drm-y += apu_sched.o
> +apu_drm-y += apu_gem.o
> +
> +obj-$(CONFIG_DRM_APU) += apu_drm.o
> diff --git a/drivers/gpu/drm/apu/apu_drm_drv.c b/drivers/gpu/drm/apu/apu_drm_drv.c
> new file mode 100644
> index 0000000000000..91d8c99e373c0
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_drm_drv.c
> @@ -0,0 +1,238 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/remoteproc.h>
> +
> +#include <drm/apu_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_probe_helper.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +static LIST_HEAD(apu_devices);
> +
> +static const struct drm_ioctl_desc ioctls[] = {
> +       DRM_IOCTL_DEF_DRV(APU_GEM_NEW, ioctl_gem_new,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_QUEUE, ioctl_gem_queue,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_DEQUEUE, ioctl_gem_dequeue,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_MAP, ioctl_gem_iommu_map,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_GEM_IOMMU_UNMAP, ioctl_gem_iommu_unmap,
> +                         DRM_RENDER_ALLOW),
> +       DRM_IOCTL_DEF_DRV(APU_STATE, ioctl_apu_state,
> +                         DRM_RENDER_ALLOW),
> +};
> +
> +DEFINE_DRM_GEM_CMA_FOPS(apu_drm_ops);
> +
> +static struct drm_driver apu_drm_driver = {
> +       .driver_features = DRIVER_GEM | DRIVER_SYNCOBJ,
> +       .name = "drm_apu",
> +       .desc = "APU DRM driver",
> +       .date = "20210319",
> +       .major = 1,
> +       .minor = 0,
> +       .patchlevel = 0,
> +       .ioctls = ioctls,
> +       .num_ioctls = ARRAY_SIZE(ioctls),
> +       .fops = &apu_drm_ops,
> +       DRM_GEM_CMA_DRIVER_OPS_WITH_DUMB_CREATE(drm_gem_cma_dumb_create),
> +};
> +
> +void *apu_drm_priv(struct apu_core *apu_core)
> +{
> +       return apu_core->priv;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_priv);
> +
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
> +{
> +       struct apu_drm *apu_drm = apu_core->apu_drm;
> +       struct iova *iova;
> +
> +       /* reserve_iova() takes an inclusive pfn range */
> +       iova = reserve_iova(&apu_drm->iovad, PHYS_PFN(start),
> +                           PHYS_PFN(start + size - 1));
> +       if (!iova)
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_reserve_iova);
> +
> +static int apu_drm_init_first_core(struct apu_drm *apu_drm,
> +                                  struct apu_core *apu_core)
> +{
> +       struct drm_device *drm;
> +       struct device *parent;
> +       u64 mask;
> +
> +       drm = apu_drm->drm;
> +       parent = apu_core->rproc->dev.parent;
> +       drm->dev->iommu_group = parent->iommu_group;
> +       apu_drm->domain = iommu_get_domain_for_dev(parent);
> +       set_dma_ops(drm->dev, get_dma_ops(parent));
> +       mask = dma_get_mask(parent);
> +       return dma_coerce_mask_and_coherent(drm->dev, mask);
> +}
> +
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv)
> +{
> +       struct apu_drm *apu_drm;
> +       struct apu_core *apu_core;
> +       int ret;
> +
> +       list_for_each_entry(apu_drm, &apu_devices, node) {
> +               list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +                       if (apu_core->rproc == rproc) {
> +                               ret = apu_drm_init_first_core(apu_drm,
> +                                                             apu_core);
> +                               if (ret)
> +                                       return NULL;
> +                               apu_core->dev = &rproc->dev;
> +                               apu_core->priv = priv;
> +                               apu_core->ops = ops;
> +
> +                               ret = apu_drm_job_init(apu_core);
> +                               if (ret)
> +                                       return NULL;
> +
> +                               return apu_core;
> +                       }
> +               }
> +       }
> +
> +       return NULL;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_register_core);
> +
> +int apu_drm_unregister_core(void *priv)
> +{
> +       struct apu_drm *apu_drm;
> +       struct apu_core *apu_core;
> +
> +       list_for_each_entry(apu_drm, &apu_devices, node) {
> +               list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +                       if (apu_core->priv == priv) {
> +                               apu_sched_fini(apu_core);
> +                               apu_core->priv = NULL;
> +                               apu_core->ops = NULL;
> +                       }
> +               }
> +       }
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(apu_drm_unregister_core);
> +
> +#ifdef CONFIG_OF
> +static const struct of_device_id apu_platform_of_match[] = {
> +       { .compatible = "mediatek,apu-drm", },
> +       { },
> +};
> +
> +MODULE_DEVICE_TABLE(of, apu_platform_of_match);
> +#endif
> +
> +static int apu_platform_probe(struct platform_device *pdev)
> +{
> +       struct drm_device *drm;
> +       struct apu_drm *apu_drm;
> +       struct of_phandle_iterator it;
> +       int index = 0;
> +       u64 iova[2];
> +       int ret;
> +
> +       apu_drm = devm_kzalloc(&pdev->dev, sizeof(*apu_drm), GFP_KERNEL);
> +       if (!apu_drm)
> +               return -ENOMEM;
> +       INIT_LIST_HEAD(&apu_drm->apu_cores);
> +
> +       of_phandle_iterator_init(&it, pdev->dev.of_node, "remoteproc", NULL, 0);
> +       while (of_phandle_iterator_next(&it) == 0) {
> +               struct rproc *rproc = rproc_get_by_phandle(it.phandle);
> +               struct apu_core *apu_core;
> +
> +               if (!rproc)
> +                       return -EPROBE_DEFER;
> +
> +               apu_core = devm_kzalloc(&pdev->dev, sizeof(*apu_core),
> +                                       GFP_KERNEL);
> +               if (!apu_core)
> +                       return -ENOMEM;
> +
> +               apu_core->rproc = rproc;
> +               apu_core->device_id = index++;
> +               apu_core->apu_drm = apu_drm;
> +               spin_lock_init(&apu_core->ctx_lock);
> +               INIT_LIST_HEAD(&apu_core->requests);
> +               list_add(&apu_core->node, &apu_drm->apu_cores);
> +       }
> +
> +       if (of_property_read_variable_u64_array(pdev->dev.of_node, "iova",
> +                                               iova, ARRAY_SIZE(iova),
> +                                               ARRAY_SIZE(iova)) !=
> +           ARRAY_SIZE(iova))
> +               return -EINVAL;
> +
> +       init_iova_domain(&apu_drm->iovad, PAGE_SIZE, PHYS_PFN(iova[0]));
> +       apu_drm->iova_limit_pfn = PHYS_PFN(iova[0] + iova[1]) - 1;
> +
> +       drm = drm_dev_alloc(&apu_drm_driver, &pdev->dev);
> +       if (IS_ERR(drm)) {
> +               ret = PTR_ERR(drm);
> +               return ret;
> +       }
> +
> +       drm->dev_private = apu_drm;
> +       apu_drm->drm = drm;
> +       apu_drm->dev = &pdev->dev;
> +       platform_set_drvdata(pdev, drm);
> +
> +       /* register last so userspace cannot open a half-initialized device */
> +       ret = drm_dev_register(drm, 0);
> +       if (ret) {
> +               drm_dev_put(drm);
> +               return ret;
> +       }
> +
> +       list_add(&apu_drm->node, &apu_devices);
> +
> +       return 0;
> +}
> +
> +static int apu_platform_remove(struct platform_device *pdev)
> +{
> +       struct drm_device *drm;
> +
> +       drm = platform_get_drvdata(pdev);
> +
> +       drm_dev_unregister(drm);
> +       drm_dev_put(drm);
> +
> +       return 0;
> +}
> +
> +static struct platform_driver apu_platform_driver = {
> +       .probe = apu_platform_probe,
> +       .remove = apu_platform_remove,
> +       .driver = {
> +                  .name = "apu_drm",
> +                  .of_match_table = of_match_ptr(apu_platform_of_match),
> +       },
> +};
> +
> +module_platform_driver(apu_platform_driver);
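For anyone wanting to try this on a board, the probe above expects a node shaped roughly like the following sketch. The `vpu0`/`vpu1` labels and the address window are invented here; the authoritative layout is the mtk,apu-drm binding added in patch 1:

```dts
apu_drm: apu-drm {
	compatible = "mediatek,apu-drm";
	/* one phandle per remoteproc-managed APU core */
	remoteproc = <&vpu0>, <&vpu1>;
	/* device-visible IOVA window: start and size as two u64 values */
	iova = /bits/ 64 <0x60000000 0x10000000>;
};
```

of_property_read_variable_u64_array() in the probe requires exactly two u64 values, hence the `/bits/ 64` annotation.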
> diff --git a/drivers/gpu/drm/apu/apu_gem.c b/drivers/gpu/drm/apu/apu_gem.c
> new file mode 100644
> index 0000000000000..c867143dab436
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_gem.c
> @@ -0,0 +1,232 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <asm/cacheflush.h>
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/highmem.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/mm.h>
> +#include <linux/swap.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
> +                                            size_t size)
> +{
> +       struct drm_gem_cma_object *cma_obj;
> +
> +       cma_obj = drm_gem_cma_create(dev, size);
> +       if (!cma_obj)
> +               return NULL;
> +
> +       return &cma_obj->base;
> +}
> +
> +int ioctl_gem_new(struct drm_device *dev, void *data,
> +                 struct drm_file *file_priv)
> +{
> +       struct drm_apu_gem_new *args = data;
> +       struct drm_gem_cma_object *cma_obj;
> +       struct apu_gem_object *apu_obj;
> +       struct drm_gem_object *gem_obj;
> +       int ret;
> +
> +       cma_obj = drm_gem_cma_create(dev, args->size);
> +       if (IS_ERR(cma_obj))
> +               return PTR_ERR(cma_obj);
> +
> +       gem_obj = &cma_obj->base;
> +       apu_obj = to_apu_bo(gem_obj);
> +
> +       /*
> +        * Save the size of buffer expected by application instead of the
> +        * aligned one.
> +        */
> +       apu_obj->size = args->size;
> +       apu_obj->offset = 0;
> +       apu_obj->iommu_refcount = 0;
> +       mutex_init(&apu_obj->mutex);
> +
> +       ret = drm_gem_handle_create(file_priv, gem_obj, &args->handle);
> +       /* drop the initial reference; on failure this also frees the object */
> +       drm_gem_object_put(gem_obj);
> +       if (ret)
> +               return ret;
> +       args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
> +
> +       return 0;
> +}
> +
> +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj)
> +{
> +       int iova_pfn;
> +       int i;
> +
> +       if (!obj->iommu_sgt)
> +               return;
> +
> +       mutex_lock(&obj->mutex);
> +       obj->iommu_refcount--;
> +       if (obj->iommu_refcount) {
> +               mutex_unlock(&obj->mutex);
> +               return;
> +       }
> +
> +       iova_pfn = PHYS_PFN(obj->iova);
> +       for (i = 0; i < obj->iommu_sgt->nents; i++) {
> +               iommu_unmap(apu_drm->domain, PFN_PHYS(iova_pfn),
> +                           PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
> +               iova_pfn += PHYS_PFN(PAGE_ALIGN(obj->iommu_sgt->sgl[i].length));
> +       }
> +
> +       sg_free_table(obj->iommu_sgt);
> +       kfree(obj->iommu_sgt);
> +
> +       free_iova(&apu_drm->iovad, PHYS_PFN(obj->iova));
> +       mutex_unlock(&obj->mutex);
> +}
> +
> +static struct sg_table *apu_get_sg_table(struct drm_gem_object *obj)
> +{
> +       if (obj->funcs && obj->funcs->get_sg_table)
> +               return obj->funcs->get_sg_table(obj);
> +       /* the caller checks IS_ERR(), so do not return NULL here */
> +       return ERR_PTR(-EINVAL);
> +}
> +
> +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj)
> +{
> +       struct apu_gem_object *apu_obj = to_apu_bo(obj);
> +       struct scatterlist *sgl;
> +       phys_addr_t phys;
> +       int total_buf_space;
> +       int iova_pfn;
> +       int iova;
> +       int ret;
> +       int i;
> +
> +       mutex_lock(&apu_obj->mutex);
> +       apu_obj->iommu_refcount++;
> +       if (apu_obj->iommu_refcount != 1) {
> +               mutex_unlock(&apu_obj->mutex);
> +               return 0;
> +       }
> +
> +       apu_obj->iommu_sgt = apu_get_sg_table(obj);
> +       if (IS_ERR(apu_obj->iommu_sgt)) {
> +               apu_obj->iommu_refcount--;
> +               mutex_unlock(&apu_obj->mutex);
> +               return PTR_ERR(apu_obj->iommu_sgt);
> +       }
> +
> +       total_buf_space = obj->size;
> +       iova_pfn = alloc_iova_fast(&apu_drm->iovad,
> +                                  total_buf_space >> PAGE_SHIFT,
> +                                  apu_drm->iova_limit_pfn, true);
> +       if (!iova_pfn) {
> +               dev_err(apu_drm->dev, "Failed to allocate iova address\n");
> +               apu_obj->iommu_refcount--;
> +               mutex_unlock(&apu_obj->mutex);
> +               return -ENOMEM;
> +       }
> +       apu_obj->iova = PFN_PHYS(iova_pfn);
> +
> +       iova = apu_obj->iova;
> +       sgl = apu_obj->iommu_sgt->sgl;
> +       for (i = 0; i < apu_obj->iommu_sgt->nents; i++) {
> +               phys = page_to_phys(sg_page(&sgl[i]));
> +               ret = iommu_map(apu_drm->domain, PFN_PHYS(iova_pfn), phys,
> +                               PAGE_ALIGN(sgl[i].length),
> +                               IOMMU_READ | IOMMU_WRITE);
> +               if (ret) {
> +                       dev_err(apu_drm->dev, "Failed to iommu map\n");
> +                       apu_obj->iommu_refcount--;
> +                       free_iova(&apu_drm->iovad, PHYS_PFN(apu_obj->iova));
> +                       mutex_unlock(&apu_obj->mutex);
> +                       return ret;
> +               }
> +               iova += sgl[i].offset + sgl[i].length;
> +               iova_pfn += PHYS_PFN(PAGE_ALIGN(sgl[i].length));
> +       }
> +       mutex_unlock(&apu_obj->mutex);
> +
> +       return 0;
> +}
> +
> +int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
> +                       struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_iommu_map *args = data;
> +       struct drm_gem_object **bos;
> +       void __user *bo_handles;
> +       int ret;
> +       int i;
> +
> +       u64 *das = kvmalloc_array(args->bo_handle_count,
> +                                 sizeof(u64), GFP_KERNEL);
> +       if (!das)
> +               return -ENOMEM;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    args->bo_handle_count, &bos);
> +       if (ret) {
> +               kvfree(das);
> +               return ret;
> +       }
> +
> +       for (i = 0; i < args->bo_handle_count; i++) {
> +               ret = apu_bo_iommu_map(apu_drm, bos[i]);
> +               if (ret) {
> +                       /* TODO: handle error */
> +                       break;
> +               }
> +               das[i] = to_apu_bo(bos[i])->iova + to_apu_bo(bos[i])->offset;
> +       }
> +
> +       if (copy_to_user((void *)args->bo_device_addresses, das,
> +                        args->bo_handle_count * sizeof(u64))) {
> +               DRM_DEBUG("Failed to copy device addresses\n");
> +               ret = -EFAULT;
> +       }
> +
> +       kvfree(das);
> +       kvfree(bos);
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
> +                         struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_iommu_map *args = data;
> +       struct drm_gem_object **bos;
> +       void __user *bo_handles;
> +       int ret;
> +       int i;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    args->bo_handle_count, &bos);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < args->bo_handle_count; i++)
> +               apu_bo_iommu_unmap(apu_drm, to_apu_bo(bos[i]));
> +
> +       kvfree(bos);
> +
> +       return 0;
> +}
> diff --git a/drivers/gpu/drm/apu/apu_internal.h b/drivers/gpu/drm/apu/apu_internal.h
> new file mode 100644
> index 0000000000000..b789b2f3ad9c6
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_internal.h
> @@ -0,0 +1,89 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __APU_INTERNAL_H__
> +#define __APU_INTERNAL_H__
> +
> +#include <linux/iova.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/gpu_scheduler.h>
> +
> +struct apu_gem_object {
> +       struct drm_gem_cma_object base;
> +       struct mutex mutex;
> +       struct sg_table *iommu_sgt;
> +       int iommu_refcount;
> +       size_t size;
> +       u32 iova;
> +       u32 offset;
> +};
> +
> +struct apu_sched;
> +struct apu_core {
> +       int device_id;
> +       struct device *dev;
> +       struct rproc *rproc;
> +       struct apu_drm_ops *ops;
> +       struct apu_drm *apu_drm;
> +
> +       spinlock_t ctx_lock;
> +       struct list_head requests;
> +
> +       struct list_head node;
> +       void *priv;
> +
> +       struct apu_sched *sched;
> +       u32 flags;
> +};
> +
> +struct apu_drm {
> +       struct device *dev;
> +       struct drm_device *drm;
> +
> +       struct iommu_domain *domain;
> +       struct iova_domain iovad;
> +       int iova_limit_pfn;
> +
> +       struct list_head apu_cores;
> +       struct list_head node;
> +};
> +
> +static inline struct apu_gem_object *to_apu_bo(struct drm_gem_object *obj)
> +{
> +       return container_of(to_drm_gem_cma_obj(obj), struct apu_gem_object,
> +                           base);
> +}
> +
> +struct drm_gem_object *apu_gem_create_object(struct drm_device *dev,
> +                                            size_t size);
> +
> +int apu_bo_iommu_map(struct apu_drm *apu_drm, struct drm_gem_object *obj);
> +void apu_bo_iommu_unmap(struct apu_drm *apu_drm, struct apu_gem_object *obj);
> +int ioctl_gem_new(struct drm_device *dev, void *data,
> +                 struct drm_file *file_priv);
> +int ioctl_gem_user_new(struct drm_device *dev, void *data,
> +                      struct drm_file *file_priv);
> +int ioctl_gem_iommu_map(struct drm_device *dev, void *data,
> +                       struct drm_file *file_priv);
> +int ioctl_gem_iommu_unmap(struct drm_device *dev, void *data,
> +                         struct drm_file *file_priv);
> +int ioctl_gem_queue(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv);
> +int ioctl_gem_dequeue(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv);
> +int ioctl_apu_state(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv);
> +struct dma_buf *apu_gem_prime_export(struct drm_gem_object *gem,
> +                                    int flags);
> +
> +struct apu_job;
> +
> +int apu_drm_job_init(struct apu_core *core);
> +void apu_sched_fini(struct apu_core *core);
> +int apu_job_push(struct apu_job *job);
> +void apu_job_put(struct apu_job *job);
> +
> +#endif /* __APU_INTERNAL_H__ */
> diff --git a/drivers/gpu/drm/apu/apu_sched.c b/drivers/gpu/drm/apu/apu_sched.c
> new file mode 100644
> index 0000000000000..cebb0155c7783
> --- /dev/null
> +++ b/drivers/gpu/drm/apu/apu_sched.c
> @@ -0,0 +1,634 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <drm/apu_drm.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_gem_cma_helper.h>
> +#include <drm/drm_syncobj.h>
> +#include <drm/gpu_scheduler.h>
> +
> +#include <uapi/drm/apu_drm.h>
> +
> +#include "apu_internal.h"
> +
> +struct apu_queue_state {
> +       struct drm_gpu_scheduler sched;
> +
> +       u64 fence_context;
> +       u64 seqno;
> +};
> +
> +struct apu_request {
> +       struct list_head node;
> +       void *job;
> +};
> +
> +struct apu_sched {
> +       struct apu_queue_state apu_queue;
> +       spinlock_t job_lock;
> +       struct drm_sched_entity sched_entity;
> +};
> +
> +struct apu_event {
> +       struct drm_pending_event pending_event;
> +       union {
> +               struct drm_event base;
> +               struct apu_job_event job_event;
> +       };
> +};
> +
> +struct apu_job {
> +       struct drm_sched_job base;
> +
> +       struct kref refcount;
> +
> +       struct apu_core *apu_core;
> +       struct apu_drm *apu_drm;
> +
> +       /* Fence to be signaled by IRQ handler when the job is complete. */
> +       struct dma_fence *done_fence;
> +
> +       __u32 cmd;
> +
> +       /* Exclusive fences we have taken from the BOs to wait for */
> +       struct dma_fence **implicit_fences;
> +       struct drm_gem_object **bos;
> +       u32 bo_count;
> +
> +       /* Fence to be signaled by drm-sched once its done with the job */
> +       struct dma_fence *render_done_fence;
> +
> +       void *data_in;
> +       uint16_t size_in;
> +       void *data_out;
> +       uint16_t size_out;
> +       uint16_t result;
> +       uint16_t id;
> +
> +       struct list_head node;
> +       struct drm_syncobj *sync_out;
> +
> +       struct apu_event *event;
> +};
> +
> +static DEFINE_IDA(req_ida);
> +static LIST_HEAD(complete_node);
> +
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +       struct apu_request *apu_req, *tmp;
> +       struct apu_dev_request *hdr = data;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_for_each_entry_safe(apu_req, tmp, &apu_core->requests, node) {
> +               struct apu_job *job = apu_req->job;
> +
> +               if (job && hdr->id == job->id) {
> +                       kref_get(&job->refcount);
> +                       job->result = hdr->result;
> +                       if (job->size_out)
> +                               memcpy(job->data_out, hdr->data + job->size_in,
> +                                      min(job->size_out, hdr->size_out));
> +                       job->size_out = hdr->size_out;
> +                       list_add(&job->node, &complete_node);
> +                       list_del(&apu_req->node);
> +                       ida_simple_remove(&req_ida, hdr->id);
> +                       kfree(apu_req);
> +                       drm_send_event(job->apu_drm->drm,
> +                                      &job->event->pending_event);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +       }
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +
> +       return 0;
> +}
> +
> +void apu_sched_fini(struct apu_core *core)
> +{
> +       drm_sched_fini(&core->sched->apu_queue.sched);
> +       devm_kfree(core->dev, core->sched);
> +       core->flags &= ~APU_ONLINE;
> +       core->sched = NULL;
> +}
> +
> +static void apu_job_cleanup(struct kref *ref)
> +{
> +       struct apu_job *job = container_of(ref, struct apu_job,
> +                                          refcount);
> +       unsigned int i;
> +
> +       if (job->implicit_fences) {
> +               for (i = 0; i < job->bo_count; i++)
> +                       dma_fence_put(job->implicit_fences[i]);
> +               kvfree(job->implicit_fences);
> +       }
> +       dma_fence_put(job->done_fence);
> +       dma_fence_put(job->render_done_fence);
> +
> +       if (job->bos) {
> +               for (i = 0; i < job->bo_count; i++) {
> +                       struct apu_gem_object *apu_obj;
> +
> +                       apu_obj = to_apu_bo(job->bos[i]);
> +                       apu_bo_iommu_unmap(job->apu_drm, apu_obj);
> +                       drm_gem_object_put(job->bos[i]);
> +               }
> +
> +               kvfree(job->bos);
> +       }
> +
> +       kfree(job->data_out);
> +       kfree(job->data_in);
> +       kfree(job);
> +}
> +
> +void apu_job_put(struct apu_job *job)
> +{
> +       kref_put(&job->refcount, apu_job_cleanup);
> +}
> +
> +static void apu_acquire_object_fences(struct drm_gem_object **bos,
> +                                     int bo_count,
> +                                     struct dma_fence **implicit_fences)
> +{
> +       int i;
> +
> +       for (i = 0; i < bo_count; i++)
> +               implicit_fences[i] = dma_resv_get_excl_unlocked(bos[i]->resv);
> +}
> +
> +static void apu_attach_object_fences(struct drm_gem_object **bos,
> +                                    int bo_count, struct dma_fence *fence)
> +{
> +       int i;
> +
> +       for (i = 0; i < bo_count; i++)
> +               dma_resv_add_excl_fence(bos[i]->resv, fence);
> +}
> +
> +int apu_job_push(struct apu_job *job)
> +{
> +       struct drm_sched_entity *entity = &job->apu_core->sched->sched_entity;
> +       struct ww_acquire_ctx acquire_ctx;
> +       int ret = 0;
> +
> +       ret = drm_gem_lock_reservations(job->bos, job->bo_count, &acquire_ctx);
> +       if (ret)
> +               return ret;
> +
> +       ret = drm_sched_job_init(&job->base, entity, NULL);
> +       if (ret)
> +               goto unlock;
> +
> +       job->render_done_fence = dma_fence_get(&job->base.s_fence->finished);
> +
> +       kref_get(&job->refcount);       /* put by scheduler job completion */
> +
> +       apu_acquire_object_fences(job->bos, job->bo_count,
> +                                 job->implicit_fences);
> +
> +       drm_sched_entity_push_job(&job->base, entity);
> +
> +       apu_attach_object_fences(job->bos, job->bo_count,
> +                                job->render_done_fence);
> +
> +unlock:
> +       drm_gem_unlock_reservations(job->bos, job->bo_count, &acquire_ctx);
> +
> +       return ret;
> +}
> +
> +static const char *apu_fence_get_driver_name(struct dma_fence *fence)
> +{
> +       return "apu";
> +}
> +
> +static const char *apu_fence_get_timeline_name(struct dma_fence *fence)
> +{
> +       return "apu-0";
> +}
> +
> +static void apu_fence_release(struct dma_fence *f)
> +{
> +       kfree(f);
> +}
> +
> +static const struct dma_fence_ops apu_fence_ops = {
> +       .get_driver_name = apu_fence_get_driver_name,
> +       .get_timeline_name = apu_fence_get_timeline_name,
> +       .release = apu_fence_release,
> +};
> +
> +static struct dma_fence *apu_fence_create(struct apu_sched *sched)
> +{
> +       struct dma_fence *fence;
> +       struct apu_queue_state *apu_queue = &sched->apu_queue;
> +
> +       fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +       if (!fence)
> +               return ERR_PTR(-ENOMEM);
> +
> +       dma_fence_init(fence, &apu_fence_ops, &sched->job_lock,
> +                      apu_queue->fence_context, apu_queue->seqno++);
> +
> +       return fence;
> +}
> +
> +static struct apu_job *to_apu_job(struct drm_sched_job *sched_job)
> +{
> +       return container_of(sched_job, struct apu_job, base);
> +}
> +
> +static struct dma_fence *apu_job_dependency(struct drm_sched_job *sched_job,
> +                                           struct drm_sched_entity *s_entity)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +       struct dma_fence *fence;
> +       unsigned int i;
> +
> +       /* Implicit fences, max. one per BO */
> +       for (i = 0; i < job->bo_count; i++) {
> +               if (job->implicit_fences[i]) {
> +                       fence = job->implicit_fences[i];
> +                       job->implicit_fences[i] = NULL;
> +                       return fence;
> +               }
> +       }
> +
> +       return NULL;
> +}
> +
> +static int apu_job_hw_submit(struct apu_job *job)
> +{
> +       int ret;
> +       struct apu_core *apu_core = job->apu_core;
> +       struct apu_dev_request *dev_req;
> +       struct apu_request *apu_req;
> +       unsigned long flags;
> +
> +       int size = sizeof(*dev_req) + sizeof(u32) * job->bo_count * 2;
> +       u32 *dev_req_da;
> +       u32 *dev_req_buffer_size;
> +       int i;
> +
> +       dev_req = kmalloc(size + job->size_in + job->size_out, GFP_KERNEL);
> +       if (!dev_req)
> +               return -ENOMEM;
> +
> +       dev_req->cmd = job->cmd;
> +       dev_req->size_in = job->size_in;
> +       dev_req->size_out = job->size_out;
> +       dev_req->count = job->bo_count;
> +       dev_req_da = (u32 *)(dev_req->data + dev_req->size_in +
> +                            dev_req->size_out);
> +       dev_req_buffer_size = (u32 *) (dev_req_da + dev_req->count);
> +       memcpy(dev_req->data, job->data_in, job->size_in);
> +
> +       apu_req = kzalloc(sizeof(*apu_req), GFP_KERNEL);
> +       if (!apu_req) {
> +               kfree(dev_req);
> +               return -ENOMEM;
> +       }
> +
> +       for (i = 0; i < job->bo_count; i++) {
> +               struct apu_gem_object *obj = to_apu_bo(job->bos[i]);
> +
> +               dev_req_da[i] = obj->iova + obj->offset;
> +               dev_req_buffer_size[i] = obj->size;
> +       }
> +
> +       ret = ida_simple_get(&req_ida, 0, 0xffff, GFP_KERNEL);
> +       if (ret < 0)
> +               goto err_free_memory;
> +
> +       dev_req->id = ret;
> +
> +       job->id = dev_req->id;
> +       apu_req->job = job;
> +       spin_lock_irqsave(&apu_core->ctx_lock, flags);
> +       list_add(&apu_req->node, &apu_core->requests);
> +       spin_unlock_irqrestore(&apu_core->ctx_lock, flags);
> +       ret = apu_core->ops->send(apu_core, dev_req,
> +                                 size + dev_req->size_in +
> +                                 dev_req->size_out);
> +       if (ret < 0)
> +               goto err;
> +       kfree(dev_req);
> +
> +       return 0;
> +
> +err:
> +       list_del(&apu_req->node);
> +       ida_simple_remove(&req_ida, dev_req->id);
> +err_free_memory:
> +       kfree(apu_req);
> +       kfree(dev_req);
> +
> +       return ret;
> +}
> +
> +static struct dma_fence *apu_job_run(struct drm_sched_job *sched_job)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +       struct dma_fence *fence = NULL;
> +       int ret;
> +
> +       if (unlikely(job->base.s_fence->finished.error))
> +               return NULL;
> +
> +       fence = apu_fence_create(job->apu_core->sched);
> +       if (IS_ERR(fence))
> +               return NULL;
> +
> +       job->done_fence = dma_fence_get(fence);
> +
> +       ret = apu_job_hw_submit(job);
> +       if (ret)
> +               dma_fence_set_error(fence, ret);
> +
> +       return fence;
> +}
> +
> +static void apu_update_rproc_state(struct apu_core *core)
> +{
> +       if (core->rproc) {
> +               if (core->rproc->state == RPROC_CRASHED)
> +                       core->flags |= APU_CRASHED;
> +               if (core->rproc->state == RPROC_OFFLINE)
> +                       core->flags &= ~APU_ONLINE;
> +       }
> +}
> +
> +static enum drm_gpu_sched_stat apu_job_timedout(struct drm_sched_job *sched_job)
> +{
> +       struct apu_request *apu_req, *tmp;
> +       struct apu_job *job = to_apu_job(sched_job);
> +
> +       if (dma_fence_is_signaled(job->done_fence))
> +               return DRM_GPU_SCHED_STAT_NOMINAL;
> +
> +       list_for_each_entry_safe(apu_req, tmp, &job->apu_core->requests, node) {
> +               /* Remove the request and notify user about timeout */
> +               if (apu_req->job == job) {
> +                       kref_get(&job->refcount);
> +                       job->apu_core->flags |= APU_TIMEDOUT;
> +                       apu_update_rproc_state(job->apu_core);
> +                       job->result = ETIMEDOUT;
> +                       list_add(&job->node, &complete_node);
> +                       list_del(&apu_req->node);
> +                       ida_simple_remove(&req_ida, job->id);
> +                       kfree(apu_req);
> +                       drm_send_event(job->apu_drm->drm,
> +                                      &job->event->pending_event);
> +                       dma_fence_signal_locked(job->done_fence);
> +               }
> +       }
> +
> +       return DRM_GPU_SCHED_STAT_NOMINAL;
> +}
> +
> +static void apu_job_free(struct drm_sched_job *sched_job)
> +{
> +       struct apu_job *job = to_apu_job(sched_job);
> +
> +       drm_sched_job_cleanup(sched_job);
> +
> +       apu_job_put(job);
> +}
> +
> +static const struct drm_sched_backend_ops apu_sched_ops = {
> +       .dependency = apu_job_dependency,
> +       .run_job = apu_job_run,
> +       .timedout_job = apu_job_timedout,
> +       .free_job = apu_job_free
> +};
> +
> +int apu_drm_job_init(struct apu_core *core)
> +{
> +       int ret;
> +       struct apu_sched *apu_sched;
> +       struct drm_gpu_scheduler *sched;
> +
> +       apu_sched = devm_kzalloc(core->dev, sizeof(*apu_sched), GFP_KERNEL);
> +       if (!apu_sched)
> +               return -ENOMEM;
> +
> +       sched = &apu_sched->apu_queue.sched;
> +       apu_sched->apu_queue.fence_context = dma_fence_context_alloc(1);
> +       ret = drm_sched_init(sched, &apu_sched_ops,
> +                            1, 0, msecs_to_jiffies(500),
> +                            NULL, NULL, "apu_js");
> +       if (ret) {
> +               dev_err(core->dev, "Failed to create scheduler: %d.", ret);
> +               return ret;
> +       }
> +
> +       ret = drm_sched_entity_init(&apu_sched->sched_entity,
> +                                   DRM_SCHED_PRIORITY_NORMAL,
> +                                   &sched, 1, NULL);
> +
> +       core->sched = apu_sched;
> +       core->flags = APU_ONLINE;
> +
> +       return ret;
> +}
> +
> +static struct apu_core *get_apu_core(struct apu_drm *apu_drm, int device_id)
> +{
> +       struct apu_core *apu_core;
> +
> +       list_for_each_entry(apu_core, &apu_drm->apu_cores, node) {
> +               if (apu_core->device_id == device_id)
> +                       return apu_core;
> +       }
> +
> +       return NULL;
> +}
> +
> +static int apu_core_is_running(struct apu_core *core)
> +{
> +       return core->ops && core->priv && core->sched;
> +}
> +
> +static int
> +apu_lookup_bos(struct drm_device *dev,
> +              struct drm_file *file_priv,
> +              struct drm_apu_gem_queue *args, struct apu_job *job)
> +{
> +       void __user *bo_handles;
> +       unsigned int i;
> +       int ret;
> +
> +       job->bo_count = args->bo_handle_count;
> +
> +       if (!job->bo_count)
> +               return 0;
> +
> +       job->implicit_fences = kvmalloc_array(job->bo_count,
> +                                             sizeof(struct dma_fence *),
> +                                             GFP_KERNEL | __GFP_ZERO);
> +       if (!job->implicit_fences)
> +               return -ENOMEM;
> +
> +       bo_handles = (void __user *)(uintptr_t) args->bo_handles;
> +       ret = drm_gem_objects_lookup(file_priv, bo_handles,
> +                                    job->bo_count, &job->bos);
> +       if (ret)
> +               return ret;
> +
> +       for (i = 0; i < job->bo_count; i++) {
> +               ret = apu_bo_iommu_map(job->apu_drm, job->bos[i]);
> +               if (ret) {
> +                       /* TODO: handle error */
> +                       break;
> +               }
> +       }
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_queue(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_gem_queue *args = data;
> +       struct apu_event *event;
> +       struct apu_core *core;
> +       struct drm_syncobj *sync_out = NULL;
> +       struct apu_job *job;
> +       int ret = 0;
> +
> +       core = get_apu_core(apu_drm, args->device);
> +       if (!core || !apu_core_is_running(core))
> +               return -ENODEV;
> +
> +       if (args->out_sync > 0) {
> +               sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +               if (!sync_out)
> +                       return -ENODEV;
> +       }
> +
> +       job = kzalloc(sizeof(*job), GFP_KERNEL);
> +       if (!job) {
> +               ret = -ENOMEM;
> +               goto fail_out_sync;
> +       }
> +
> +       kref_init(&job->refcount);
> +
> +       job->apu_drm = apu_drm;
> +       job->apu_core = core;
> +       job->cmd = args->cmd;
> +       job->size_in = args->size_in;
> +       job->size_out = args->size_out;
> +       job->sync_out = sync_out;
> +       if (job->size_in) {
> +               job->data_in = kmalloc(job->size_in, GFP_KERNEL);
> +               if (!job->data_in) {
> +                       ret = -ENOMEM;
> +                       goto fail_job;
> +               }
> +
> +               if (copy_from_user(job->data_in,
> +                                  (void __user *)(uintptr_t) args->data,
> +                                  job->size_in)) {
> +                       ret = -EFAULT;
> +                       goto fail_job;
> +               }
> +       }
> +
> +       if (job->size_out) {
> +               job->data_out = kmalloc(job->size_out, GFP_KERNEL);
> +               if (!job->data_out) {
> +                       ret = -ENOMEM;
> +                       goto fail_job;
> +               }
> +       }
> +
> +       ret = apu_lookup_bos(dev, file_priv, args, job);
> +       if (ret)
> +               goto fail_job;
> +
> +       event = kzalloc(sizeof(*event), GFP_KERNEL);
> +       if (!event) {
> +               ret = -ENOMEM;
> +               goto fail_job;
> +       }
> +       event->base.length = sizeof(struct apu_job_event);
> +       event->base.type = APU_JOB_COMPLETED;
> +       event->job_event.out_sync = args->out_sync;
> +       job->event = event;
> +       ret = drm_event_reserve_init(dev, file_priv, &job->event->pending_event,
> +                                    &job->event->base);
> +       if (ret)
> +               goto fail_job;
> +
> +       ret = apu_job_push(job);
> +       if (ret) {
> +               drm_event_cancel_free(dev, &job->event->pending_event);
> +               goto fail_job;
> +       }
> +
> +       /* Update the return sync object for the job */
> +       if (sync_out)
> +               drm_syncobj_replace_fence(sync_out, job->render_done_fence);
> +
> +fail_job:
> +       apu_job_put(job);
> +fail_out_sync:
> +       if (sync_out)
> +               drm_syncobj_put(sync_out);
> +
> +       return ret;
> +}
> +
> +int ioctl_gem_dequeue(struct drm_device *dev, void *data,
> +                     struct drm_file *file_priv)
> +{
> +       struct drm_apu_gem_dequeue *args = data;
> +       struct drm_syncobj *sync_out = NULL;
> +       struct apu_job *job;
> +       int ret = 0;
> +
> +       if (args->out_sync > 0) {
> +               sync_out = drm_syncobj_find(file_priv, args->out_sync);
> +               if (!sync_out)
> +                       return -ENODEV;
> +       }
> +
> +       list_for_each_entry(job, &complete_node, node) {
> +               if (job->sync_out == sync_out) {
> +                       if (job->data_out) {
> +                               if (copy_to_user((void __user *)(uintptr_t)
> +                                                args->data, job->data_out,
> +                                                job->size_out))
> +                                       ret = -EFAULT;
> +                               args->size = job->size_out;
> +                       }
> +                       args->result = job->result;
> +                       list_del(&job->node);
> +                       apu_job_put(job);
> +                       drm_syncobj_put(sync_out);
> +
> +                       return ret;
> +               }
> +       }
> +
> +       if (sync_out)
> +               drm_syncobj_put(sync_out);
> +
> +       return 0;
> +}
> +
> +int ioctl_apu_state(struct drm_device *dev, void *data,
> +                   struct drm_file *file_priv)
> +{
> +       struct apu_drm *apu_drm = dev->dev_private;
> +       struct drm_apu_state *args = data;
> +       struct apu_core *core;
> +
> +       args->flags = 0;
> +
> +       core = get_apu_core(apu_drm, args->device);
> +       if (!core)
> +               return -ENODEV;
> +       args->flags |= core->flags;
> +
> +       /* Reset APU flags */
> +       core->flags &= ~(APU_TIMEDOUT | APU_CRASHED);
> +
> +       return 0;
> +}
> diff --git a/include/drm/apu_drm.h b/include/drm/apu_drm.h
> new file mode 100644
> index 0000000000000..f044ed0427fdd
> --- /dev/null
> +++ b/include/drm/apu_drm.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __APU_DRM_H__
> +#define __APU_DRM_H__
> +
> +#include <linux/iova.h>
> +#include <linux/remoteproc.h>
> +
> +struct apu_core;
> +struct apu_drm;
> +
> +struct apu_drm_ops {
> +       int (*send)(struct apu_core *apu_core, void *data, int len);
> +       int (*callback)(struct apu_core *apu_core, void *data, int len);
> +};
> +
> +#ifdef CONFIG_DRM_APU
> +
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv);
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size);
> +int apu_drm_unregister_core(void *priv);
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len);
> +void *apu_drm_priv(struct apu_core *apu_core);
> +
> +#else /* CONFIG_DRM_APU */
> +
> +static inline
> +struct apu_core *apu_drm_register_core(struct rproc *rproc,
> +                                      struct apu_drm_ops *ops, void *priv)
> +{
> +       return NULL;
> +}
> +
> +static inline
> +int apu_drm_reserve_iova(struct apu_core *apu_core, u64 start, u64 size)
> +{
> +       return -ENOMEM;
> +}
> +
> +static inline
> +int apu_drm_unregister_core(void *priv)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline
> +int apu_drm_callback(struct apu_core *apu_core, void *data, int len)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline void *apu_drm_priv(struct apu_core *apu_core)
> +{
> +       return NULL;
> +}
> +#endif /* CONFIG_DRM_APU */
> +
> +
> +#endif /* __APU_DRM_H__ */
> diff --git a/include/uapi/drm/apu_drm.h b/include/uapi/drm/apu_drm.h
> new file mode 100644
> index 0000000000000..c52e187bb0599
> --- /dev/null
> +++ b/include/uapi/drm/apu_drm.h
> @@ -0,0 +1,106 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +
> +#ifndef __UAPI_APU_DRM_H__
> +#define __UAPI_APU_DRM_H__
> +
> +#include "drm.h"
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +#define APU_JOB_COMPLETED 0x80000000
> +
> +/*
> + * Please note that modifications to all structs defined here are
> + * subject to backwards-compatibility constraints.
> + */
> +
> +/*
> + * Firmware request, must be aligned with the one defined in firmware.
> + * @id: Request id, used in the case of a reply, to find the pending request
> + * @cmd: The command id to execute in the firmware
> + * @result: The result of the command executed on the firmware
> + * @size_in: The size of the input data available in this request
> + * @size_out: The size of the output data expected in the reply
> + * @count: The number of shared buffers
> + * @data: Contains the data attached to the request if size_in is greater
> + *        than zero, and the addresses of shared buffers if count is greater
> + *        than zero. Both the data and the shared buffers can be read and
> + *        written by the APU.
> + */
> +struct apu_dev_request {
> +       __u16 id;
> +       __u16 cmd;
> +       __u16 result;
> +       __u16 size_in;
> +       __u16 size_out;
> +       __u16 count;
> +       __u8 data[];
> +} __packed;
> +
> +struct drm_apu_gem_new {
> +       __u32 size;                     /* in */
> +       __u32 flags;                    /* in */
> +       __u32 handle;                   /* out */
> +       __u64 offset;                   /* out */
> +};
> +

Please refer to
https://www.kernel.org/doc/Documentation/ioctl/botching-up-ioctls.rst

here and below in many places.

There's a lot of missing padding/alignment here.

I'm trying to find the time to review this stack in full. Any writeups
on how this is used from userspace would be useful (not just the code
repo, but some sort of "how do I get at it"). It reads as kinda generic
(calling it apu), but then has some specifics around device binding.

Dave.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 3/4] rpmsg: Add support of AI Processor Unit (APU)
  2021-09-17 12:59   ` Alexandre Bailon
  (?)
@ 2021-09-23  3:31     ` Bjorn Andersson
  -1 siblings, 0 replies; 34+ messages in thread
From: Bjorn Andersson @ 2021-09-23  3:31 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, mathieu.poirier, sumit.semwal,
	christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain

On Fri 17 Sep 07:59 CDT 2021, Alexandre Bailon wrote:

> Some MediaTek SoCs provide a hardware accelerator for AI / ML.
> This driver uses the DRM driver to manage the shared memory,
> and uses rpmsg to execute jobs on the APU.
> 
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  drivers/rpmsg/Kconfig     |  10 +++
>  drivers/rpmsg/Makefile    |   1 +
>  drivers/rpmsg/apu_rpmsg.c | 184 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 195 insertions(+)
>  create mode 100644 drivers/rpmsg/apu_rpmsg.c
> 
> diff --git a/drivers/rpmsg/Kconfig b/drivers/rpmsg/Kconfig
> index 0b4407abdf138..fc1668f795004 100644
> --- a/drivers/rpmsg/Kconfig
> +++ b/drivers/rpmsg/Kconfig
> @@ -73,4 +73,14 @@ config RPMSG_VIRTIO
>  	select RPMSG_NS
>  	select VIRTIO
>  
> +config RPMSG_APU
> +	tristate "APU RPMSG driver"
> +	select REMOTEPROC
> +	select RPMSG_VIRTIO
> +	select DRM_APU
> +	help
> +	  This provides an rpmsg driver with facilities to communicate
> +	  with an accelerated processing unit (APU).
> +	  This uses the APU DRM driver to manage memory and job scheduling.

Similar to how a driver for e.g. an I2C device doesn't live in
drivers/i2c, this doesn't belong in drivers/rpmsg. Probably rather
directly in the DRM driver.

> +
>  endmenu
> diff --git a/drivers/rpmsg/Makefile b/drivers/rpmsg/Makefile
> index 8d452656f0ee3..8b336b9a817c1 100644
> --- a/drivers/rpmsg/Makefile
> +++ b/drivers/rpmsg/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_RPMSG_QCOM_GLINK_RPM) += qcom_glink_rpm.o
>  obj-$(CONFIG_RPMSG_QCOM_GLINK_SMEM) += qcom_glink_smem.o
>  obj-$(CONFIG_RPMSG_QCOM_SMD)	+= qcom_smd.o
>  obj-$(CONFIG_RPMSG_VIRTIO)	+= virtio_rpmsg_bus.o
> +obj-$(CONFIG_RPMSG_APU)		+= apu_rpmsg.o
> diff --git a/drivers/rpmsg/apu_rpmsg.c b/drivers/rpmsg/apu_rpmsg.c
> new file mode 100644
> index 0000000000000..7e504bd176a4d
> --- /dev/null
> +++ b/drivers/rpmsg/apu_rpmsg.c
> @@ -0,0 +1,184 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <asm/cacheflush.h>
> +
> +#include <linux/cdev.h>
> +#include <linux/dma-buf.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/remoteproc.h>
> +#include <linux/rpmsg.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include <drm/apu_drm.h>
> +
> +#include "rpmsg_internal.h"
> +
> +#define APU_RPMSG_SERVICE_MT8183 "rpmsg-mt8183-apu0"
> +
> +struct rpmsg_apu {
> +	struct apu_core *core;
> +	struct rpmsg_device *rpdev;
> +};
> +
> +static int apu_rpmsg_callback(struct rpmsg_device *rpdev, void *data, int count,
> +			      void *priv, u32 addr)
> +{
> +	struct rpmsg_apu *apu = dev_get_drvdata(&rpdev->dev);
> +	struct apu_core *apu_core = apu->core;
> +
> +	return apu_drm_callback(apu_core, data, count);
> +}
> +
> +static int apu_rpmsg_send(struct apu_core *apu_core, void *data, int len)
> +{
> +	struct rpmsg_apu *apu = apu_drm_priv(apu_core);
> +	struct rpmsg_device *rpdev = apu->rpdev;
> +
> +	return rpmsg_send(rpdev->ept, data, len);

The rpmsg API is exposed outside drivers/rpmsg, so as I said above, just
implement this directly in your driver, no need to lug around a dummy
wrapper for things like this.

> +}
> +
> +static struct apu_drm_ops apu_rpmsg_ops = {
> +	.send = apu_rpmsg_send,
> +};
> +
> +static int apu_init_iovad(struct rproc *rproc, struct rpmsg_apu *apu)
> +{
> +	struct resource_table *table;
> +	struct fw_rsc_carveout *rsc;
> +	int i;
> +
> +	if (!rproc->table_ptr) {
> +		dev_err(&apu->rpdev->dev,
> +			"No resource_table: has the firmware been loaded?\n");
> +		return -ENODEV;
> +	}
> +
> +	table = rproc->table_ptr;
> +	for (i = 0; i < table->num; i++) {
> +		int offset = table->offset[i];
> +		struct fw_rsc_hdr *hdr = (void *)table + offset;
> +
> +		if (hdr->type != RSC_CARVEOUT)
> +			continue;
> +
> +		rsc = (void *)hdr + sizeof(*hdr);
> +		if (apu_drm_reserve_iova(apu->core, rsc->da, rsc->len)) {
> +			dev_err(&apu->rpdev->dev,
> +				"failed to reserve iova\n");
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static struct rproc *apu_get_rproc(struct rpmsg_device *rpdev)
> +{
> +	/*
> +	 * To work, the APU RPMsg driver needs to get the rproc device.
> +	 * Currently, we only use virtio so we could use that to find the
> +	 * remoteproc parent.
> +	 */
> +	if (!rpdev->dev.parent || !rpdev->dev.parent->bus) {
> +		dev_err(&rpdev->dev, "invalid rpmsg device\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (strcmp(rpdev->dev.parent->bus->name, "virtio")) {
> +		dev_err(&rpdev->dev, "unsupported bus\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	return vdev_to_rproc(dev_to_virtio(rpdev->dev.parent));
> +}
> +
> +static int apu_rpmsg_probe(struct rpmsg_device *rpdev)
> +{
> +	struct rpmsg_apu *apu;
> +	struct rproc *rproc;
> +	int ret;
> +
> +	apu = devm_kzalloc(&rpdev->dev, sizeof(*apu), GFP_KERNEL);
> +	if (!apu)
> +		return -ENOMEM;
> +	apu->rpdev = rpdev;
> +
> +	rproc = apu_get_rproc(rpdev);

I believe that you can replace apu_get_rproc() with:

	rproc = rproc_get_by_child(&rpdev->dev);

> +	if (IS_ERR_OR_NULL(rproc))
> +		return PTR_ERR(rproc);
> +
> +	/* Make device dma capable by inheriting from parent's capabilities */
> +	set_dma_ops(&rpdev->dev, get_dma_ops(rproc->dev.parent));
> +
> +	ret = dma_coerce_mask_and_coherent(&rpdev->dev,
> +					   dma_get_mask(rproc->dev.parent));
> +	if (ret)
> +		goto err_put_device;
> +
> +	rpdev->dev.iommu_group = rproc->dev.parent->iommu_group;

Would it be better or you if we have a device_node, so that you could
specify the iommus property for this compute device?

I'm asking because I've seen cases where multi-purpose remoteproc
firmware operate using multiple different iommu streams...

> +
> +	apu->core = apu_drm_register_core(rproc, &apu_rpmsg_ops, apu);
> +	if (!apu->core) {
> +		ret = -ENODEV;
> +		goto err_put_device;
> +	}
> +
> +	ret = apu_init_iovad(rproc, apu);
> +
> +	dev_set_drvdata(&rpdev->dev, apu);
> +
> +	return ret;
> +
> +err_put_device:

This label looks misplaced, and sure enough, if apu_init_iovad() fails
you're not calling apu_drm_unregister_core().

But on that note, don't you want to apu_init_iovad() before you
apu_drm_register_core()?

> +	devm_kfree(&rpdev->dev, apu);

The reason for using devm_kzalloc() is that once you return
unsuccessfully from probe, or from remove the memory is freed.

So devm_kfree() should go in both cases.

> +
> +	return ret;
> +}
> +
> +static void apu_rpmsg_remove(struct rpmsg_device *rpdev)
> +{
> +	struct rpmsg_apu *apu = dev_get_drvdata(&rpdev->dev);
> +
> +	apu_drm_unregister_core(apu);
> +	devm_kfree(&rpdev->dev, apu);

No need to explicitly free devm resources.

Regards,
Bjorn

> +}
> +
> +static const struct rpmsg_device_id apu_rpmsg_match[] = {
> +	{ APU_RPMSG_SERVICE_MT8183 },
> +	{}
> +};
> +
> +static struct rpmsg_driver apu_rpmsg_driver = {
> +	.probe = apu_rpmsg_probe,
> +	.remove = apu_rpmsg_remove,
> +	.callback = apu_rpmsg_callback,
> +	.id_table = apu_rpmsg_match,
> +	.drv  = {
> +		.name  = "apu_rpmsg",
> +	},
> +};
> +
> +static int __init apu_rpmsg_init(void)
> +{
> +	return register_rpmsg_driver(&apu_rpmsg_driver);
> +}
> +arch_initcall(apu_rpmsg_init);
> +
> +static void __exit apu_rpmsg_exit(void)
> +{
> +	unregister_rpmsg_driver(&apu_rpmsg_driver);
> +}
> +module_exit(apu_rpmsg_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("APU RPMSG driver");
> -- 
> 2.31.1
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 3/4] rpmsg: Add support of AI Processor Unit (APU)
@ 2021-09-23  3:31     ` Bjorn Andersson
  0 siblings, 0 replies; 34+ messages in thread
From: Bjorn Andersson @ 2021-09-23  3:31 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, mathieu.poirier, sumit.semwal,
	christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain

On Fri 17 Sep 07:59 CDT 2021, Alexandre Bailon wrote:

> Some Mediatek SoC provides hardware accelerator for AI / ML.
> This driver use the DRM driver to manage the shared memory,
> and use rpmsg to execute jobs on the APU.
> 
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  drivers/rpmsg/Kconfig     |  10 +++
>  drivers/rpmsg/Makefile    |   1 +
>  drivers/rpmsg/apu_rpmsg.c | 184 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 195 insertions(+)
>  create mode 100644 drivers/rpmsg/apu_rpmsg.c
> 
> diff --git a/drivers/rpmsg/Kconfig b/drivers/rpmsg/Kconfig
> index 0b4407abdf138..fc1668f795004 100644
> --- a/drivers/rpmsg/Kconfig
> +++ b/drivers/rpmsg/Kconfig
> @@ -73,4 +73,14 @@ config RPMSG_VIRTIO
>  	select RPMSG_NS
>  	select VIRTIO
>  
> +config RPMSG_APU
> +	tristate "APU RPMSG driver"
> +	select REMOTEPROC
> +	select RPMSG_VIRTIO
> +	select DRM_APU
> +	help
> +	  This provides a RPMSG driver that provides some facilities to
> +	  communicate with an accelerated processing unit (APU).
> +	  This Uses the APU DRM driver to manage memory and job scheduling.

Similar to how a driver for e.g. an I2C device doesn't live in
drivers/i2c, this doesn't belong in drivers/rpmsg. Probably rather
directly in the DRM driver.

> +
>  endmenu
> diff --git a/drivers/rpmsg/Makefile b/drivers/rpmsg/Makefile
> index 8d452656f0ee3..8b336b9a817c1 100644
> --- a/drivers/rpmsg/Makefile
> +++ b/drivers/rpmsg/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_RPMSG_QCOM_GLINK_RPM) += qcom_glink_rpm.o
>  obj-$(CONFIG_RPMSG_QCOM_GLINK_SMEM) += qcom_glink_smem.o
>  obj-$(CONFIG_RPMSG_QCOM_SMD)	+= qcom_smd.o
>  obj-$(CONFIG_RPMSG_VIRTIO)	+= virtio_rpmsg_bus.o
> +obj-$(CONFIG_RPMSG_APU)		+= apu_rpmsg.o
> diff --git a/drivers/rpmsg/apu_rpmsg.c b/drivers/rpmsg/apu_rpmsg.c
> new file mode 100644
> index 0000000000000..7e504bd176a4d
> --- /dev/null
> +++ b/drivers/rpmsg/apu_rpmsg.c
> @@ -0,0 +1,184 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <asm/cacheflush.h>
> +
> +#include <linux/cdev.h>
> +#include <linux/dma-buf.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/remoteproc.h>
> +#include <linux/rpmsg.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include <drm/apu_drm.h>
> +
> +#include "rpmsg_internal.h"
> +
> +#define APU_RPMSG_SERVICE_MT8183 "rpmsg-mt8183-apu0"
> +
> +struct rpmsg_apu {
> +	struct apu_core *core;
> +	struct rpmsg_device *rpdev;
> +};
> +
> +static int apu_rpmsg_callback(struct rpmsg_device *rpdev, void *data, int count,
> +			      void *priv, u32 addr)
> +{
> +	struct rpmsg_apu *apu = dev_get_drvdata(&rpdev->dev);
> +	struct apu_core *apu_core = apu->core;
> +
> +	return apu_drm_callback(apu_core, data, count);
> +}
> +
> +static int apu_rpmsg_send(struct apu_core *apu_core, void *data, int len)
> +{
> +	struct rpmsg_apu *apu = apu_drm_priv(apu_core);
> +	struct rpmsg_device *rpdev = apu->rpdev;
> +
> +	return rpmsg_send(rpdev->ept, data, len);

The rpmsg API is exposed outside drivers/rpmsg, so as I said above, just
implement this directly in your driver, no need to lug around a dummy
wrapper for things like this.

> +}
> +
> +static struct apu_drm_ops apu_rpmsg_ops = {
> +	.send = apu_rpmsg_send,
> +};
> +
> +static int apu_init_iovad(struct rproc *rproc, struct rpmsg_apu *apu)
> +{
> +	struct resource_table *table;
> +	struct fw_rsc_carveout *rsc;
> +	int i;
> +
> +	if (!rproc->table_ptr) {
> +		dev_err(&apu->rpdev->dev,
> +			"No resource_table: has the firmware been loaded ?\n");
> +		return -ENODEV;
> +	}
> +
> +	table = rproc->table_ptr;
> +	for (i = 0; i < table->num; i++) {
> +		int offset = table->offset[i];
> +		struct fw_rsc_hdr *hdr = (void *)table + offset;
> +
> +		if (hdr->type != RSC_CARVEOUT)
> +			continue;
> +
> +		rsc = (void *)hdr + sizeof(*hdr);
> +		if (apu_drm_reserve_iova(apu->core, rsc->da, rsc->len)) {
> +			dev_err(&apu->rpdev->dev,
> +				"failed to reserve iova\n");
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static struct rproc *apu_get_rproc(struct rpmsg_device *rpdev)
> +{
> +	/*
> +	 * To work, the APU RPMsg driver need to get the rproc device.
> +	 * Currently, we only use virtio so we could use that to find the
> +	 * remoteproc parent.
> +	 */
> +	if (!rpdev->dev.parent && rpdev->dev.parent->bus) {
> +		dev_err(&rpdev->dev, "invalid rpmsg device\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (strcmp(rpdev->dev.parent->bus->name, "virtio")) {
> +		dev_err(&rpdev->dev, "unsupported bus\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	return vdev_to_rproc(dev_to_virtio(rpdev->dev.parent));
> +}
> +
> +static int apu_rpmsg_probe(struct rpmsg_device *rpdev)
> +{
> +	struct rpmsg_apu *apu;
> +	struct rproc *rproc;
> +	int ret;
> +
> +	apu = devm_kzalloc(&rpdev->dev, sizeof(*apu), GFP_KERNEL);
> +	if (!apu)
> +		return -ENOMEM;
> +	apu->rpdev = rpdev;
> +
> +	rproc = apu_get_rproc(rpdev);

I believe that you can replace apu_get_rproc() with:

	rproc = rproc_get_by_child(&rpdev->dev);

> +	if (IS_ERR_OR_NULL(rproc))
> +		return PTR_ERR(rproc);
> +
> +	/* Make device dma capable by inheriting from parent's capabilities */
> +	set_dma_ops(&rpdev->dev, get_dma_ops(rproc->dev.parent));
> +
> +	ret = dma_coerce_mask_and_coherent(&rpdev->dev,
> +					   dma_get_mask(rproc->dev.parent));
> +	if (ret)
> +		goto err_put_device;
> +
> +	rpdev->dev.iommu_group = rproc->dev.parent->iommu_group;

Would it be better or you if we have a device_node, so that you could
specify the iommus property for this compute device?

I'm asking because I've seen cases where multi-purpose remoteproc
firmware operate using multiple different iommu streams...

> +
> +	apu->core = apu_drm_register_core(rproc, &apu_rpmsg_ops, apu);
> +	if (!apu->core) {
> +		ret = -ENODEV;
> +		goto err_put_device;
> +	}
> +
> +	ret = apu_init_iovad(rproc, apu);
> +
> +	dev_set_drvdata(&rpdev->dev, apu);
> +
> +	return ret;
> +
> +err_put_device:

This label looks misplaced, and sure enough, if apu_init_iovad() fails
you're not apu_drm_unregister_core().

But on that note, don't you want to apu_init_iovad() before you
apu_drm_register_core()?

> +	devm_kfree(&rpdev->dev, apu);

The reason for using devm_kzalloc() is that once you return
unsuccessfully from probe, or from remove the memory is freed.

So devm_kfree() should go in both cases.

> +
> +	return ret;
> +}
> +
> +static void apu_rpmsg_remove(struct rpmsg_device *rpdev)
> +{
> +	struct rpmsg_apu *apu = dev_get_drvdata(&rpdev->dev);
> +
> +	apu_drm_unregister_core(apu);
> +	devm_kfree(&rpdev->dev, apu);

No need to explicitly free devm resources.

Regards,
Bjorn

> +}
> +
> +static const struct rpmsg_device_id apu_rpmsg_match[] = {
> +	{ APU_RPMSG_SERVICE_MT8183 },
> +	{}
> +};
> +
> +static struct rpmsg_driver apu_rpmsg_driver = {
> +	.probe = apu_rpmsg_probe,
> +	.remove = apu_rpmsg_remove,
> +	.callback = apu_rpmsg_callback,
> +	.id_table = apu_rpmsg_match,
> +	.drv  = {
> +		.name  = "apu_rpmsg",
> +	},
> +};
> +
> +static int __init apu_rpmsg_init(void)
> +{
> +	return register_rpmsg_driver(&apu_rpmsg_driver);
> +}
> +arch_initcall(apu_rpmsg_init);
> +
> +static void __exit apu_rpmsg_exit(void)
> +{
> +	unregister_rpmsg_driver(&apu_rpmsg_driver);
> +}
> +module_exit(apu_rpmsg_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("APU RPMSG driver");
> -- 
> 2.31.1
> 

_______________________________________________
Linux-mediatek mailing list
Linux-mediatek@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-mediatek

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 3/4] rpmsg: Add support of AI Processor Unit (APU)
@ 2021-09-23  3:31     ` Bjorn Andersson
  0 siblings, 0 replies; 34+ messages in thread
From: Bjorn Andersson @ 2021-09-23  3:31 UTC (permalink / raw)
  To: Alexandre Bailon
  Cc: airlied, daniel, robh+dt, matthias.bgg, maarten.lankhorst,
	mripard, tzimmermann, ohad, mathieu.poirier, sumit.semwal,
	christian.koenig, dri-devel, devicetree, linux-arm-kernel,
	linux-mediatek, linux-kernel, linux-remoteproc, linux-media,
	linaro-mm-sig, khilman, gpain

On Fri 17 Sep 07:59 CDT 2021, Alexandre Bailon wrote:

> Some Mediatek SoC provides hardware accelerator for AI / ML.
> This driver use the DRM driver to manage the shared memory,
> and use rpmsg to execute jobs on the APU.
> 
> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
> ---
>  drivers/rpmsg/Kconfig     |  10 +++
>  drivers/rpmsg/Makefile    |   1 +
>  drivers/rpmsg/apu_rpmsg.c | 184 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 195 insertions(+)
>  create mode 100644 drivers/rpmsg/apu_rpmsg.c
> 
> diff --git a/drivers/rpmsg/Kconfig b/drivers/rpmsg/Kconfig
> index 0b4407abdf138..fc1668f795004 100644
> --- a/drivers/rpmsg/Kconfig
> +++ b/drivers/rpmsg/Kconfig
> @@ -73,4 +73,14 @@ config RPMSG_VIRTIO
>  	select RPMSG_NS
>  	select VIRTIO
>  
> +config RPMSG_APU
> +	tristate "APU RPMSG driver"
> +	select REMOTEPROC
> +	select RPMSG_VIRTIO
> +	select DRM_APU
> +	help
> +	  This provides a RPMSG driver that provides some facilities to
> +	  communicate with an accelerated processing unit (APU).
> +	  This Uses the APU DRM driver to manage memory and job scheduling.

Similar to how a driver for e.g. an I2C device doesn't live in
drivers/i2c, this doesn't belong in drivers/rpmsg. Probably rather
directly in the DRM driver.

> +
>  endmenu
> diff --git a/drivers/rpmsg/Makefile b/drivers/rpmsg/Makefile
> index 8d452656f0ee3..8b336b9a817c1 100644
> --- a/drivers/rpmsg/Makefile
> +++ b/drivers/rpmsg/Makefile
> @@ -9,3 +9,4 @@ obj-$(CONFIG_RPMSG_QCOM_GLINK_RPM) += qcom_glink_rpm.o
>  obj-$(CONFIG_RPMSG_QCOM_GLINK_SMEM) += qcom_glink_smem.o
>  obj-$(CONFIG_RPMSG_QCOM_SMD)	+= qcom_smd.o
>  obj-$(CONFIG_RPMSG_VIRTIO)	+= virtio_rpmsg_bus.o
> +obj-$(CONFIG_RPMSG_APU)		+= apu_rpmsg.o
> diff --git a/drivers/rpmsg/apu_rpmsg.c b/drivers/rpmsg/apu_rpmsg.c
> new file mode 100644
> index 0000000000000..7e504bd176a4d
> --- /dev/null
> +++ b/drivers/rpmsg/apu_rpmsg.c
> @@ -0,0 +1,184 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Copyright 2020 BayLibre SAS
> +
> +#include <asm/cacheflush.h>
> +
> +#include <linux/cdev.h>
> +#include <linux/dma-buf.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/iommu.h>
> +#include <linux/iova.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/platform_device.h>
> +#include <linux/remoteproc.h>
> +#include <linux/rpmsg.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +
> +#include <drm/apu_drm.h>
> +
> +#include "rpmsg_internal.h"
> +
> +#define APU_RPMSG_SERVICE_MT8183 "rpmsg-mt8183-apu0"
> +
> +struct rpmsg_apu {
> +	struct apu_core *core;
> +	struct rpmsg_device *rpdev;
> +};
> +
> +static int apu_rpmsg_callback(struct rpmsg_device *rpdev, void *data, int count,
> +			      void *priv, u32 addr)
> +{
> +	struct rpmsg_apu *apu = dev_get_drvdata(&rpdev->dev);
> +	struct apu_core *apu_core = apu->core;
> +
> +	return apu_drm_callback(apu_core, data, count);
> +}
> +
> +static int apu_rpmsg_send(struct apu_core *apu_core, void *data, int len)
> +{
> +	struct rpmsg_apu *apu = apu_drm_priv(apu_core);
> +	struct rpmsg_device *rpdev = apu->rpdev;
> +
> +	return rpmsg_send(rpdev->ept, data, len);

The rpmsg API is exposed outside drivers/rpmsg, so as I said above, just
implement this directly in your driver, no need to lug around a dummy
wrapper for things like this.
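For illustration, the call site in the DRM driver could then use the rpmsg API directly rather than going through an ops table — an untested sketch, assuming the DRM-side core structure keeps a pointer to the rpmsg device (the `rpdev` member here is hypothetical):

```
/* Sketch only: assumes struct apu_core gains a struct rpmsg_device *rpdev. */
static int apu_core_send(struct apu_core *core, void *data, int len)
{
	return rpmsg_send(core->rpdev->ept, data, len);
}
```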

> +}
> +
> +static struct apu_drm_ops apu_rpmsg_ops = {
> +	.send = apu_rpmsg_send,
> +};
> +
> +static int apu_init_iovad(struct rproc *rproc, struct rpmsg_apu *apu)
> +{
> +	struct resource_table *table;
> +	struct fw_rsc_carveout *rsc;
> +	int i;
> +
> +	if (!rproc->table_ptr) {
> +		dev_err(&apu->rpdev->dev,
> +			"No resource_table: has the firmware been loaded?\n");
> +		return -ENODEV;
> +	}
> +
> +	table = rproc->table_ptr;
> +	for (i = 0; i < table->num; i++) {
> +		int offset = table->offset[i];
> +		struct fw_rsc_hdr *hdr = (void *)table + offset;
> +
> +		if (hdr->type != RSC_CARVEOUT)
> +			continue;
> +
> +		rsc = (void *)hdr + sizeof(*hdr);
> +		if (apu_drm_reserve_iova(apu->core, rsc->da, rsc->len)) {
> +			dev_err(&apu->rpdev->dev,
> +				"failed to reserve iova\n");
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static struct rproc *apu_get_rproc(struct rpmsg_device *rpdev)
> +{
> +	/*
> +	 * To work, the APU RPMsg driver need to get the rproc device.
> +	 * Currently, we only use virtio so we could use that to find the
> +	 * remoteproc parent.
> +	 */
> +	if (!rpdev->dev.parent || !rpdev->dev.parent->bus) {
> +		dev_err(&rpdev->dev, "invalid rpmsg device\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	if (strcmp(rpdev->dev.parent->bus->name, "virtio")) {
> +		dev_err(&rpdev->dev, "unsupported bus\n");
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	return vdev_to_rproc(dev_to_virtio(rpdev->dev.parent));
> +}
> +
> +static int apu_rpmsg_probe(struct rpmsg_device *rpdev)
> +{
> +	struct rpmsg_apu *apu;
> +	struct rproc *rproc;
> +	int ret;
> +
> +	apu = devm_kzalloc(&rpdev->dev, sizeof(*apu), GFP_KERNEL);
> +	if (!apu)
> +		return -ENOMEM;
> +	apu->rpdev = rpdev;
> +
> +	rproc = apu_get_rproc(rpdev);

I believe that you can replace apu_get_rproc() with:

	rproc = rproc_get_by_child(&rpdev->dev);

> +	if (IS_ERR_OR_NULL(rproc))
> +		return PTR_ERR(rproc);
> +
> +	/* Make device dma capable by inheriting from parent's capabilities */
> +	set_dma_ops(&rpdev->dev, get_dma_ops(rproc->dev.parent));
> +
> +	ret = dma_coerce_mask_and_coherent(&rpdev->dev,
> +					   dma_get_mask(rproc->dev.parent));
> +	if (ret)
> +		goto err_put_device;
> +
> +	rpdev->dev.iommu_group = rproc->dev.parent->iommu_group;

Would it be better for you if we had a device_node, so that you could
specify the iommus property for this compute device?

I'm asking because I've seen cases where multi-purpose remoteproc
firmware operate using multiple different iommu streams...

> +
> +	apu->core = apu_drm_register_core(rproc, &apu_rpmsg_ops, apu);
> +	if (!apu->core) {
> +		ret = -ENODEV;
> +		goto err_put_device;
> +	}
> +
> +	ret = apu_init_iovad(rproc, apu);
> +
> +	dev_set_drvdata(&rpdev->dev, apu);
> +
> +	return ret;
> +
> +err_put_device:

This label looks misplaced, and sure enough, if apu_init_iovad() fails
you're not calling apu_drm_unregister_core().

But on that note, don't you want to apu_init_iovad() before you
apu_drm_register_core()?

> +	devm_kfree(&rpdev->dev, apu);

The reason for using devm_kzalloc() is that once you return
unsuccessfully from probe, or from remove the memory is freed.

So devm_kfree() should go in both cases.
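Putting these points together, the tail of probe might read something like this — an untested sketch based only on the quoted code, keeping the original apu_drm_unregister_core(apu) signature and dropping the error label and devm_kfree() as suggested:

```
	apu->core = apu_drm_register_core(rproc, &apu_rpmsg_ops, apu);
	if (!apu->core)
		return -ENODEV;

	ret = apu_init_iovad(rproc, apu);
	if (ret) {
		/* Undo the registration; devm handles the allocation. */
		apu_drm_unregister_core(apu);
		return ret;
	}

	dev_set_drvdata(&rpdev->dev, apu);
	return 0;
```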

> +
> +	return ret;
> +}
> +
> +static void apu_rpmsg_remove(struct rpmsg_device *rpdev)
> +{
> +	struct rpmsg_apu *apu = dev_get_drvdata(&rpdev->dev);
> +
> +	apu_drm_unregister_core(apu);
> +	devm_kfree(&rpdev->dev, apu);

No need to explicitly free devm resources.

Regards,
Bjorn

> +}
> +
> +static const struct rpmsg_device_id apu_rpmsg_match[] = {
> +	{ APU_RPMSG_SERVICE_MT8183 },
> +	{}
> +};
> +
> +static struct rpmsg_driver apu_rpmsg_driver = {
> +	.probe = apu_rpmsg_probe,
> +	.remove = apu_rpmsg_remove,
> +	.callback = apu_rpmsg_callback,
> +	.id_table = apu_rpmsg_match,
> +	.drv  = {
> +		.name  = "apu_rpmsg",
> +	},
> +};
> +
> +static int __init apu_rpmsg_init(void)
> +{
> +	return register_rpmsg_driver(&apu_rpmsg_driver);
> +}
> +arch_initcall(apu_rpmsg_init);
> +
> +static void __exit apu_rpmsg_exit(void)
> +{
> +	unregister_rpmsg_driver(&apu_rpmsg_driver);
> +}
> +module_exit(apu_rpmsg_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("APU RPMSG driver");
> -- 
> 2.31.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU)
  2021-09-23  0:58     ` Dave Airlie
  (?)
  (?)
@ 2021-09-23  6:17       ` Christian König
  -1 siblings, 0 replies; 34+ messages in thread
From: Christian König @ 2021-09-23  6:17 UTC (permalink / raw)
  To: Dave Airlie, Alexandre Bailon
  Cc: Dave Airlie, Daniel Vetter, Rob Herring, Matthias Brugger,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, ohad,
	bjorn.andersson, Mathieu Poirier, Sumit Semwal, dri-devel,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	linux-arm-kernel, moderated list:ARM/Mediatek SoC support, LKML,
	linux-remoteproc, Linux Media Mailing List,
	moderated list:DMA BUFFER SHARING FRAMEWORK, khilman, gpain

Am 23.09.21 um 02:58 schrieb Dave Airlie:
> On Sat, 18 Sept 2021 at 07:57, Alexandre Bailon <abailon@baylibre.com> wrote:
>> Some Mediatek SoCs provide a hardware accelerator for AI / ML.
>> This driver provides the infrastructure to manage memory
>> shared between the host CPU and the accelerator, and to submit
>> jobs to the accelerator.
>> The APU itself is managed by remoteproc, so this driver
>> relies on remoteproc to find the APU and get some important data
>> from it. But the driver is quite generic and it should be possible
>> to manage accelerators in other ways.
>> This driver doesn't manage the data transmissions itself.
>> It must be registered by another driver implementing the transmissions.
>>
>> Signed-off-by: Alexandre Bailon <abailon@baylibre.com>
>> [SNIP]

>> Please refer to
>> https://www.kernel.org/doc/Documentation/ioctl/botching-up-ioctls.rst
>>
>> here and below in many places.
>>
>> There's a lot of missing padding/alignment here.

There is also the pahole utility, which shows you nicely where you need 
padding in your IOCTL structures.

For example "pahole drivers/gpu/drm/amd/amdgpu/amdgpu.ko -C 
drm_amdgpu_gem_va" gives you:

struct drm_amdgpu_gem_va {
     __u32                      handle;               /*     0     4 */
     __u32                      _pad;                 /*     4     4 */
     __u32                      operation;            /*     8     4 */
     __u32                      flags;                /*    12     4 */
     __u64                      va_address;           /*    16     8 */
     __u64                      offset_in_bo;         /*    24     8 */
     __u64                      map_size;             /*    32     8 */

     /* size: 40, cachelines: 1, members: 7 */
     /* last cacheline: 40 bytes */
};

And as you can see we have added the _pad field to our IOCTL parameter 
structure to properly align the 64bit members.

Regards,
Christian.

>>
>> I'm trying to find the time to review this stack in full, any writeups
>> on how this is used from userspace would be useful (not just the code
>> repo, but some sort of how do I get at it) it reads as kinda generic
>> (calling it apu), but then has some specifics around device binding.
>>
>> Dave.


^ permalink raw reply	[flat|nested] 34+ messages in thread


end of thread, other threads:[~2021-09-23  6:19 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-17 12:59 [RFC PATCH 0/4] Add a DRM driver to support AI Processing Unit (APU) Alexandre Bailon
2021-09-17 12:59 ` Alexandre Bailon
2021-09-17 12:59 ` Alexandre Bailon
2021-09-17 12:59 ` Alexandre Bailon
2021-09-17 12:59 ` [RFC PATCH 1/4] dt-bindings: Add bidings for mtk,apu-drm Alexandre Bailon
2021-09-17 12:59   ` Alexandre Bailon
2021-09-17 12:59   ` Alexandre Bailon
2021-09-17 19:48   ` Rob Herring
2021-09-17 19:48     ` Rob Herring
2021-09-17 19:48     ` Rob Herring
2021-09-20 20:55   ` Rob Herring
2021-09-20 20:55     ` Rob Herring
2021-09-20 20:55     ` Rob Herring
2021-09-17 12:59 ` [RFC PATCH 2/4] DRM: Add support of AI Processor Unit (APU) Alexandre Bailon
2021-09-17 12:59   ` Alexandre Bailon
2021-09-17 12:59   ` Alexandre Bailon
2021-09-19  3:19   ` Hillf Danton
2021-09-23  0:58   ` Dave Airlie
2021-09-23  0:58     ` Dave Airlie
2021-09-23  0:58     ` Dave Airlie
2021-09-23  0:58     ` Dave Airlie
2021-09-23  6:17     ` Christian König
2021-09-23  6:17       ` Christian König
2021-09-23  6:17       ` Christian König
2021-09-23  6:17       ` Christian König
2021-09-17 12:59 ` [RFC PATCH 3/4] rpmsg: " Alexandre Bailon
2021-09-17 12:59   ` Alexandre Bailon
2021-09-17 12:59   ` Alexandre Bailon
2021-09-23  3:31   ` Bjorn Andersson
2021-09-23  3:31     ` Bjorn Andersson
2021-09-23  3:31     ` Bjorn Andersson
2021-09-17 12:59 ` [RFC PATCH 4/4] ARM64: mt8183-pumpkin: Add the APU DRM device Alexandre Bailon
2021-09-17 12:59   ` Alexandre Bailon
2021-09-17 12:59   ` Alexandre Bailon
