This is another iteration to support split pagetables for Adreno GPUs as part of an incremental process to enable per-context pagetables. In order to support per-context pagetables the GPU needs to enable split pagetables so that we can store the global buffers in the TTBR1 space leaving the GPU free to program the TTBR0 register with the page address of a context specific pt. Previous revisions of this series can be found at [1] and [2]. This iteration is built on top of the arm-smmu-impl and arm-smmu-v2 rework code from Robin Murphy [3] and [4]. This code is based on the realization that when split pagetables are enabled the configuration for the T1 address space is identical to that of the T0 space, so we can just take the TCR configuration provided by io-pgtable, duplicate it and shift it by 16 bits. Since the current split pagetable implementation is specific to the Adreno GPUs we can also take a small shortcut and only allow split pagetables for SMMUs with a 49 bit upstream bus which allows us to use the default configuration for the sign extension bit and we can avoid a lot of extra code to handle different upstream bus sizes that will never get used. The first patch implements the split pagetable support for arm-smmu-v2. The second adds a SMMU model for the Adreno GPU SMMU and enables the split pagetables if conditions warrant. The 3rd and 4th patches add a domain attribute to query the status of split pagetables. The remaining patches modify drm/msm slightly to allow a6xx targets to recognize if split pagetables are enabled and adjust the address space accordingly. This series only includes support for split pagetables because I wanted to get this out for discussion and I haven't ported over the aux domain code to this kernel version, but I don't suspect it will end up being much different than previous versions [5]. [1] https://patchwork.freedesktop.org/series/63403/ [2] https://patchwork.freedesktop.org/series/64874/ [3] https://lists.linuxfoundation.org/pipermail/iommu/2019-August/037905.html [4] https://lists.linuxfoundation.org/pipermail/iommu/2019-August/038244.html [5] https://patchwork.freedesktop.org/patch/307601/ Jordan Crouse (7): iommu/arm-smmu: Support split pagetables dt-bindings: arm-smmu: Add Adreno GPU variant iommu/arm-smmu: Add a SMMU variant for the Adreno GPU iommu: Add DOMAIN_ATTR_SPLIT_TABLES iommu/arm-smmu: Support DOMAIN_ATTR_SPLIT_TABLES drm/msm: Create the msm_mmu object independently from the address space drm/msm: Use per-target functions to set up address spaces .../devicetree/bindings/iommu/arm,smmu.txt | 7 +++ drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 28 +++++++++++ drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 1 + drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 1 + drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 56 ++++++++++++++++++++++ drivers/gpu/drm/msm/adreno/adreno_gpu.c | 43 ++++++++++++++--- drivers/gpu/drm/msm/adreno/adreno_gpu.h | 5 ++ drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 16 ++++--- drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 16 ++++--- drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 4 -- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 13 ++++- drivers/gpu/drm/msm/msm_drv.h | 8 +--- drivers/gpu/drm/msm/msm_gem_vma.c | 30 ++---------- drivers/gpu/drm/msm/msm_gpu.c | 51 ++------------------ drivers/gpu/drm/msm/msm_gpu.h | 4 +- drivers/iommu/arm-smmu-impl.c | 15 ++++++ drivers/iommu/arm-smmu.c | 46 ++++++++++++++++-- drivers/iommu/arm-smmu.h | 2 + include/linux/iommu.h | 1 + 20 files changed, 237 insertions(+), 111 deletions(-) -- 2.7.4
Support a split pagetable format for SMMU models that support it. If enabled, mirror the TTBR0 setup for TTBR1 and program the pagetable address in TTBR1 instead of TTBR0. For now only allow split pagetables for ARM64 stage 1 IOMMUs with 49 bit upstream buses. This is the only real-life use case for split pagetables on arm-smmu-v2 to date and it is the easiest configuration to support without a bunch of extra logic. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> --- drivers/iommu/arm-smmu.c | 41 +++++++++++++++++++++++++++++++++++++---- drivers/iommu/arm-smmu.h | 1 + 2 files changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 49c734a..39e81ef 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -461,7 +461,17 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, cb->tcr[0] = pgtbl_cfg->arm_v7s_cfg.tcr; } else { cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr; - cb->tcr[0] |= TCR_EPD1; + + /* + * For split pagetables, duplicate the T0 configuration + * for T1. Otherwise, disable walks through TTBR1 + */ + if (smmu_domain->split_pagetables) + cb->tcr[0] |= (pgtbl_cfg->arm_lpae_s1_cfg.tcr & + 0xffff) << 16; + else + cb->tcr[0] |= TCR_EPD1; + cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32; cb->tcr[1] |= FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM); if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) @@ -477,9 +487,16 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain, cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr; cb->ttbr[1] = 0; } else { - cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + if (smmu_domain->split_pagetables) { + cb->ttbr[0] = 0; + cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + } else { + cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + cb->ttbr[1] = 0; + } + cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid); - cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid); + cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid); } } else { cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr; @@ -720,6 +737,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, goto out_unlock; } + /* + * For now, only support a ias of 48 and SEP_UPSTREAM for split + * pagetables. This doesn't preclude using other sign extension bits but + * since the group of split-pagetable users is very small we don't want + * to add a lot of extra code that won't be useful + */ + WARN_ON(smmu_domain->split_pagetables && ias != 48); + pgtbl_cfg = (struct io_pgtable_cfg) { .pgsize_bitmap = smmu->pgsize_bitmap, .ias = ias, @@ -740,7 +765,15 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain, /* Update the domain's page sizes to reflect the page table format */ domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; - domain->geometry.aperture_end = (1UL << ias) - 1; + + if (smmu_domain->split_pagetables) { + domain->geometry.aperture_start = ~(1UL << ias); + domain->geometry.aperture_end = ~0UL; + } else { + domain->geometry.aperture_start = 0; + domain->geometry.aperture_end = (1UL << ias) - 1; + } + domain->geometry.force_aperture = true; /* Initialise the context bank with our page table cfg */ diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index 7b0e4d2..91a4eb8 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -316,6 +316,7 @@ struct arm_smmu_domain { struct mutex init_mutex; /* Protects smmu pointer */ spinlock_t cb_lock; /* Serialises ATS1* ops and TLB syncs */ struct iommu_domain domain; + bool split_pagetables; }; -- 2.7.4
Add a compatible string to identify SMMUs that are attached to Adreno GPU devices that wish to support split pagetables. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> --- Documentation/devicetree/bindings/iommu/arm,smmu.txt | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt index 3133f3b..3b07896 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt @@ -18,6 +18,7 @@ conditions. "arm,mmu-500" "cavium,smmu-v2" "qcom,smmu-v2" + "qcom,adreno-smmu-v2" depending on the particular implementation and/or the version of the architecture implemented. @@ -31,6 +32,12 @@ conditions. as below, SoC-specific compatibles: "qcom,sdm845-smmu-500", "arm,mmu-500" + "qcom,adreno-smmu-v2" is a special implementation for + SMMU devices attached to the Adreno GPU on Qcom devices. + If selected, this will enable split pagetable (TTBR1) + support. Only use this if the GPU target is capable of + supporting 64 bit addresses. + - reg : Base address and size of the SMMU. - #global-interrupts : The number of global interrupts exposed by the -- 2.7.4
Add a SMMU model for the Adreno GPU and use it to enable split pagetable support if the conditions are right. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> --- drivers/iommu/arm-smmu-impl.c | 15 +++++++++++++++ drivers/iommu/arm-smmu.c | 2 ++ drivers/iommu/arm-smmu.h | 1 + 3 files changed, 18 insertions(+) diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c index 3f88cd0..5d197dd 100644 --- a/drivers/iommu/arm-smmu-impl.c +++ b/drivers/iommu/arm-smmu-impl.c @@ -147,6 +147,18 @@ static const struct arm_smmu_impl arm_mmu500_impl = { .reset = arm_mmu500_reset, }; +static int qcom_adreno_init_context(struct arm_smmu_domain *smmu_domain) +{ + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && + smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64) + smmu_domain->split_pagetables = true; + + return 0; +} + +static const struct arm_smmu_impl qcom_adreno_impl = { + .init_context = qcom_adreno_init_context, +}; struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) { @@ -162,6 +174,9 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu) break; case CAVIUM_SMMUV2: return cavium_smmu_impl_init(smmu); + case QCOM_ADRENO_SMMUV2: + smmu->impl = &qcom_adreno_impl; + break; default: break; } diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 39e81ef..3f41cf7 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1858,6 +1858,7 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); ARM_SMMU_MATCH_DATA(qcom_smmuv2, ARM_SMMU_V2, QCOM_SMMUV2); +ARM_SMMU_MATCH_DATA(qcom_adreno_smmuv2, ARM_SMMU_V2, QCOM_ADRENO_SMMUV2); static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 }, @@ -1867,6 +1868,7 @@ static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,mmu-500", .data = &arm_mmu500 }, { .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 }, { .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 }, + { .compatible = "qcom,adreno-smmu-v2", .data = &qcom_adreno_smmuv2 }, { }, }; diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h index 91a4eb8..e5a2cc8 100644 --- a/drivers/iommu/arm-smmu.h +++ b/drivers/iommu/arm-smmu.h @@ -222,6 +222,7 @@ enum arm_smmu_implementation { ARM_MMU500, CAVIUM_SMMUV2, QCOM_SMMUV2, + QCOM_ADRENO_SMMUV2, }; struct arm_smmu_device { -- 2.7.4
Add a new attribute to query the state of split pagetables for the domain. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> --- include/linux/iommu.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index fdc355c..b06db6c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -125,6 +125,7 @@ enum iommu_attr { DOMAIN_ATTR_FSL_PAMUV1, DOMAIN_ATTR_NESTING, /* two stages of translation */ DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, + DOMAIN_ATTR_SPLIT_TABLES, DOMAIN_ATTR_MAX, }; -- 2.7.4
Support the DOMAIN_ATTR_SPLIT_TABLES attribute to let the leaf driver know if split pagetables are enabled for the domain. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> --- drivers/iommu/arm-smmu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 3f41cf7..6a512ff 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1442,6 +1442,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain, case DOMAIN_ATTR_NESTING: *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED); return 0; + case DOMAIN_ATTR_SPLIT_TABLES: + *(int *)data = !!(smmu_domain->split_pagetables); + return 0; default: return -ENODEV; } -- 2.7.4
Instead of creating the msm_mmu object along with the address space initialize it separately and pass it into the address space create function. This gives us the flexibility of attaching the IOMMU device and querying it before creating the address space which will come in handy in the next patch that takes advantage of split pagetables if available. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> --- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c | 16 ++++++++++------ drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c | 16 ++++++++++------ drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c | 4 ---- drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c | 13 +++++++++++-- drivers/gpu/drm/msm/msm_drv.h | 8 ++------ drivers/gpu/drm/msm/msm_gem_vma.c | 30 +++--------------------------- drivers/gpu/drm/msm/msm_gpu.c | 19 ++++++++++++------- 7 files changed, 48 insertions(+), 58 deletions(-) diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c index bb9d44e..8bf2639 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c @@ -722,23 +722,27 @@ static int _dpu_kms_mmu_init(struct dpu_kms *dpu_kms) { struct iommu_domain *domain; struct msm_gem_address_space *aspace; + struct msm_mmu *mmu; int ret; domain = iommu_domain_alloc(&platform_bus_type); if (!domain) return 0; - domain->geometry.aperture_start = 0x1000; - domain->geometry.aperture_end = 0xffffffff; + mmu = msm_iommu_new(dpu_kms->dev->dev, domain); + if (IS_ERR(mmu)) { + iommu_domain_free(domain); + return PTR_ERR(mmu); + } - aspace = msm_gem_address_space_create(dpu_kms->dev->dev, - domain, "dpu1"); + aspace = msm_gem_address_space_create(mmu, "dpu1", + 0x1000, 0xffffffff); if (IS_ERR(aspace)) { - iommu_domain_free(domain); + mmu->funcs->destroy(mmu); return PTR_ERR(aspace); } - ret = aspace->mmu->funcs->attach(aspace->mmu, iommu_ports, + ret = mmu->funcs->attach(mmu, iommu_ports, ARRAY_SIZE(iommu_ports)); if (ret) { DPU_ERROR("failed to attach iommu %d\n", ret); diff --git a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c index 7a9ab55..af5a7a4 100644 --- a/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp4/mdp4_kms.c @@ -498,9 +498,17 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev) mdelay(16); if (config->iommu) { - aspace = msm_gem_address_space_create(&pdev->dev, - config->iommu, "mdp4"); + struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, config->iommu); + + if (IS_ERR(mmu)) { + ret = PTR_ERR(mmu); + goto fail; + } + + aspace = msm_gem_address_space_create(mmu, "mdp4", + 0x1000, 0xfffffffff); if (IS_ERR(aspace)) { + mmu->funcs->destroy(mmu); ret = PTR_ERR(aspace); goto fail; } @@ -558,10 +566,6 @@ static struct mdp4_platform_config *mdp4_get_config(struct platform_device *dev) /* TODO: Chips that aren't apq8064 have a 200 Mhz max_clk */ config.max_clk = 266667000; config.iommu = iommu_domain_alloc(&platform_bus_type); - if (config.iommu) { - config.iommu->geometry.aperture_start = 0x1000; - config.iommu->geometry.aperture_end = 0xffffffff; - } return &config; } diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c index dd1daf0..23265f7 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_cfg.c @@ -721,10 +721,6 @@ static struct mdp5_cfg_platform *mdp5_get_config(struct platform_device *dev) static struct mdp5_cfg_platform config = {}; config.iommu = iommu_domain_alloc(&platform_bus_type); - if (config.iommu) { - config.iommu->geometry.aperture_start = 0x1000; - config.iommu->geometry.aperture_end = 0xffffffff; - } return &config; } diff --git a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c index 4a60f5f..36115fd 100644 --- a/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c +++ b/drivers/gpu/drm/msm/disp/mdp5/mdp5_kms.c @@ -702,9 +702,18 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev) mdelay(16); if (config->platform.iommu) { - aspace = msm_gem_address_space_create(&pdev->dev, - config->platform.iommu, "mdp5"); + struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, + config->platform.iommu); + + if (IS_ERR(mmu)) { + ret = PTR_ERR(mmu); + goto fail; + } + + aspace = msm_gem_address_space_create(mmu, "mdp5", + 0x1000, 0xffffffff); if (IS_ERR(aspace)) { + mmu->funcs->destroy(mmu); ret = PTR_ERR(aspace); goto fail; } diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h index ee7b512..c2502b2 100644 --- a/drivers/gpu/drm/msm/msm_drv.h +++ b/drivers/gpu/drm/msm/msm_drv.h @@ -244,12 +244,8 @@ void msm_gem_close_vma(struct msm_gem_address_space *aspace, void msm_gem_address_space_put(struct msm_gem_address_space *aspace); struct msm_gem_address_space * -msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain, - const char *name); - -struct msm_gem_address_space * -msm_gem_address_space_create_a2xx(struct device *dev, struct msm_gpu *gpu, - const char *name, uint64_t va_start, uint64_t va_end); +msm_gem_address_space_create(struct msm_mmu *mmu, const char *name, + u64 va_start, u64 va_end); int msm_register_mmu(struct drm_device *dev, struct msm_mmu *mmu); void msm_unregister_mmu(struct drm_device *dev, struct msm_mmu *mmu); diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c index 1af5354..45d4a63 100644 --- a/drivers/gpu/drm/msm/msm_gem_vma.c +++ b/drivers/gpu/drm/msm/msm_gem_vma.c @@ -127,32 +127,8 @@ int msm_gem_init_vma(struct msm_gem_address_space *aspace, struct msm_gem_address_space * -msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain, - const char *name) -{ - struct msm_gem_address_space *aspace; - u64 size = domain->geometry.aperture_end - - domain->geometry.aperture_start; - - aspace = kzalloc(sizeof(*aspace), GFP_KERNEL); - if (!aspace) - return ERR_PTR(-ENOMEM); - - spin_lock_init(&aspace->lock); - aspace->name = name; - aspace->mmu = msm_iommu_new(dev, domain); - - drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT), - size >> PAGE_SHIFT); - - kref_init(&aspace->kref); - - return aspace; -} - -struct msm_gem_address_space * -msm_gem_address_space_create_a2xx(struct device *dev, struct msm_gpu *gpu, - const char *name, uint64_t va_start, uint64_t va_end) +msm_gem_address_space_create(struct msm_mmu *mmu, const char *name, + u64 va_start, u64 va_end) { struct msm_gem_address_space *aspace; u64 size = va_end - va_start; @@ -163,7 +139,7 @@ msm_gem_address_space_create_a2xx(struct device *dev, struct msm_gpu *gpu, spin_lock_init(&aspace->lock); aspace->name = name; - aspace->mmu = msm_gpummu_new(dev, gpu); + aspace->mmu = mmu; drm_mm_init(&aspace->mm, (va_start >> PAGE_SHIFT), size >> PAGE_SHIFT); diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c index 4edb874..9271f39 100644 --- a/drivers/gpu/drm/msm/msm_gpu.c +++ b/drivers/gpu/drm/msm/msm_gpu.c @@ -806,6 +806,7 @@ msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev, uint64_t va_start, uint64_t va_end) { struct msm_gem_address_space *aspace; + struct msm_mmu *mmu; int ret; /* @@ -818,20 +819,24 @@ msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev, if (!iommu) return NULL; - iommu->geometry.aperture_start = va_start; - iommu->geometry.aperture_end = va_end; + mmu = msm_iommu_new(&pdev->dev, iommu); + if (IS_ERR(mmu)) { + iommu_domain_free(iommu); + return ERR_CAST(mmu); + } DRM_DEV_INFO(gpu->dev->dev, "%s: using IOMMU\n", gpu->name); - aspace = msm_gem_address_space_create(&pdev->dev, iommu, "gpu"); - if (IS_ERR(aspace)) - iommu_domain_free(iommu); } else { - aspace = msm_gem_address_space_create_a2xx(&pdev->dev, gpu, "gpu", - va_start, va_end); + mmu = msm_gpummu_new(&pdev->dev, gpu); + if (IS_ERR(mmu)) + return ERR_CAST(mmu); } + aspace = msm_gem_address_space_create(mmu, "gpu", va_start, va_end); if (IS_ERR(aspace)) { + mmu->funcs->destroy(mmu); + DRM_DEV_ERROR(gpu->dev->dev, "failed to init mmu: %ld\n", PTR_ERR(aspace)); return ERR_CAST(aspace); -- 2.7.4
Use a per-target function to set up the default address space for each GPU. This allows a6xx targets to set up the correct address range if split pagetables are enabled by the IOMMU device. This also gets rid of a misplaced bit of a2xx code in msm_gpu and returns it to where it belongs. Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> --- drivers/gpu/drm/msm/adreno/a2xx_gpu.c | 28 +++++++++++++++++ drivers/gpu/drm/msm/adreno/a3xx_gpu.c | 1 + drivers/gpu/drm/msm/adreno/a4xx_gpu.c | 1 + drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 1 + drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 56 +++++++++++++++++++++++++++++++++ drivers/gpu/drm/msm/adreno/adreno_gpu.c | 43 +++++++++++++++++++++---- drivers/gpu/drm/msm/adreno/adreno_gpu.h | 5 +++ drivers/gpu/drm/msm/msm_gpu.c | 56 ++------------------------------- drivers/gpu/drm/msm/msm_gpu.h | 4 +-- 9 files changed, 134 insertions(+), 61 deletions(-) diff --git a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c index 1f83bc1..c9c032e 100644 --- a/drivers/gpu/drm/msm/adreno/a2xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a2xx_gpu.c @@ -401,6 +401,33 @@ static struct msm_gpu_state *a2xx_gpu_state_get(struct msm_gpu *gpu) return state; } +static struct msm_gem_address_space * +a2xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) +{ + struct msm_gem_address_space *aspace; + struct msm_mmu *mmu = msm_gpummu_new(&pdev->dev, gpu); + int ret; + + if (IS_ERR(mmu)) + return ERR_CAST(mmu); + + ret = mmu->funcs->attach(mmu, NULL, 0); + if (ret) { + mmu->funcs->destroy(mmu); + return ERR_PTR(ret); + } + + aspace = msm_gem_address_space_create(mmu, "gpu", + SZ_16M, SZ_16M + 0xfff * SZ_64K); + + if (IS_ERR(aspace)) { + mmu->funcs->detach(mmu, NULL, 0); + mmu->funcs->destroy(mmu); + } + + return aspace; +} + /* Register offset defines for A2XX - copy of A3XX */ static const unsigned int a2xx_register_offsets[REG_ADRENO_REGISTER_MAX] = { REG_ADRENO_DEFINE(REG_ADRENO_CP_RB_BASE, REG_AXXX_CP_RB_BASE), @@ -429,6 +456,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a2xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = a2xx_create_address_space, }, }; diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c index 5f7e980..f24dc21 100644 --- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c @@ -448,6 +448,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a3xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = adreno_gpu_create_address_space, }, }; diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c index ab2b752..08f4292 100644 --- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c @@ -538,6 +538,7 @@ static const struct adreno_gpu_funcs funcs = { #endif .gpu_state_get = a4xx_gpu_state_get, .gpu_state_put = adreno_gpu_state_put, + .create_address_space = adreno_gpu_create_address_space, }, .get_timestamp = a4xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c index 1671db4..e29fea5 100644 --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c @@ -1385,6 +1385,7 @@ static const struct adreno_gpu_funcs funcs = { .gpu_busy = a5xx_gpu_busy, .gpu_state_get = a5xx_gpu_state_get, .gpu_state_put = a5xx_gpu_state_put, + .create_address_space = adreno_gpu_create_address_space, }, .get_timestamp = a5xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c index be39cf0..3092426 100644 --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c @@ -810,6 +810,61 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu) return (unsigned long)busy_time; } +static struct msm_gem_address_space * +a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev) +{ + struct msm_gem_address_space *aspace; + struct iommu_domain *iommu; + struct msm_mmu *mmu; + int ret, val = 0; + u64 start, end; + + iommu = iommu_domain_alloc(&platform_bus_type); + if (!iommu) + return NULL; + + mmu = msm_iommu_new(&pdev->dev, iommu); + if (IS_ERR(mmu)) { + iommu_domain_free(iommu); + return ERR_CAST(mmu); + } + + ret = mmu->funcs->attach(mmu, NULL, 0); + if (ret) { + mmu->funcs->destroy(mmu); + return ERR_PTR(ret); + } + + iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val); + + /* + * If split pagetables are enabled our virtual address range will start + * at 0xfff0000000000000 and we don't need to worry about a hole for the + * GMEM. + */ + if (val) + start = iommu->geometry.aperture_start; + else + start = SZ_16M; + + /* + * Regardless of the start, always take advantage of the entire + * available space + */ + end = iommu->geometry.aperture_end; + + DRM_DEV_INFO(gpu->dev->dev, "%s: using IOMMU\n", gpu->name); + + aspace = msm_gem_address_space_create(mmu, "gpu", start, end); + if (IS_ERR(aspace)) { + mmu->funcs->detach(mmu, NULL, 0); + mmu->funcs->destroy(mmu); + } + + return aspace; +} + + static const struct adreno_gpu_funcs funcs = { .base = { .get_param = adreno_get_param, @@ -832,6 +887,7 @@ static const struct adreno_gpu_funcs funcs = { .gpu_state_get = a6xx_gpu_state_get, .gpu_state_put = a6xx_gpu_state_put, #endif + .create_address_space = a6xx_create_address_space, }, .get_timestamp = a6xx_get_timestamp, }; diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c index 9acbbc0..6edcd2c 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c @@ -18,6 +18,43 @@ #include "msm_gem.h" #include "msm_mmu.h" +/* Helper function for GPU targets that use arm-smmu but not split pagetables */ +struct msm_gem_address_space * +adreno_gpu_create_address_space(struct msm_gpu *gpu, + struct platform_device *pdev) +{ + struct msm_gem_address_space *aspace; + struct iommu_domain *iommu; + struct msm_mmu *mmu; + int ret; + + iommu = iommu_domain_alloc(&platform_bus_type); + if (!iommu) + return NULL; + + mmu = msm_iommu_new(&pdev->dev, iommu); + if (IS_ERR(mmu)) { + iommu_domain_free(iommu); + return ERR_CAST(mmu); + } + + ret = mmu->funcs->attach(mmu, NULL, 0); + if (ret) { + mmu->funcs->destroy(mmu); + return ERR_PTR(ret); + } + + DRM_DEV_INFO(gpu->dev->dev, "%s: using IOMMU\n", gpu->name); + + aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M, 0xffffffff); + if (IS_ERR(aspace)) { + mmu->funcs->detach(mmu, NULL, 0); + mmu->funcs->destroy(mmu); + } + + return aspace; +} + static bool zap_available = true; static int zap_shader_load_mdt(struct msm_gpu *gpu, const char *fwname, @@ -908,12 +945,6 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev, adreno_gpu_config.ioname = "kgsl_3d0_reg_memory"; - adreno_gpu_config.va_start = SZ_16M; - adreno_gpu_config.va_end = 0xffffffff; - /* maximum range of a2xx mmu */ - if (adreno_is_a2xx(adreno_gpu)) - adreno_gpu_config.va_end = SZ_16M + 0xfff * SZ_64K; - adreno_gpu_config.nr_rings = nr_rings; adreno_get_pwrlevels(&pdev->dev, gpu); diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h index c7441fb..141ff3a 100644 --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h @@ -247,6 +247,11 @@ void adreno_gpu_state_destroy(struct msm_gpu_state *state); int adreno_gpu_state_get(struct msm_gpu *gpu, struct msm_gpu_state *state); int adreno_gpu_state_put(struct msm_gpu_state *state); + +struct msm_gem_address_space * +adreno_gpu_create_address_space(struct msm_gpu *gpu, + struct platform_device *pdev); + /* * For a5xx and a6xx targets load the zap shader that is used to pull the GPU * out of secure mode diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c index 9271f39..5a36c3d 100644 --- a/drivers/gpu/drm/msm/msm_gpu.c +++ b/drivers/gpu/drm/msm/msm_gpu.c @@ -9,7 +9,6 @@ #include "msm_mmu.h" #include "msm_fence.h" #include "msm_gpu_trace.h" -#include "adreno/adreno_gpu.h" #include <generated/utsrelease.h> #include <linux/string_helpers.h> @@ -801,56 +800,6 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu) return 0; } -static struct msm_gem_address_space * -msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev, - uint64_t va_start, uint64_t va_end) -{ - struct msm_gem_address_space *aspace; - struct msm_mmu *mmu; - int ret; - - /* - * Setup IOMMU.. eventually we will (I think) do this once per context - * and have separate page tables per context. For now, to keep things - * simple and to get something working, just use a single address space: - */ - if (!adreno_is_a2xx(to_adreno_gpu(gpu))) { - struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type); - if (!iommu) - return NULL; - - mmu = msm_iommu_new(&pdev->dev, iommu); - if (IS_ERR(mmu)) { - iommu_domain_free(iommu); - return ERR_CAST(mmu); - } - - DRM_DEV_INFO(gpu->dev->dev, "%s: using IOMMU\n", gpu->name); - - } else { - mmu = msm_gpummu_new(&pdev->dev, gpu); - if (IS_ERR(mmu)) - return ERR_CAST(mmu); - } - - aspace = msm_gem_address_space_create(mmu, "gpu", va_start, va_end); - if (IS_ERR(aspace)) { - mmu->funcs->destroy(mmu); - - DRM_DEV_ERROR(gpu->dev->dev, "failed to init mmu: %ld\n", - PTR_ERR(aspace)); - return ERR_CAST(aspace); - } - - ret = aspace->mmu->funcs->attach(aspace->mmu, NULL, 0); - if (ret) { - msm_gem_address_space_put(aspace); - return ERR_PTR(ret); - } - - return aspace; -} - int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev, struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs, const char *name, struct msm_gpu_config *config) @@ -923,12 +872,13 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev, msm_devfreq_init(gpu); - gpu->aspace = msm_gpu_create_address_space(gpu, pdev, - config->va_start, config->va_end); + gpu->aspace = funcs->create_address_space(gpu, pdev); if (gpu->aspace == NULL) DRM_DEV_INFO(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name); else if (IS_ERR(gpu->aspace)) { + DRM_DEV_ERROR(gpu->dev->dev, "failed to init mmu: %ld\n", + PTR_ERR(gpu->aspace)); ret = PTR_ERR(gpu->aspace); goto fail; } diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h index ab8f0f9c..41d86c2 100644 --- a/drivers/gpu/drm/msm/msm_gpu.h +++ b/drivers/gpu/drm/msm/msm_gpu.h @@ -21,8 +21,6 @@ struct msm_gpu_state; struct msm_gpu_config { const char *ioname; - uint64_t va_start; - uint64_t va_end; unsigned int nr_rings; }; @@ -64,6 +62,8 @@ struct msm_gpu_funcs { int (*gpu_state_put)(struct msm_gpu_state *state); unsigned long (*gpu_get_freq)(struct msm_gpu *gpu); void (*gpu_set_freq)(struct msm_gpu *gpu, unsigned long freq); + struct msm_gem_address_space *(*create_address_space) + (struct msm_gpu *gpu, struct platform_device *pdev); }; struct msm_gpu { -- 2.7.4
On Tue, 20 Aug 2019 13:06:27 -0600, Jordan Crouse wrote:
> Add a compatible string to identify SMMUs that are attached
> to Adreno GPU devices that wish to support split pagetables.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>
> Documentation/devicetree/bindings/iommu/arm,smmu.txt | 7 +++++++
> 1 file changed, 7 insertions(+)
>
Reviewed-by: Rob Herring <robh@kernel.org>
On Tue, Aug 20, 2019 at 01:06:30PM -0600, Jordan Crouse wrote:
> Support the DOMAIN_ATTR_SPLIT_TABLES attribute to let the leaf driver
> know if split pagetables are enabled for the domain.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>
> drivers/iommu/arm-smmu.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 3f41cf7..6a512ff 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -1442,6 +1442,9 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> case DOMAIN_ATTR_NESTING:
> *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> return 0;
> + case DOMAIN_ATTR_SPLIT_TABLES:
> + *(int *)data = !!(smmu_domain->split_pagetables);
> + return 0;
Hmm. Could you move the setting of this attribute into
arm_smmu_domain_set_attr() and reject it if the ias != 48 in there? That way
the user of the domain can request this feature, rather than us enforcing it
based on the compatible string.
I'd also prefer to call it DOMAIN_ATTR_USE_TTBR1 instead, since it's pretty
ARM specific at this point.
Will
Quoting Jordan Crouse (2019-08-20 12:06:28) > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c > index 39e81ef..3f41cf7 100644 > --- a/drivers/iommu/arm-smmu.c > +++ b/drivers/iommu/arm-smmu.c > @@ -1858,6 +1858,7 @@ ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); > ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); > ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); > ARM_SMMU_MATCH_DATA(qcom_smmuv2, ARM_SMMU_V2, QCOM_SMMUV2); > +ARM_SMMU_MATCH_DATA(qcom_adreno_smmuv2, ARM_SMMU_V2, QCOM_ADRENO_SMMUV2); > > static const struct of_device_id arm_smmu_of_match[] = { > { .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 }, > @@ -1867,6 +1868,7 @@ static const struct of_device_id arm_smmu_of_match[] = { > { .compatible = "arm,mmu-500", .data = &arm_mmu500 }, > { .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 }, > { .compatible = "qcom,smmu-v2", .data = &qcom_smmuv2 }, > + { .compatible = "qcom,adreno-smmu-v2", .data = &qcom_adreno_smmuv2 }, Can this be sorted on compat? > { }, > }; > > diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h > index 91a4eb8..e5a2cc8 100644 > --- a/drivers/iommu/arm-smmu.h > +++ b/drivers/iommu/arm-smmu.h > @@ -222,6 +222,7 @@ enum arm_smmu_implementation { > ARM_MMU500, > CAVIUM_SMMUV2, > QCOM_SMMUV2, > + QCOM_ADRENO_SMMUV2, Can this be sorted alphabetically?
Quoting Jordan Crouse (2019-08-20 12:06:27) > Add a compatible string to identify SMMUs that are attached > to Adreno GPU devices that wish to support split pagetables. > > Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> > --- > > Documentation/devicetree/bindings/iommu/arm,smmu.txt | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt > index 3133f3b..3b07896 100644 > --- a/Documentation/devicetree/bindings/iommu/arm,smmu.txt > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt > @@ -18,6 +18,7 @@ conditions. > "arm,mmu-500" > "cavium,smmu-v2" > "qcom,smmu-v2" > + "qcom,adreno-smmu-v2" Is the tabbing weird here or just my MUA is failing? > > depending on the particular implementation and/or the > version of the architecture implemented. > @@ -31,6 +32,12 @@ conditions. > as below, SoC-specific compatibles: > "qcom,sdm845-smmu-500", "arm,mmu-500" > > + "qcom,adreno-smmu-v2" is a special implementation for Heh, special. > + SMMU devices attached to the Adreno GPU on Qcom devices. > + If selected, this will enable split pagetable (TTBR1) Is this selected? Sounds like Kconfig here. > + support. Only use this if the GPU target is capable of > + supporting 64 bit addresses. > + > - reg : Base address and size of the SMMU. >