* [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
From: Zhen Lei @ 2019-03-18 13:12 UTC
To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel, linux-arm-kernel, iommu, linux-kernel
Cc: Zhen Lei

v1 --> v2:
1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
   (Report abort to device, no event recorded) to suppress the event messages
   caused by the unexpected devices.
2. rewrite the patch description.

v1:
This patch series includes two parts:
1. Patches 1-2 use dummy STE tables with the "ste abort" hardware feature to
   abort accesses from unexpected devices. For more details, see the
   description in patch 2.
2. If the "ste abort" feature is not supported, force the unexpected devices
   in the secondary kernel to use the memory maps that they used in the first
   kernel. For more details, see patch 5.

Zhen Lei (2):
  iommu/arm-smmu-v3: make sure the stale caching of L1STD are invalid
  iommu/arm-smmu-v3: to make smmu can be enabled in the kdump kernel

 drivers/iommu/arm-smmu-v3.c | 88 +++++++++++++++++++++++++++++++++------------
 1 file changed, 65 insertions(+), 23 deletions(-)

--
1.8.3
* [PATCH v2 1/2] iommu/arm-smmu-v3: make sure the stale caching of L1STD are invalid
From: Zhen Lei @ 2019-03-18 13:12 UTC
To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel, linux-arm-kernel, iommu, linux-kernel
Cc: Zhen Lei

After the content of an L1STD (Level 1 Stream Table Descriptor) in DDR has
been modified, we should make sure that the cached copies are invalidated.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm-smmu-v3.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cfc8c..9b6afa8e69f70f6 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1071,13 +1071,14 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
 	*dst = cpu_to_le64(val);
 }
 
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
+static void __arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu,
+					u32 sid, bool leaf)
 {
 	struct arm_smmu_cmdq_ent cmd = {
 		.opcode	= CMDQ_OP_CFGI_STE,
 		.cfgi	= {
 			.sid	= sid,
-			.leaf	= true,
+			.leaf	= leaf,
 		},
 	};
 
@@ -1085,6 +1086,16 @@ static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
 	arm_smmu_cmdq_issue_sync(smmu);
 }
 
+static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
+{
+	__arm_smmu_sync_ste_for_sid(smmu, sid, true);
+}
+
+static void arm_smmu_sync_std_for_sid(struct arm_smmu_device *smmu, u32 sid)
+{
+	__arm_smmu_sync_ste_for_sid(smmu, sid, false);
+}
+
 static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 				      __le64 *dst, struct arm_smmu_strtab_ent *ste)
 {
@@ -1232,6 +1243,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 
 	arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
 	arm_smmu_write_strtab_l1_desc(strtab, desc);
+	arm_smmu_sync_std_for_sid(smmu, sid);
 	return 0;
 }
 
--
1.8.3
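The leaf/non-leaf distinction in patch 1/2 can be illustrated with a minimal, self-contained sketch of how a CFGI_STE command might be encoded. This is not code from the driver: build_cfgi_ste() is a hypothetical helper, and the opcode value and field positions (opcode 0x3, StreamID in bits 63:32 of the first dword, Leaf in bit 0 of the second) are assumptions that should be checked against the SMMUv3 specification and arm-smmu-v3.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encodings; verify against the SMMUv3 spec and the driver. */
#define CMDQ_OP_CFGI_STE	0x3ULL
#define CMDQ_CFGI_0_SID_SHIFT	32
#define CMDQ_CFGI_1_LEAF	(1ULL << 0)

/*
 * Hypothetical helper: fill a two-dword CFGI_STE command.
 * leaf == true  asks for invalidation of the cached STE only.
 * leaf == false additionally covers cached L1 descriptors walked to
 * reach that STE, which is what the new *_sync_std_* wrapper relies on.
 */
static void build_cfgi_ste(uint64_t cmd[2], uint32_t sid, bool leaf)
{
	assert(cmd != NULL);

	cmd[0] = CMDQ_OP_CFGI_STE | ((uint64_t)sid << CMDQ_CFGI_0_SID_SHIFT);
	cmd[1] = leaf ? CMDQ_CFGI_1_LEAF : 0;
}
```

In the patch itself the same choice is made by passing true or false to __arm_smmu_sync_ste_for_sid(); the sketch only shows where that flag would end up in the command words under the stated assumptions.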
* [PATCH v2 2/2] iommu/arm-smmu-v3: to make smmu can be enabled in the kdump kernel
From: Zhen Lei @ 2019-03-18 13:12 UTC
To: Jean-Philippe Brucker, Robin Murphy, Will Deacon, Joerg Roedel, linux-arm-kernel, iommu, linux-kernel
Cc: Zhen Lei

I don't know why device_shutdown() is not called in the first kernel before
execution switches to the secondary kernel. People may be afraid that the
function could crash the kernel again, because it performs too many
operations, so that the secondary kernel can never be entered.

A device driver may be configured in the first kernel but not in the
secondary kernel, because of the limited memory resources. (To simplify the
description, call such devices "unexpected devices".) Because the device was
not shut down in the first kernel, it may still access memory in the
secondary kernel. For example, a network card may still be using its ring
buffer to receive external network packets in the secondary kernel.

commit b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions if SMMU is
enabled in kdump kernel") sets SMMU_GBPA.ABORT to abort the accesses of the
unexpected devices, but it also aborts the memory accesses of the devices
that we need, such as the network card. For example, a system may have no
hard disk, so the vmcore has to be dumped over the network.

In fact, we can use STE.config=0b000 to abort the memory accesses of the
unexpected devices only, as shown below:
1. In the first kernel, all buffers used by the "unexpected" devices are
   correctly mapped, and they will not be corrupted by the secondary kernel
   because the latter has its own dedicated reserved memory.
2. Enter the secondary kernel, set SMMU_GBPA.ABORT=1, then disable the SMMU.
3. Preset all STE entries: STE.config=0b000. For a 2-level Stream Table,
   pre-allocate a dummy L2ST (Level 2 Stream Table) and make all L1STD.l2ptr
   pointers point to the dummy L2ST. The dummy L2ST is shared by all L1STDs
   (Level 1 Stream Table Descriptors).
4. Enable the SMMU. From then on, a newly attached device will, if needed,
   allocate a new L2ST and change the related L1STD.l2ptr pointer to it.
   Please note that we still use desc->l2ptr to judge whether the L2ST has
   been allocated, and do not care about the value of L1STD.l2ptr.

Fixes: commit b63b3439b856 ("iommu/arm-smmu-v3: Abort all transactions ...")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/iommu/arm-smmu-v3.c | 72 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 21 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 9b6afa8e69f70f6..28b04d4aef62a9f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1218,35 +1218,57 @@ static void arm_smmu_init_bypass_stes(u64 *strtab, unsigned int nent)
 	}
 }
 
-static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
+static int __arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid,
+				     struct arm_smmu_strtab_l1_desc *desc)
 {
-	size_t size;
 	void *strtab;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
-	struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
 
-	if (desc->l2ptr)
-		return 0;
-
-	size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
 	strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS];
 
-	desc->span = STRTAB_SPLIT + 1;
-	desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma,
-					  GFP_KERNEL | __GFP_ZERO);
 	if (!desc->l2ptr) {
-		dev_err(smmu->dev,
-			"failed to allocate l2 stream table for SID %u\n",
-			sid);
-		return -ENOMEM;
+		size_t size;
+
+		size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3);
+		desc->l2ptr = dmam_alloc_coherent(smmu->dev, size,
+						  &desc->l2ptr_dma,
+						  GFP_KERNEL | __GFP_ZERO);
+		if (!desc->l2ptr) {
+			dev_err(smmu->dev,
+				"failed to allocate l2 stream table for SID %u\n",
+				sid);
+			return -ENOMEM;
+		}
+
+		desc->span = STRTAB_SPLIT + 1;
+		arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
 	}
 
-	arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
 	arm_smmu_write_strtab_l1_desc(strtab, desc);
+	return 0;
+}
+
+static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
+{
+	int ret;
+	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT];
+
+	ret = __arm_smmu_init_l2_strtab(smmu, sid, desc);
+	if (ret)
+		return ret;
+
 	arm_smmu_sync_std_for_sid(smmu, sid);
 	return 0;
 }
 
+static int arm_smmu_init_dummy_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
+{
+	static struct arm_smmu_strtab_l1_desc dummy_desc;
+
+	return __arm_smmu_init_l2_strtab(smmu, sid, &dummy_desc);
+}
+
 /* IRQ and event handlers */
 static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev)
 {
@@ -2149,8 +2171,12 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu)
 	}
 
 	for (i = 0; i < cfg->num_l1_ents; ++i) {
-		arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
-		strtab += STRTAB_L1_DESC_DWORDS << 3;
+		if (is_kdump_kernel()) {
+			arm_smmu_init_dummy_l2_strtab(smmu, i << STRTAB_SPLIT);
+		} else {
+			arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]);
+			strtab += STRTAB_L1_DESC_DWORDS << 3;
+		}
 	}
 
 	return 0;
@@ -2466,11 +2492,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	/* Clear CR0 and sync (disables SMMU and queue processing) */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
 	if (reg & CR0_SMMUEN) {
-		if (is_kdump_kernel()) {
+		if (is_kdump_kernel())
 			arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
-			arm_smmu_device_disable(smmu);
-			return -EBUSY;
-		}
 
 		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
 	}
@@ -2858,6 +2881,13 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	struct device *dev = &pdev->dev;
 	bool bypass;
 
+	/*
+	 * Force to disable bypass in the kdump kernel, abort all incoming
+	 * transactions from the unknown devices.
+	 */
+	if (is_kdump_kernel())
+		disable_bypass = 1;
+
 	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
 	if (!smmu) {
 		dev_err(dev, "failed to allocate arm_smmu_device\n");
--
1.8.3
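The four steps in the description of patch 2/2 can be modelled in user space. The following is a toy sketch, not driver code: the table sizes, the struct and function names, and the split into a "hardware" pointer array and a "driver bookkeeping" array are all assumptions made for illustration. Only the STE.config values (0b000 for abort, 0b101 for stage 1 translation) are taken from the SMMUv3 encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SPLIT		8u		/* toy value: low SID bits index an L2 table */
#define L2_ENTRIES	(1u << SPLIT)
#define L1_ENTRIES	4u		/* toy value: number of L1 descriptors */

enum { STE_CFG_ABORT = 0x0, STE_CFG_S1_TRANS = 0x5, BAD_STREAMID = -1 };

struct l2_table {
	uint8_t config[L2_ENTRIES];	/* one STE.config value per stream */
};

struct stream_table {
	struct l2_table *hw_l2ptr[L1_ENTRIES];	 /* models L1STD.l2ptr */
	struct l2_table *desc_l2ptr[L1_ENTRIES]; /* models desc->l2ptr */
	struct l2_table dummy;			 /* shared, all-abort L2ST */
};

/* Step 3: every L1 descriptor points at the shared all-abort dummy table. */
static void strtab_init_kdump(struct stream_table *st)
{
	memset(&st->dummy, STE_CFG_ABORT, sizeof(st->dummy));
	for (size_t i = 0; i < L1_ENTRIES; i++) {
		st->hw_l2ptr[i] = &st->dummy;
		st->desc_l2ptr[i] = NULL;	/* nothing privately allocated yet */
	}
}

/* Step 4: an attached device gets a private L2 table on first use. */
static int strtab_attach(struct stream_table *st, uint32_t sid,
			 struct l2_table *storage)
{
	uint32_t idx = sid >> SPLIT;

	if (idx >= L1_ENTRIES)
		return BAD_STREAMID;

	if (!st->desc_l2ptr[idx]) {	/* decided by bookkeeping, not hw_l2ptr */
		assert(storage != NULL);
		memset(storage, STE_CFG_ABORT, sizeof(*storage));
		st->desc_l2ptr[idx] = storage;
		st->hw_l2ptr[idx] = storage;
	}
	st->desc_l2ptr[idx]->config[sid & (L2_ENTRIES - 1)] = STE_CFG_S1_TRANS;
	return 0;
}

/* What a table walk for this SID would find. */
static int strtab_lookup(const struct stream_table *st, uint32_t sid)
{
	uint32_t idx = sid >> SPLIT;

	if (idx >= L1_ENTRIES || !st->hw_l2ptr[idx])
		return BAD_STREAMID;
	return st->hw_l2ptr[idx]->config[sid & (L2_ENTRIES - 1)];
}
```

In this model a stream that was never attached resolves to an abort entry instead of a missing L2 table, which is the property the patch is after; neighbours of an attached stream in the same L2 table also stay at abort because the private table is initialised that way.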
* Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
From: Will Deacon @ 2019-04-04 15:30 UTC
To: Zhen Lei
Cc: Jean-Philippe Brucker, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu, linux-kernel

Hi Zhen Lei,

On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
> v1 --> v2:
> 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
>    (Report abort to device, no event recorded) to suppress the event messages
>    caused by the unexpected devices.
> 2. rewrite the patch description.

This issue came up a while back:

  https://lore.kernel.org/linux-pci/20180302103032.GB19323@arm.com/

and I'd still prefer to solve it using the disable_bypass logic which we
already have. Something along the lines of the diff below?

We're relying on the DMA API not subsequently requesting a passthrough
domain, but it should only do that if you've configured your crashkernel
to do so.

Will

--->8

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cf..91b8f3b2ee25 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	/* Clear CR0 and sync (disables SMMU and queue processing) */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
 	if (reg & CR0_SMMUEN) {
-		if (is_kdump_kernel()) {
-			arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
-			arm_smmu_device_disable(smmu);
-			return -EBUSY;
-		}
-
 		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
+		WARN_ON(is_kdump_kernel() && !disable_bypass);
+		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
 	}
 
 	ret = arm_smmu_device_disable(smmu);
* Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
From: Leizhen (ThunderTown) @ 2019-04-08 2:31 UTC
To: Will Deacon
Cc: Jean-Philippe Brucker, Robin Murphy, Joerg Roedel, linux-arm-kernel, iommu, linux-kernel

Hi Will,

On 2019/4/4 23:30, Will Deacon wrote:
> Hi Zhen Lei,
>
> On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
>> v1 --> v2:
>> 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
>>    (Report abort to device, no event recorded) to suppress the event messages
>>    caused by the unexpected devices.
>> 2. rewrite the patch description.
>
> This issue came up a while back:
>
> https://lore.kernel.org/linux-pci/20180302103032.GB19323@arm.com/
>
> and I'd still prefer to solve it using the disable_bypass logic which we
> already have. Something along the lines of the diff below?

Yes, my patches also use disable_bypass=1 (set ste.config=0b000). If
SMMU_IDR0.ST_LEVEL=0 (linear Stream Table supported), then all STE entries
are allocated and initialized (set ste.config=0b000). But if
SMMU_IDR0.ST_LEVEL=1 (2-level Stream Table), we only allocate and initialize
the first-level table, and leave the level 2 tables to be allocated
dynamically. That means C_BAD_STREAMID (eventid=0x2) will be reported if an
unexpected device accesses memory without being reinitialized in the kdump
kernel.

So my patches allocate a dummy level 2 table (STE table) and make all level 1
table entries point to it in advance. That means all memory accesses from
unexpected devices are aborted based on this dummy STE table. When an
expected device (one that needs to be used in the kdump kernel) is attached,
we allocate a new level 2 table (STE table) for it, but keep the other
entries pointing to the dummy STE table.

>
> We're relying on the DMA API not subsequently requesting a passthrough
> domain, but it should only do that if you've configured your crashkernel
> to do so.
>
> Will
>
> --->8
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d3880010c6cf..91b8f3b2ee25 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  	/* Clear CR0 and sync (disables SMMU and queue processing) */
>  	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>  	if (reg & CR0_SMMUEN) {
> -		if (is_kdump_kernel()) {
> -			arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> -			arm_smmu_device_disable(smmu);
> -			return -EBUSY;
> -		}
> -
>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
> +		WARN_ON(is_kdump_kernel() && !disable_bypass);
> +		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>  	}
>
>  	ret = arm_smmu_device_disable(smmu);
>
> .
>

--
Thanks!
BestRegards
* Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
From: Will Deacon @ 2019-04-16 9:14 UTC
To: Leizhen (ThunderTown)
Cc: Jean-Philippe Brucker, Joerg Roedel, linux-kernel, iommu, Robin Murphy, linux-arm-kernel

On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:
> On 2019/4/4 23:30, Will Deacon wrote:
> > On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
> >> v1 --> v2:
> >> 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
> >>    (Report abort to device, no event recorded) to suppress the event messages
> >>    caused by the unexpected devices.
> >> 2. rewrite the patch description.
> >
> > This issue came up a while back:
> >
> > https://lore.kernel.org/linux-pci/20180302103032.GB19323@arm.com/
> >
> > and I'd still prefer to solve it using the disable_bypass logic which we
> > already have. Something along the lines of the diff below?
>
> Yes, my patches also use disable_bypass=1 (set ste.config=0b000). If
> SMMU_IDR0.ST_LEVEL=0 (linear Stream Table supported), then all STE entries
> are allocated and initialized (set ste.config=0b000). But if
> SMMU_IDR0.ST_LEVEL=1 (2-level Stream Table), we only allocate and initialize
> the first-level table, and leave the level 2 tables to be allocated
> dynamically. That means C_BAD_STREAMID (eventid=0x2) will be reported if an
> unexpected device accesses memory without being reinitialized in the kdump
> kernel.

So is your problem just that the C_BAD_STREAMID events are noisy? If so,
perhaps we should be disabling fault reporting entirely in the kdump kernel.

How about the updated diff below? I'm keen to have this as simple as
possible, so we don't end up introducing rarely tested, complex code on
the crash path.

Will

--->8

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d3880010c6cf..d8b73da6447d 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	/* Clear CR0 and sync (disables SMMU and queue processing) */
 	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
 	if (reg & CR0_SMMUEN) {
-		if (is_kdump_kernel()) {
-			arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
-			arm_smmu_device_disable(smmu);
-			return -EBUSY;
-		}
-
 		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
+		WARN_ON(is_kdump_kernel() && !disable_bypass);
+		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
 	}
 
 	ret = arm_smmu_device_disable(smmu);
@@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 		return ret;
 	}
 
+	if (is_kdump_kernel())
+		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
 
 	/* Enable the SMMU interface, or ensure bypass */
 	if (!bypass || disable_bypass) {
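The effect of the second hunk in the diff above can be sketched as a small bit-mask computation. This is an illustration only: cr0_enables() is a hypothetical helper that does not exist in the driver, the order of operations is a simplification of arm_smmu_device_reset(), and the SMMU_CR0 bit positions are assumptions to be checked against the SMMUv3 specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed SMMU_CR0 bit positions; verify against the spec and the driver. */
#define CR0_SMMUEN	(1u << 0)
#define CR0_PRIQEN	(1u << 1)
#define CR0_EVTQEN	(1u << 2)
#define CR0_CMDQEN	(1u << 3)

/*
 * Hypothetical helper: compute the enable bits a reset path like the one
 * in the diff would finally write to CR0.
 */
static uint32_t cr0_enables(bool has_pri, bool kdump, bool bypass,
			    bool disable_bypass)
{
	uint32_t enables = CR0_CMDQEN | CR0_EVTQEN;

	if (has_pri)
		enables |= CR0_PRIQEN;

	/* The proposed change: no event or PRI queue in the kdump kernel. */
	if (kdump)
		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);

	/* Enable translation unless bypass was requested and is allowed. */
	if (!bypass || disable_bypass)
		enables |= CR0_SMMUEN;

	/* The command queue must stay enabled in every configuration. */
	assert(enables & CR0_CMDQEN);
	return enables;
}
```

Under these assumptions a kdump boot ends up with only the command queue and the SMMU itself enabled, so no fault events are queued for any stream, whether it belongs to a device the kdump kernel uses or not.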
* Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
From: Leizhen (ThunderTown) @ 2019-04-17 1:39 UTC
To: Will Deacon
Cc: Jean-Philippe Brucker, Joerg Roedel, linux-kernel, iommu, Robin Murphy, linux-arm-kernel

On 2019/4/16 17:14, Will Deacon wrote:
> On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:
>> On 2019/4/4 23:30, Will Deacon wrote:
>>> On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
>>>> v1 --> v2:
>>>> 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
>>>>    (Report abort to device, no event recorded) to suppress the event messages
>>>>    caused by the unexpected devices.
>>>> 2. rewrite the patch description.
>>>
>>> This issue came up a while back:
>>>
>>> https://lore.kernel.org/linux-pci/20180302103032.GB19323@arm.com/
>>>
>>> and I'd still prefer to solve it using the disable_bypass logic which we
>>> already have. Something along the lines of the diff below?
>>
>> Yes, my patches also use disable_bypass=1 (set ste.config=0b000). If
>> SMMU_IDR0.ST_LEVEL=0 (linear Stream Table supported), then all STE entries
>> are allocated and initialized (set ste.config=0b000). But if
>> SMMU_IDR0.ST_LEVEL=1 (2-level Stream Table), we only allocate and initialize
>> the first-level table, and leave the level 2 tables to be allocated
>> dynamically. That means C_BAD_STREAMID (eventid=0x2) will be reported if an
>> unexpected device accesses memory without being reinitialized in the kdump
>> kernel.
>
> So is your problem just that the C_BAD_STREAMID events are noisy? If so,
> perhaps we should be disabling fault reporting entirely in the kdump kernel.
>
> How about the updated diff below? I'm keen to have this as simple as
> possible, so we don't end up introducing rarely tested, complex code on
> the crash path.
In theory, it can solve the problem. Let me test it.

But then again, the patch below will also disable the fault reporting from
the expected devices that are used in the kdump kernel. In fact, my patches
were merged into our internal version more than 2 months ago, and no bugs
have been found so far.

However, my patches do not support the case where the hardware does not
support the "STE bypass" feature. I think your patch can also handle that.

>
> Will
>
> --->8
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d3880010c6cf..d8b73da6447d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  	/* Clear CR0 and sync (disables SMMU and queue processing) */
>  	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>  	if (reg & CR0_SMMUEN) {
> -		if (is_kdump_kernel()) {
> -			arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> -			arm_smmu_device_disable(smmu);
> -			return -EBUSY;
> -		}
> -
>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
> +		WARN_ON(is_kdump_kernel() && !disable_bypass);
> +		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>  	}
>
>  	ret = arm_smmu_device_disable(smmu);
> @@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  		return ret;
>  	}
>
> +	if (is_kdump_kernel())
> +		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
>
>  	/* Enable the SMMU interface, or ensure bypass */
>  	if (!bypass || disable_bypass) {
>
> .
>

--
Thanks!
BestRegards
* Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
From: Leizhen (ThunderTown) @ 2019-04-19 13:48 UTC
To: Will Deacon
Cc: Jean-Philippe Brucker, Joerg Roedel, linux-kernel, iommu, Robin Murphy, linux-arm-kernel

On 2019/4/17 9:39, Leizhen (ThunderTown) wrote:
>
>
> On 2019/4/16 17:14, Will Deacon wrote:
>> On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:
>>> On 2019/4/4 23:30, Will Deacon wrote:
>>>> On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
>>>>> v1 --> v2:
>>>>> 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
>>>>>    (Report abort to device, no event recorded) to suppress the event messages
>>>>>    caused by the unexpected devices.
>>>>> 2. rewrite the patch description.
>>>>
>>>> This issue came up a while back:
>>>>
>>>> https://lore.kernel.org/linux-pci/20180302103032.GB19323@arm.com/
>>>>
>>>> and I'd still prefer to solve it using the disable_bypass logic which we
>>>> already have. Something along the lines of the diff below?
>>>
>>> Yes, my patches also use disable_bypass=1 (set ste.config=0b000). If
>>> SMMU_IDR0.ST_LEVEL=0 (linear Stream Table supported), then all STE entries
>>> are allocated and initialized (set ste.config=0b000). But if
>>> SMMU_IDR0.ST_LEVEL=1 (2-level Stream Table), we only allocate and initialize
>>> the first-level table, and leave the level 2 tables to be allocated
>>> dynamically. That means C_BAD_STREAMID (eventid=0x2) will be reported if an
>>> unexpected device accesses memory without being reinitialized in the kdump
>>> kernel.
>>
>> So is your problem just that the C_BAD_STREAMID events are noisy? If so,
>> perhaps we should be disabling fault reporting entirely in the kdump kernel.
>>
>> How about the updated diff below? I'm keen to have this as simple as
>> possible, so we don't end up introducing rarely tested, complex code on
>> the crash path.
> In theory, it can solve the problem. Let me test it.

Hi Will,
I have tested your patch on my board today. It works well.

>
> But then again, the patch below will also disable the fault reporting from
> the expected devices that are used in the kdump kernel. In fact, my patches
> were merged into our internal version more than 2 months ago, and no bugs
> have been found so far.
>
> However, my patches do not support the case where the hardware does not
> support the "STE bypass" feature. I think your patch can also handle that.
>
>>
>> Will
>>
>> --->8
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index d3880010c6cf..d8b73da6447d 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>  	/* Clear CR0 and sync (disables SMMU and queue processing) */
>>  	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>>  	if (reg & CR0_SMMUEN) {
>> -		if (is_kdump_kernel()) {
>> -			arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>> -			arm_smmu_device_disable(smmu);
>> -			return -EBUSY;
>> -		}
>> -
>>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>> +		WARN_ON(is_kdump_kernel() && !disable_bypass);
>> +		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>>  	}
>>
>>  	ret = arm_smmu_device_disable(smmu);
>> @@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>  		return ret;
>>  	}
>>
>> +	if (is_kdump_kernel())
>> +		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
>>
>>  	/* Enable the SMMU interface, or ensure bypass */
>>  	if (!bypass || disable_bypass) {
>>
>> .
>>
>

--
Thanks!
BestRegards
* Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
From: Bhupesh Sharma @ 2019-04-22 12:33 UTC
To: Will Deacon, Leizhen (ThunderTown)
Cc: Jean-Philippe Brucker, Joerg Roedel, linux-kernel, iommu, Robin Murphy, linux-arm-kernel, kexec, Bhupesh SHARMA

Hi Will,

On 04/16/2019 02:44 PM, Will Deacon wrote:
> On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:
>> On 2019/4/4 23:30, Will Deacon wrote:
>>> On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
>>>> v1 --> v2:
>>>> 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
>>>>    (Report abort to device, no event recorded) to suppress the event messages
>>>>    caused by the unexpected devices.
>>>> 2. rewrite the patch description.
>>>
>>> This issue came up a while back:
>>>
>>> https://lore.kernel.org/linux-pci/20180302103032.GB19323@arm.com/
>>>
>>> and I'd still prefer to solve it using the disable_bypass logic which we
>>> already have. Something along the lines of the diff below?
>>
>> Yes, my patches also use disable_bypass=1 (set ste.config=0b000). If
>> SMMU_IDR0.ST_LEVEL=0 (linear Stream Table supported), then all STE entries
>> are allocated and initialized (set ste.config=0b000). But if
>> SMMU_IDR0.ST_LEVEL=1 (2-level Stream Table), we only allocate and initialize
>> the first-level table, and leave the level 2 tables to be allocated
>> dynamically. That means C_BAD_STREAMID (eventid=0x2) will be reported if an
>> unexpected device accesses memory without being reinitialized in the kdump
>> kernel.
>
> So is your problem just that the C_BAD_STREAMID events are noisy? If so,
> perhaps we should be disabling fault reporting entirely in the kdump kernel.
>
> How about the updated diff below? I'm keen to have this as simple as
> possible, so we don't end up introducing rarely tested, complex code on
> the crash path.
>
> Will
>
> --->8
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d3880010c6cf..d8b73da6447d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  	/* Clear CR0 and sync (disables SMMU and queue processing) */
>  	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>  	if (reg & CR0_SMMUEN) {
> -		if (is_kdump_kernel()) {
> -			arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> -			arm_smmu_device_disable(smmu);
> -			return -EBUSY;
> -		}
> -
>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
> +		WARN_ON(is_kdump_kernel() && !disable_bypass);
> +		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>  	}
>
>  	ret = arm_smmu_device_disable(smmu);
> @@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  		return ret;
>  	}
>
> +	if (is_kdump_kernel())
> +		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
>
>  	/* Enable the SMMU interface, or ensure bypass */
>  	if (!bypass || disable_bypass) {
>

Thanks for the fix. I can confirm that with this change the kdump kernel
boots well for me on Huawei boards, so feel free to add:

Tested-by: Bhupesh Sharma <bhsharma@redhat.com>

Here are the kdump kernel logs without this fix:

[    4.514181] arm-smmu-v3 arm-smmu-v3.1.auto: EVTQ overflow detected -- events lost

.. And then repeating messages like the following ..

[    4.521654] arm-smmu-v3 arm-smmu-v3.1.auto: event 0x02 received:
[    4.527654] arm-smmu-v3 arm-smmu-v3.1.auto: 0x00007d0200000002
[    4.533567] arm-smmu-v3 arm-smmu-v3.1.auto: 0x000000010000017e
[    4.539478] arm-smmu-v3 arm-smmu-v3.1.auto: 0x00000000ff6de000
[    4.545390] arm-smmu-v3 arm-smmu-v3.1.auto: 0x000000000eee03e8

And with the fix applied, the kdump kernel logs look like this:

[ 9136.361094] Starting crashdump kernel...
[ 9136.365007] Bye!
[    0.000000] Booting Linux on physical CPU 0x0000070002 [0x480fd010]
[    0.000000] Linux version 5.1.0-rc6+
<..snip..>
[    3.424103] arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
[    3.429674] arm-smmu-v3 arm-smmu-v3.0.auto: ias 48-bit, oas 48-bit (features 0x00000fef)
[    3.437780] arm-smmu-v3 arm-smmu-v3.0.auto: SMMU currently enabled! Resetting...
[    3.445431] arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
<..snip..>

Thanks,
Bhupesh
* Re: [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled
From: Matthias Brugger @ 2019-04-24 16:22 UTC
To: Will Deacon, Leizhen (ThunderTown)
Cc: Jean-Philippe Brucker, Joerg Roedel, linux-kernel, iommu, Robin Murphy, linux-arm-kernel

On 16/04/2019 11:14, Will Deacon wrote:
> On Mon, Apr 08, 2019 at 10:31:47AM +0800, Leizhen (ThunderTown) wrote:
>> On 2019/4/4 23:30, Will Deacon wrote:
>>> On Mon, Mar 18, 2019 at 09:12:41PM +0800, Zhen Lei wrote:
>>>> v1 --> v2:
>>>> 1. Drop part2. Now, we only use the SMMUv3 hardware feature STE.config=0b000
>>>>    (Report abort to device, no event recorded) to suppress the event messages
>>>>    caused by the unexpected devices.
>>>> 2. rewrite the patch description.
>>>
>>> This issue came up a while back:
>>>
>>> https://lore.kernel.org/linux-pci/20180302103032.GB19323@arm.com/
>>>
>>> and I'd still prefer to solve it using the disable_bypass logic which we
>>> already have. Something along the lines of the diff below?
>>
>> Yes, my patches also use disable_bypass=1 (set ste.config=0b000). If
>> SMMU_IDR0.ST_LEVEL=0 (linear Stream Table supported), then all STE entries
>> are allocated and initialized (set ste.config=0b000). But if
>> SMMU_IDR0.ST_LEVEL=1 (2-level Stream Table), we only allocate and initialize
>> the first-level table, and leave the level 2 tables to be allocated
>> dynamically. That means C_BAD_STREAMID (eventid=0x2) will be reported if an
>> unexpected device accesses memory without being reinitialized in the kdump
>> kernel.
>
> So is your problem just that the C_BAD_STREAMID events are noisy? If so,
> perhaps we should be disabling fault reporting entirely in the kdump kernel.
>
> How about the updated diff below? I'm keen to have this as simple as
> possible, so we don't end up introducing rarely tested, complex code on
> the crash path.
>
> Will
>
> --->8
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d3880010c6cf..d8b73da6447d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2454,13 +2454,9 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  	/* Clear CR0 and sync (disables SMMU and queue processing) */
>  	reg = readl_relaxed(smmu->base + ARM_SMMU_CR0);
>  	if (reg & CR0_SMMUEN) {
> -		if (is_kdump_kernel()) {
> -			arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
> -			arm_smmu_device_disable(smmu);
> -			return -EBUSY;
> -		}
> -
>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
> +		WARN_ON(is_kdump_kernel() && !disable_bypass);
> +		arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0);
>  	}
>
>  	ret = arm_smmu_device_disable(smmu);
> @@ -2553,6 +2549,8 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  		return ret;
>  	}
>
> +	if (is_kdump_kernel())
> +		enables &= ~(CR0_EVTQEN | CR0_PRIQEN);
>
>  	/* Enable the SMMU interface, or ensure bypass */
>  	if (!bypass || disable_bypass) {
>

Same here: I tested the patch and it works for me. Feel free to add:

Tested-by: Matthias Brugger <mbrugger@suse.com>

Regards,
Matthias
Thread overview: 10+ messages (newest: 2019-04-24 16:22 UTC)
2019-03-18 13:12 [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled Zhen Lei
2019-03-18 13:12 ` [PATCH v2 1/2] iommu/arm-smmu-v3: make sure the stale caching of L1STD are invalid Zhen Lei
2019-03-18 13:12 ` [PATCH v2 2/2] iommu/arm-smmu-v3: to make smmu can be enabled in the kdump kernel Zhen Lei
2019-04-04 15:30 ` [PATCH v2 0/2] iommu/arm-smmu-v3: make sure the kdump kernel can work well when smmu is enabled Will Deacon
2019-04-08  2:31   ` Leizhen (ThunderTown)
2019-04-16  9:14     ` Will Deacon
2019-04-17  1:39       ` Leizhen (ThunderTown)
2019-04-19 13:48         ` Leizhen (ThunderTown)
2019-04-22 12:33       ` Bhupesh Sharma
2019-04-24 16:22       ` Matthias Brugger