linux-pci.vger.kernel.org archive mirror
* [PATCH v5 0/2] PCI: dwc: Add support for 64-bit MSI target addresses
@ 2022-08-25 18:50 Will McVicker
  2022-08-25 18:50 ` [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32 Will McVicker
  2022-08-25 18:50 ` [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address Will McVicker
  0 siblings, 2 replies; 26+ messages in thread
From: Will McVicker @ 2022-08-25 18:50 UTC (permalink / raw)
  To: Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, Will McVicker
  Cc: kernel-team, Vidya Sagar, Christoph Hellwig, Robin Murphy,
	linux-pci, linux-kernel

Hi All,

I have updated the second patch to first try allocating the DMA buffer
with a 32-bit mask. If that fails and the host supports a 64-bit msi_msg
buffer, then the driver will retry the allocation with a 64-bit DMA mask.
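
In rough outline, the allocation flow in patch 2/2 then looks like this
(a simplified, unrolled sketch of the retry loop, not the exact diff):

	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
					GFP_KERNEL);
	if (!msi_vaddr && msi_64bit) {
		/* Host advertises 64-bit MSI support: retry with a wider mask */
		dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64),
						&pp->msi_data, GFP_KERNEL);
	}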

I rebased the series and tested with a Pixel 6 on 6.0-rc1
(android-mainline) and a db845c with 5.15. Thanks for the reviews!

Regards,
Will

Will McVicker (2):
  PCI: dwc: Drop dependency on ZONE_DMA32
  PCI: dwc: Add support for 64-bit MSI target address

v5:
 * Updated patch 2/2 to first try with a 32-bit DMA mask. On failure,
   retry with a 64-bit mask if supported.

v4:
 * Updated commit descriptions.
 * Renamed msi_64b -> msi_64bit.
 * Dropped msi_64bit ternary use.
 * Dropped export of dw_pcie_msi_capabilities.

v3:
 * Switched to a managed DMA allocation.
 * Simplified the DMA allocation cleanup.
 * Dropped msi_page from struct dw_pcie_rp.
 * Allocating a u64 instead of a full page.

v2:
 * Fixed build error caught by the kernel test robot.
 * Fixed error handling reported by Isaac Manjarres.

 .../pci/controller/dwc/pcie-designware-host.c | 54 ++++++++++---------
 drivers/pci/controller/dwc/pcie-designware.c  |  8 +++
 drivers/pci/controller/dwc/pcie-designware.h  |  2 +-
 3 files changed, 39 insertions(+), 25 deletions(-)


base-commit: 568035b01cfb107af8d2e4bd2fb9aea22cf5b868
-- 
2.37.2.672.g94769d06f0-goog


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-08-25 18:50 [PATCH v5 0/2] PCI: dwc: Add support for 64-bit MSI target addresses Will McVicker
@ 2022-08-25 18:50 ` Will McVicker
  2022-09-28 11:41   ` Serge Semin
  2022-08-25 18:50 ` [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address Will McVicker
  1 sibling, 1 reply; 26+ messages in thread
From: Will McVicker @ 2022-08-25 18:50 UTC (permalink / raw)
  To: Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, Will McVicker
  Cc: kernel-team, Vidya Sagar, Christoph Hellwig, Robin Murphy,
	linux-pci, linux-kernel, Isaac J . Manjarres

Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
uses the coherent DMA mask to try to return an allocation within the DMA
mask limits. With that, we now can drop the msi_page parameter in struct
dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
device will fail to probe when ZONE_DMA32 is disabled.

Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Signed-off-by: Will McVicker <willmcvicker@google.com>
Acked-by: Jingoo Han <jingoohan1@gmail.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
 .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
 drivers/pci/controller/dwc/pcie-designware.h  |  1 -
 2 files changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 7746f94a715f..39f3b37d4033 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
 
 	irq_domain_remove(pp->msi_domain);
 	irq_domain_remove(pp->irq_domain);
-
-	if (pp->msi_data) {
-		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
-		struct device *dev = pci->dev;
-
-		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
-		if (pp->msi_page)
-			__free_page(pp->msi_page);
-	}
 }
 
 static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
@@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
 	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
 	struct device *dev = pci->dev;
 	struct platform_device *pdev = to_platform_device(dev);
+	u64 *msi_vaddr;
 	int ret;
 	u32 ctrl, num_ctrls;
 
@@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
 						    dw_chained_msi_isr, pp);
 	}
 
-	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
+	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
 	if (ret)
 		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
 
-	pp->msi_page = alloc_page(GFP_DMA32);
-	pp->msi_data = dma_map_page(dev, pp->msi_page, 0,
-				    PAGE_SIZE, DMA_FROM_DEVICE);
-	ret = dma_mapping_error(dev, pp->msi_data);
-	if (ret) {
-		dev_err(pci->dev, "Failed to map MSI data\n");
-		__free_page(pp->msi_page);
-		pp->msi_page = NULL;
-		pp->msi_data = 0;
+	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
+					GFP_KERNEL);
+	if (!msi_vaddr) {
+		dev_err(dev, "Failed to alloc and map MSI data\n");
 		dw_pcie_free_msi(pp);
-
-		return ret;
+		return -ENOMEM;
 	}
 
 	return 0;
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index 09b887093a84..a871ae7eb59e 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -243,7 +243,6 @@ struct dw_pcie_rp {
 	struct irq_domain	*irq_domain;
 	struct irq_domain	*msi_domain;
 	dma_addr_t		msi_data;
-	struct page		*msi_page;
 	struct irq_chip		*msi_irq_chip;
 	u32			num_vectors;
 	u32			irq_mask[MAX_MSI_CTRLS];
-- 
2.37.2.672.g94769d06f0-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-08-25 18:50 [PATCH v5 0/2] PCI: dwc: Add support for 64-bit MSI target addresses Will McVicker
  2022-08-25 18:50 ` [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32 Will McVicker
@ 2022-08-25 18:50 ` Will McVicker
  2022-08-25 20:59   ` Robin Murphy
                     ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Will McVicker @ 2022-08-25 18:50 UTC (permalink / raw)
  To: Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, Will McVicker
  Cc: kernel-team, Vidya Sagar, Christoph Hellwig, Robin Murphy,
	linux-pci, linux-kernel, kernel test robot

Since not all devices require a 32-bit MSI address, add support to the
PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
buffering (swiotlb) without risking not being able to get a 32-bit address
during DMA allocation.

Basically, in the slim chance that there are no 32-bit allocations
available, the current PCIe host driver will fail to allocate the msi_msg
page due to a DMA address overflow (seen in [1]). With this patch, the
PCIe host can retry the allocation with a 64-bit DMA mask if the current
PCIe device advertises 64-bit support via its MSI capabilities.

[1] https://lore.kernel.org/all/Yo0soniFborDl7+C@google.com/

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Will McVicker <willmcvicker@google.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Jingoo Han <jingoohan1@gmail.com>
---
 .../pci/controller/dwc/pcie-designware-host.c | 38 ++++++++++++++-----
 drivers/pci/controller/dwc/pcie-designware.c  |  8 ++++
 drivers/pci/controller/dwc/pcie-designware.h  |  1 +
 3 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
index 39f3b37d4033..8928a9a29d58 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
 	u64 *msi_vaddr;
 	int ret;
 	u32 ctrl, num_ctrls;
+	bool msi_64bit = false;
+	bool retry_64bit = false;
+	u16 msi_capabilities;
 
 	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
 		pp->irq_mask[ctrl] = ~0;
@@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
 						    dw_chained_msi_isr, pp);
 	}
 
-	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
-	if (ret)
-		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
+	msi_capabilities = dw_pcie_msi_capabilities(pci);
+	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
+		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
 
-	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
-					GFP_KERNEL);
-	if (!msi_vaddr) {
-		dev_err(dev, "Failed to alloc and map MSI data\n");
-		dw_pcie_free_msi(pp);
-		return -ENOMEM;
+	while (true) {
+		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
+			retry_64bit ? "64" : "32");
+		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
+						DMA_BIT_MASK(64) :
+						DMA_BIT_MASK(32));
+		if (ret)
+			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
+				 retry_64bit ? "64" : "32");
+
+		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
+						GFP_KERNEL);
+		if (!msi_vaddr) {
+			dev_err(dev, "Failed to alloc and map MSI data\n");
+			if (msi_64bit && !retry_64bit) {
+				retry_64bit = true;
+				continue;
+			}
+
+			dw_pcie_free_msi(pp);
+			return -ENOMEM;
+		}
+		break;
 	}
 
 	return 0;
diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
index c6725c519a47..650a7f22f9d0 100644
--- a/drivers/pci/controller/dwc/pcie-designware.c
+++ b/drivers/pci/controller/dwc/pcie-designware.c
@@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
 }
 EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
 
+u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
+{
+	u8 offset;
+
+	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
+	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
+}
+
 static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
 					    u8 cap)
 {
diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
index a871ae7eb59e..45fcdfc8c035 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
 
 u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
 u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
+u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
 
 int dw_pcie_read(void __iomem *addr, int size, u32 *val);
 int dw_pcie_write(void __iomem *addr, int size, u32 val);
-- 
2.37.2.672.g94769d06f0-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-08-25 18:50 ` [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address Will McVicker
@ 2022-08-25 20:59   ` Robin Murphy
  2022-08-25 21:22     ` William McVicker
  2022-09-09 13:29   ` Christoph Hellwig
  2022-09-28 12:05   ` Serge Semin
  2 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2022-08-25 20:59 UTC (permalink / raw)
  To: Will McVicker, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas
  Cc: kernel-team, Vidya Sagar, Christoph Hellwig, linux-pci,
	linux-kernel, kernel test robot

On 2022-08-25 19:50, Will McVicker wrote:
> Since not all devices require a 32-bit MSI address, add support to the
> PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> buffering (swiotlb) without risking not being able to get a 32-bit address
> during DMA allocation.
> 
> Basically, in the slim chance that there are no 32-bit allocations
> available, the current PCIe host driver will fail to allocate the msi_msg
> page due to a DMA address overflow (seen in [1]). With this patch, the
> PCIe host can retry the allocation with a 64-bit DMA mask if the current
> PCIe device advertises 64-bit support via its MSI capabilities.
> 
> [1] https://lore.kernel.org/all/Yo0soniFborDl7+C@google.com/
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Will McVicker <willmcvicker@google.com>
> Reviewed-by: Rob Herring <robh@kernel.org>
> Acked-by: Jingoo Han <jingoohan1@gmail.com>
> ---
>   .../pci/controller/dwc/pcie-designware-host.c | 38 ++++++++++++++-----
>   drivers/pci/controller/dwc/pcie-designware.c  |  8 ++++
>   drivers/pci/controller/dwc/pcie-designware.h  |  1 +
>   3 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index 39f3b37d4033..8928a9a29d58 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>   	u64 *msi_vaddr;
>   	int ret;
>   	u32 ctrl, num_ctrls;
> +	bool msi_64bit = false;
> +	bool retry_64bit = false;
> +	u16 msi_capabilities;
>   
>   	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
>   		pp->irq_mask[ctrl] = ~0;
> @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>   						    dw_chained_msi_isr, pp);
>   	}
>   
> -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> -	if (ret)
> -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
>   
> -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> -					GFP_KERNEL);
> -	if (!msi_vaddr) {
> -		dev_err(dev, "Failed to alloc and map MSI data\n");
> -		dw_pcie_free_msi(pp);
> -		return -ENOMEM;
> +	while (true) {
> +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> +			retry_64bit ? "64" : "32");

If only we had some sort of "variable" that could store a
numerical value, think of the possibilities... :)

> +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> +						DMA_BIT_MASK(64) :
> +						DMA_BIT_MASK(32));
> +		if (ret)
> +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> +				 retry_64bit ? "64" : "32");

Setting a 64-bit mask should never fail, since it represents having no 
possible limitation whatsoever (I'm not sure if there are any platforms 
left where setting a 32-bit mask can actually fail in practice either, 
but I have no strong opinion on the fate of the existing warning).

> +
> +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> +						GFP_KERNEL);
> +		if (!msi_vaddr) {
> +			dev_err(dev, "Failed to alloc and map MSI data\n");

Possibly a matter of personal taste, but I'd say try to avoid dev_err()
for things that aren't actually fatal; if you're still able to continue
on, at best it's a warning, not an error. Especially if your use-case
*expects* the 32-bit allocation to fail. There's nothing more off-putting
than booting a typical vendor kernel and watching it scream tons of
errors that look EXTREMELY IMPORTANT yet are also apparently
inconsequential.

> +			if (msi_64bit && !retry_64bit) {
> +				retry_64bit = true;
> +				continue;
> +			}
> +
> +			dw_pcie_free_msi(pp);
> +			return -ENOMEM;
> +		}
> +		break;

TBH the whole loop design is a bit baroque for me, I'd have gone for a
more straightforward tweak to the existing flow, something like (spelled
out with the real calls, untested):

	msi_vaddr = NULL;
	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
	if (!ret)
		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64),
						&pp->msi_data, GFP_KERNEL);
	if (!msi_vaddr && msi_64bit) {
		dev_warn(dev, "32-bit MSI allocation failed, retrying with a 64-bit mask\n");
		dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64),
						&pp->msi_data, GFP_KERNEL);
	}
	if (!msi_vaddr) {
		dev_err(dev, "Failed to alloc and map MSI data\n");
		dw_pcie_free_msi(pp);
		return -ENOMEM;
	}
		
However I'm happy that you've captured the important functional point, 
so I'll leave the style matters up to Lorenzo.

Thanks,
Robin.

>   	}
>   
>   	return 0;
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> index c6725c519a47..650a7f22f9d0 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
>   }
>   EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
>   
> +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> +{
> +	u8 offset;
> +
> +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> +}
> +
>   static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
>   					    u8 cap)
>   {
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index a871ae7eb59e..45fcdfc8c035 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
>   
>   u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
>   u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
>   
>   int dw_pcie_read(void __iomem *addr, int size, u32 *val);
>   int dw_pcie_write(void __iomem *addr, int size, u32 val);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-08-25 20:59   ` Robin Murphy
@ 2022-08-25 21:22     ` William McVicker
  0 siblings, 0 replies; 26+ messages in thread
From: William McVicker @ 2022-08-25 21:22 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, linux-pci, linux-kernel,
	kernel test robot

On 08/25/2022, Robin Murphy wrote:
> On 2022-08-25 19:50, Will McVicker wrote:
> > Since not all devices require a 32-bit MSI address, add support to the
> > PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> > allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> > buffering (swiotlb) without risking not being able to get a 32-bit address
> > during DMA allocation.
> > 
> > Basically, in the slim chance that there are no 32-bit allocations
> > available, the current PCIe host driver will fail to allocate the msi_msg
> > page due to a DMA address overflow (seen in [1]). With this patch, the
> > PCIe host can retry the allocation with a 64-bit DMA mask if the current
> > PCIe device advertises 64-bit support via its MSI capabilities.
> > 
> > [1] https://lore.kernel.org/all/Yo0soniFborDl7+C@google.com/
> > 
> > Reported-by: kernel test robot <lkp@intel.com>
> > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > Reviewed-by: Rob Herring <robh@kernel.org>
> > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > ---
> >   .../pci/controller/dwc/pcie-designware-host.c | 38 ++++++++++++++-----
> >   drivers/pci/controller/dwc/pcie-designware.c  |  8 ++++
> >   drivers/pci/controller/dwc/pcie-designware.h  |  1 +
> >   3 files changed, 38 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > index 39f3b37d4033..8928a9a29d58 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> >   	u64 *msi_vaddr;
> >   	int ret;
> >   	u32 ctrl, num_ctrls;
> > +	bool msi_64bit = false;
> > +	bool retry_64bit = false;
> > +	u16 msi_capabilities;
> >   	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
> >   		pp->irq_mask[ctrl] = ~0;
> > @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> >   						    dw_chained_msi_isr, pp);
> >   	}
> > -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > -	if (ret)
> > -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> > +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> > +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> > +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
> > -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > -					GFP_KERNEL);
> > -	if (!msi_vaddr) {
> > -		dev_err(dev, "Failed to alloc and map MSI data\n");
> > -		dw_pcie_free_msi(pp);
> > -		return -ENOMEM;
> > +	while (true) {
> > +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> > +			retry_64bit ? "64" : "32");
> 
> If only we had some sort of "variable" that could store a numerical
> value, think of the possibilities... :)

Sure, now that we're trying both 32- and 64-bit, I can do that. Thanks for the
suggestion :)

> 
> > +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> > +						DMA_BIT_MASK(64) :
> > +						DMA_BIT_MASK(32));
> > +		if (ret)
> > +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> > +				 retry_64bit ? "64" : "32");
> 
> Setting a 64-bit mask should never fail, since it represents having no
> possible limitation whatsoever (I'm not sure if there are any platforms left
> where setting a 32-bit mask can actually fail in practice either, but I have
> no strong opinion on the fate of the existing warning).

Yeah, I'm not sure how this could fail. So I just left the warning and edited
the message. It's probably cleaner to just leave the warning unconditionally
based on ret.

> 
> > +
> > +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > +						GFP_KERNEL);
> > +		if (!msi_vaddr) {
> > +			dev_err(dev, "Failed to alloc and map MSI data\n");
> 
> Possibly a matter of personal taste, but I'd say try to avoid dev_err() for
> things that aren't actually fatal; if you're still able to continue on, at
> best it's a warning, not an error. Especially if your use-case *expects* the
> 32-bit allocation to fail. There's nothing more off-putting than booting a
> typical vendor kernel and watching it scream tons of errors that look
> EXTREMELY IMPORTANT yet are also apparently inconsequential.

Failing a 32-bit allocation should be a rare case, but still possible. If it
fails for both 32-bit and 64-bit, then it's very likely the PCIe device calling
dw_pcie_host_init() will fail to probe. So I'll move this down to only report
that error.

> 
> > +			if (msi_64bit && !retry_64bit) {
> > +				retry_64bit = true;
> > +				continue;
> > +			}
> > +
> > +			dw_pcie_free_msi(pp);
> > +			return -ENOMEM;
> > +		}
> > +		break;
> 
> TBH the whole loop design is a bit baroque for me, I'd have gone for a more
> straightforward tweak to the existing flow, something like (spelled out
> with the real calls, untested):
> 
> 	msi_vaddr = NULL;
> 	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> 	if (!ret)
> 		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64),
> 						&pp->msi_data, GFP_KERNEL);
> 	if (!msi_vaddr && msi_64bit) {
> 		dev_warn(dev, "32-bit MSI allocation failed, retrying with a 64-bit mask\n");
> 		dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
> 		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64),
> 						&pp->msi_data, GFP_KERNEL);
> 	}
> 	if (!msi_vaddr) {
> 		dev_err(dev, "Failed to alloc and map MSI data\n");
> 		dw_pcie_free_msi(pp);
> 		return -ENOMEM;
> 	}
> 		
> However I'm happy that you've captured the important functional point, so
> I'll leave the style matters up to Lorenzo.

I was trying to avoid duplicating the allocation code, but if that's preferred,
then I'm fine with it.

> 
> Thanks,
> Robin.
> 
> >   	}
> >   	return 0;
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > index c6725c519a47..650a7f22f9d0 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
> >   }
> >   EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
> > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> > +{
> > +	u8 offset;
> > +
> > +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> > +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> > +}
> > +
> >   static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
> >   					    u8 cap)
> >   {
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > index a871ae7eb59e..45fcdfc8c035 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
> >   u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
> >   u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
> >   int dw_pcie_read(void __iomem *addr, int size, u32 *val);
> >   int dw_pcie_write(void __iomem *addr, int size, u32 val);

Thanks,
Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-08-25 18:50 ` [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address Will McVicker
  2022-08-25 20:59   ` Robin Murphy
@ 2022-09-09 13:29   ` Christoph Hellwig
  2022-09-09 13:47     ` Robin Murphy
  2022-09-28 12:05   ` Serge Semin
  2 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2022-09-09 13:29 UTC (permalink / raw)
  To: Will McVicker
  Cc: Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, Robin Murphy, linux-pci,
	linux-kernel, kernel test robot

On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
> Since not all devices require a 32-bit MSI address, add support to the
> PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> buffering (swiotlb) without risking not being able to get a 32-bit address
> during DMA allocation.

Umm.  You can't just disable ZONE_DMA32.  Linux absolutely requires a
32-bit dma mask to work, it is in fact the implicit default.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-09 13:29   ` Christoph Hellwig
@ 2022-09-09 13:47     ` Robin Murphy
  2022-09-09 14:47       ` Christoph Hellwig
  0 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2022-09-09 13:47 UTC (permalink / raw)
  To: Christoph Hellwig, Will McVicker
  Cc: Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, linux-pci, linux-kernel, kernel test robot

On 2022-09-09 14:29, Christoph Hellwig wrote:
> On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
>> Since not all devices require a 32-bit MSI address, add support to the
>> PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
>> allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
>> buffering (swiotlb) without risking not being able to get a 32-bit address
>> during DMA allocation.
> 
> Umm.  You can't just disable ZONE_DMA32.  Linux absolutely requires a
> 32-bit dma mask to work, it is in fact the implicit default.

Eh, it's behind CONFIG_EXPERT, which makes it enough of a "I think I 
know what I'm doing and accept responsibility for picking up the pieces 
if it breaks" thing.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-09 13:47     ` Robin Murphy
@ 2022-09-09 14:47       ` Christoph Hellwig
  2022-09-09 15:00         ` Robin Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Christoph Hellwig @ 2022-09-09 14:47 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Christoph Hellwig, Will McVicker, Jingoo Han, Gustavo Pimentel,
	Lorenzo Pieralisi, Rob Herring, Krzysztof Wilczyński,
	Bjorn Helgaas, kernel-team, Vidya Sagar, linux-pci, linux-kernel,
	kernel test robot

On Fri, Sep 09, 2022 at 02:47:19PM +0100, Robin Murphy wrote:
> On 2022-09-09 14:29, Christoph Hellwig wrote:
> > On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
> > > Since not all devices require a 32-bit MSI address, add support to the
> > > PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> > > allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> > > buffering (swiotlb) without risking not being able to get a 32-bit address
> > > during DMA allocation.
> > 
> > Umm.  You can't just disable ZONE_DMA32.  Linux absolutely requires a
> > 32-bit dma mask to work, it is in fact the implicit default.
> 
> Eh, it's behind CONFIG_EXPERT, which makes it enough of a "I think I know
> what I'm doing and accept responsibility for picking up the pieces if it
> breaks" thing.

Seems like indeed on arm64 there is a way to disable it.  The x86 model
is to just select it unconditionally, which I think is the right way
if we don't want to get into completely random failures.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-09 14:47       ` Christoph Hellwig
@ 2022-09-09 15:00         ` Robin Murphy
  0 siblings, 0 replies; 26+ messages in thread
From: Robin Murphy @ 2022-09-09 15:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Will McVicker, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, linux-pci, linux-kernel,
	kernel test robot

On 2022-09-09 15:47, Christoph Hellwig wrote:
> On Fri, Sep 09, 2022 at 02:47:19PM +0100, Robin Murphy wrote:
>> On 2022-09-09 14:29, Christoph Hellwig wrote:
>>> On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
>>>> Since not all devices require a 32-bit MSI address, add support to the
>>>> PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
>>>> allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
>>>> buffering (swiotlb) without risking not being able to get a 32-bit address
>>>> during DMA allocation.
>>>
>>> Umm.  You can't just disable ZONE_DMA32.  Linux absolutely requires a
>>> 32-bit dma mask to work, it is in fact the implicit default.
>>
>> Eh, it's behind CONFIG_EXPERT, which makes it enough of a "I think I know
>> what I'm doing and accept responsibility for picking up the pieces if it
>> breaks" thing.
> 
> Seems like indeed on arm64 there is a way to disable it.  The x86 model
> is to just select it unconditionally, which I think is the right way
> if we don't want to get into completely random failures.

IIRC there were reasons for wanting as much ZONE_NORMAL memory as 
possible; for the embedded folks who are already typically running with 
"swiotlb=noforce" to save memory because they know their hardware, we 
may as well let them have the footgun. Distros and other general-purpose 
configs should rightly not be going anywhere near this.

Cheers,
Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-08-25 18:50 ` [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32 Will McVicker
@ 2022-09-28 11:41   ` Serge Semin
  2022-09-29 18:25     ` Robin Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Serge Semin @ 2022-09-28 11:41 UTC (permalink / raw)
  To: Will McVicker
  Cc: Serge Semin, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, Robin Murphy,
	linux-pci, linux-kernel, Isaac J . Manjarres

On Thu, Aug 25, 2022 at 06:50:24PM +0000, Will McVicker wrote:
> Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
> uses the coherent DMA mask to try to return an allocation within the DMA
> mask limits. With that, we now can drop the msi_page parameter in struct
> dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
> continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
> device will fail to probe when ZONE_DMA32 is disabled.

As Rob already said here
https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
and I mentioned in this thread
https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
DW PCIe MSI doesn't cause any DMA due to the way the iMSI-RX engine is
designed. So reserving any real system memory is a waste in this case.
Reserving DMA-coherent memory is even more inappropriate since it can be
expensive on some platforms (see the note in Part Ia of
Documentation/core-api/dma-api.rst), for instance on MIPS32 with
non-coherent common DMA.

> 
> Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
> Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> Signed-off-by: Will McVicker <willmcvicker@google.com>
> Acked-by: Jingoo Han <jingoohan1@gmail.com>
> Reviewed-by: Rob Herring <robh@kernel.org>
> ---
>  .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
>  drivers/pci/controller/dwc/pcie-designware.h  |  1 -
>  2 files changed, 7 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index 7746f94a715f..39f3b37d4033 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
>  
>  	irq_domain_remove(pp->msi_domain);
>  	irq_domain_remove(pp->irq_domain);
> -
> -	if (pp->msi_data) {
> -		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> -		struct device *dev = pci->dev;
> -
> -		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
> -		if (pp->msi_page)
> -			__free_page(pp->msi_page);
> -	}
>  }
>  
>  static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
> @@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>  	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>  	struct device *dev = pci->dev;
>  	struct platform_device *pdev = to_platform_device(dev);
> +	u64 *msi_vaddr;
>  	int ret;
>  	u32 ctrl, num_ctrls;
>  
> @@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>  						    dw_chained_msi_isr, pp);
>  	}
>  

> -	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));

This has been redundant in the first place since none of the DW PCIe
low-level drivers update the mask, and it's 32 bits wide by default
anyway:
https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L167

>  	if (ret)
>  		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
>  
> -	pp->msi_page = alloc_page(GFP_DMA32);
> -	pp->msi_data = dma_map_page(dev, pp->msi_page, 0,
> -				    PAGE_SIZE, DMA_FROM_DEVICE);
> -	ret = dma_mapping_error(dev, pp->msi_data);
> -	if (ret) {
> -		dev_err(pci->dev, "Failed to map MSI data\n");
> -		__free_page(pp->msi_page);
> -		pp->msi_page = NULL;
> -		pp->msi_data = 0;
> +	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> +					GFP_KERNEL);

Changing the whole device DMA-mask due to something that doesn't
perform DMA seems inappropriate. I'd suggest preserving ZONE_DMA32 usage
here until something like what @Robin suggested in the last paragraph of
https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com/
is implemented, especially seeing there are still common drivers in the
kernel which rely on that zone.

-Sergey

> +	if (!msi_vaddr) {
> +		dev_err(dev, "Failed to alloc and map MSI data\n");
>  		dw_pcie_free_msi(pp);
> -
> -		return ret;
> +		return -ENOMEM;
>  	}
>  
>  	return 0;
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index 09b887093a84..a871ae7eb59e 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -243,7 +243,6 @@ struct dw_pcie_rp {
>  	struct irq_domain	*irq_domain;
>  	struct irq_domain	*msi_domain;
>  	dma_addr_t		msi_data;
> -	struct page		*msi_page;
>  	struct irq_chip		*msi_irq_chip;
>  	u32			num_vectors;
>  	u32			irq_mask[MAX_MSI_CTRLS];
> -- 
> 2.37.2.672.g94769d06f0-goog
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-08-25 18:50 ` [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address Will McVicker
  2022-08-25 20:59   ` Robin Murphy
  2022-09-09 13:29   ` Christoph Hellwig
@ 2022-09-28 12:05   ` Serge Semin
  2022-09-28 17:52     ` William McVicker
  2022-09-30 13:46     ` Lorenzo Pieralisi
  2 siblings, 2 replies; 26+ messages in thread
From: Serge Semin @ 2022-09-28 12:05 UTC (permalink / raw)
  To: Will McVicker
  Cc: Serge Semin, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, Robin Murphy,
	linux-pci, linux-kernel, kernel test robot

On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
> Since not all devices require a 32-bit MSI address, add support to the
> PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> buffering (swiotlb) without risking not being able to get a 32-bit address
> during DMA allocation.

What is the problem with having ZONE_DMA32 enabled anyway?

> 
> Basically, in the slim chance that there are no 32-bit allocations
> available, the current PCIe host driver will fail to allocate the msi_msg
> page due to a DMA address overflow (seen in [1]). With this patch, the
> PCIe host can retry the allocation with a 64-bit DMA mask if the current
> PCIe device advertises 64-bit support via its MSI capabilities.
> 
> [1] https://lore.kernel.org/all/Yo0soniFborDl7+C@google.com/

Note the reported error isn't caused by the allocation procedure, but
by the mapping procedure.

> 
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Will McVicker <willmcvicker@google.com>
> Reviewed-by: Rob Herring <robh@kernel.org>
> Acked-by: Jingoo Han <jingoohan1@gmail.com>
> ---
>  .../pci/controller/dwc/pcie-designware-host.c | 38 ++++++++++++++-----
>  drivers/pci/controller/dwc/pcie-designware.c  |  8 ++++
>  drivers/pci/controller/dwc/pcie-designware.h  |  1 +
>  3 files changed, 38 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> index 39f3b37d4033..8928a9a29d58 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>  	u64 *msi_vaddr;
>  	int ret;
>  	u32 ctrl, num_ctrls;
> +	bool msi_64bit = false;
> +	bool retry_64bit = false;
> +	u16 msi_capabilities;
>  
>  	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
>  		pp->irq_mask[ctrl] = ~0;
> @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>  						    dw_chained_msi_isr, pp);
>  	}
>  
> -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> -	if (ret)
> -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");

> +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;

Note this capability flag has nothing to do with the DW PCIe iMSI-RX
engine, which is used here to detect and report MSI TLPs. By design
iMSI-RX always supports 64-bit addresses. If you mean having that flag
set by the DW PCIe platform drivers at the platform-specific probe
stage as an indication of the supported MSI address range, then ok.

>  
> -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> -					GFP_KERNEL);
> -	if (!msi_vaddr) {
> -		dev_err(dev, "Failed to alloc and map MSI data\n");
> -		dw_pcie_free_msi(pp);
> -		return -ENOMEM;
> +	while (true) {
> +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> +			retry_64bit ? "64" : "32");

> +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> +						DMA_BIT_MASK(64) :
> +						DMA_BIT_MASK(32));

I'd suggest just dropping this. No DMA is actually performed on getting
the MSI TLPs. So modifying the device DMA-mask due to something which
doesn't cause DMA, based on a flag which doesn't indicate the device's
DMA capability, is at least inappropriate.

> +		if (ret)
> +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> +				 retry_64bit ? "64" : "32");
> +

> +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> +						GFP_KERNEL);

As I noted earlier, DMA-coherent memory can be too expensive, so it's a
waste to allocate it with no intent of using it. Instead of this, just
bring back the alloc_page() method here and pass the GFP_DMA32 flag to
it if the MSI capability reports PCI_MSI_FLAGS_64BIT unset.
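
Something like this, i.e. bringing back the pattern that patch 1/2
removes (just an untested sketch of what I mean, where msi_64bit is the
PCI_MSI_FLAGS_64BIT check from above):

	struct page *msi_page;

	msi_page = alloc_page(msi_64bit ? GFP_KERNEL : GFP_DMA32);
	if (!msi_page)
		return -ENOMEM;

	pp->msi_data = dma_map_page(dev, msi_page, 0, PAGE_SIZE,
				    DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, pp->msi_data)) {
		__free_page(msi_page);
		return -ENOMEM;
	}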

-Sergey

> +		if (!msi_vaddr) {
> +			dev_err(dev, "Failed to alloc and map MSI data\n");
> +			if (msi_64bit && !retry_64bit) {
> +				retry_64bit = true;
> +				continue;
> +			}
> +
> +			dw_pcie_free_msi(pp);
> +			return -ENOMEM;
> +		}
> +		break;
>  	}
>  
>  	return 0;
> diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> index c6725c519a47..650a7f22f9d0 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.c
> +++ b/drivers/pci/controller/dwc/pcie-designware.c
> @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
>  }
>  EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
>  
> +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> +{
> +	u8 offset;
> +
> +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> +}
> +
>  static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
>  					    u8 cap)
>  {
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index a871ae7eb59e..45fcdfc8c035 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
>  
>  u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
>  u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
>  
>  int dw_pcie_read(void __iomem *addr, int size, u32 *val);
>  int dw_pcie_write(void __iomem *addr, int size, u32 val);
> -- 
> 2.37.2.672.g94769d06f0-goog
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-28 12:05   ` Serge Semin
@ 2022-09-28 17:52     ` William McVicker
  2022-09-29  8:13       ` Lorenzo Pieralisi
  2022-09-30 13:46     ` Lorenzo Pieralisi
  1 sibling, 1 reply; 26+ messages in thread
From: William McVicker @ 2022-09-28 17:52 UTC (permalink / raw)
  To: Serge Semin
  Cc: Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, Robin Murphy, linux-pci,
	linux-kernel, kernel test robot

On 09/28/2022, Serge Semin wrote:
> On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
> > Since not all devices require a 32-bit MSI address, add support to the
> > PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> > allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> > buffering (swiotlb) without risking not being able to get a 32-bit address
> > during DMA allocation.
> 
> What is the problem with having ZONE_DMA32 enabled anyway?

On Android most devices don't have a 32-bit limitation. Several Android OEMs
have reported significant performance improvements after disabling ZONE_DMA32:
reduced memory usage, less time spent in kswapd, faster direct reclaim, and
improved app launch times.

So this patch series was introduced to remove the dependency on ZONE_DMA32 for
the DW PCIe drivers.

> 
> > 
> > Basically, in the slim chance that there are no 32-bit allocations
> > available, the current PCIe host driver will fail to allocate the msi_msg
> > page due to a DMA address overflow (seen in [1]). With this patch, the
> > PCIe host can retry the allocation with a 64-bit DMA mask if the current
> > PCIe device advertises 64-bit support via its MSI capabilities.
> > 
> > [1] https://lore.kernel.org/all/Yo0soniFborDl7+C@google.com/
> 
> Note the reported error isn't caused by the allocation procedure, but
> by the mapping procedure.
> 
> > 
> > Reported-by: kernel test robot <lkp@intel.com>
> > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > Reviewed-by: Rob Herring <robh@kernel.org>
> > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > ---
> >  .../pci/controller/dwc/pcie-designware-host.c | 38 ++++++++++++++-----
> >  drivers/pci/controller/dwc/pcie-designware.c  |  8 ++++
> >  drivers/pci/controller/dwc/pcie-designware.h  |  1 +
> >  3 files changed, 38 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > index 39f3b37d4033..8928a9a29d58 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> >  	u64 *msi_vaddr;
> >  	int ret;
> >  	u32 ctrl, num_ctrls;
> > +	bool msi_64bit = false;
> > +	bool retry_64bit = false;
> > +	u16 msi_capabilities;
> >  
> >  	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
> >  		pp->irq_mask[ctrl] = ~0;
> > @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> >  						    dw_chained_msi_isr, pp);
> >  	}
> >  
> > -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > -	if (ret)
> > -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> 
> > +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> > +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> > +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
> 
> Note this capability flag has nothing to do with the DW PCIe iMSI-RX
> engine, which is used here to detect and report MSI TLPs. By design
> iMSI-RX always supports 64-bit addresses. If you mean having that flag
> set by the DW PCIe platform drivers at the platform-specific probe
> stage as an indication of the supported MSI address range, then ok.

Right. The DW PCIe device driver can set this flag during probe before calling
dw_pcie_host_init() to ensure that we will always successfully allocate and map
the MSI target address (as required to return successfully from
dw_pcie_host_init()).
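
E.g., roughly, in the platform driver's probe (an untested sketch using
the existing dbi accessors; note the host-init code above also requires
PCI_MSI_FLAGS_ENABLE to be set for the 64-bit check to apply):

	u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
	u16 flags = dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);

	/* Advertise a 64-bit MSI target address before dw_pcie_host_init() */
	dw_pcie_writew_dbi(pci, offset + PCI_MSI_FLAGS,
			   flags | PCI_MSI_FLAGS_64BIT);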

> 
> >  
> > -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > -					GFP_KERNEL);
> > -	if (!msi_vaddr) {
> > -		dev_err(dev, "Failed to alloc and map MSI data\n");
> > -		dw_pcie_free_msi(pp);
> > -		return -ENOMEM;
> > +	while (true) {
> > +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> > +			retry_64bit ? "64" : "32");
> 
> > +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> > +						DMA_BIT_MASK(64) :
> > +						DMA_BIT_MASK(32));
> 
> I'd suggest just dropping this. No DMA is actually performed on getting
> the MSI TLPs. So modifying the device DMA-mask due to something which
> doesn't cause DMA, based on a flag which doesn't indicate the device's
> DMA capability, is at least inappropriate.
> 
> > +		if (ret)
> > +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> > +				 retry_64bit ? "64" : "32");
> > +
> 
> > +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > +						GFP_KERNEL);
> 
> As I noted earlier, DMA-coherent memory can be too expensive, so it's a
> waste to allocate it with no intent of using it. Instead of this, just
> bring back the alloc_page() method here and pass the GFP_DMA32 flag to
> it if the MSI capability reports PCI_MSI_FLAGS_64BIT unset.

As mentioned above, we don't want to force this driver to require the kernel to
enable ZONE_DMA32. Since no I/O happens to this buffer, could we use
dma_alloc_attrs() with the DMA_ATTR_SKIP_CPU_SYNC and
DMA_ATTR_NO_KERNEL_MAPPING attributes? Would that address the "too expensive"
issues you're referring to?
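
I.e. something like this (untested; note that with
DMA_ATTR_NO_KERNEL_MAPPING the return value is an opaque cookie rather
than a usable vaddr, which should be fine here since the CPU never
touches this buffer):

	msi_vaddr = dma_alloc_attrs(dev, sizeof(u64), &pp->msi_data,
				    GFP_KERNEL,
				    DMA_ATTR_SKIP_CPU_SYNC |
				    DMA_ATTR_NO_KERNEL_MAPPING);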

With regards to the DMA mask, I'm okay with moving that out of the host
controller and into the DW PCIe device driver. That would address all of my
issues and we could just drop the logic for checking the PCI_MSI_FLAGS_64BIT.
However, I'm not the one you need to convince to do that.

Regards,
Will

> 
> -Sergey
> 
> > +		if (!msi_vaddr) {
> > +			dev_err(dev, "Failed to alloc and map MSI data\n");
> > +			if (msi_64bit && !retry_64bit) {
> > +				retry_64bit = true;
> > +				continue;
> > +			}
> > +
> > +			dw_pcie_free_msi(pp);
> > +			return -ENOMEM;
> > +		}
> > +		break;
> >  	}
> >  
> >  	return 0;
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > index c6725c519a47..650a7f22f9d0 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
> >  }
> >  EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
> >  
> > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> > +{
> > +	u8 offset;
> > +
> > +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> > +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> > +}
> > +
> >  static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
> >  					    u8 cap)
> >  {
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > index a871ae7eb59e..45fcdfc8c035 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
> >  
> >  u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
> >  u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
> >  
> >  int dw_pcie_read(void __iomem *addr, int size, u32 *val);
> >  int dw_pcie_write(void __iomem *addr, int size, u32 val);
> > -- 
> > 2.37.2.672.g94769d06f0-goog
> > 
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-28 17:52     ` William McVicker
@ 2022-09-29  8:13       ` Lorenzo Pieralisi
  2022-09-29 18:50         ` William McVicker
  0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2022-09-29  8:13 UTC (permalink / raw)
  To: William McVicker
  Cc: Serge Semin, Jingoo Han, Gustavo Pimentel, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, Robin Murphy, linux-pci,
	linux-kernel, kernel test robot

On Wed, Sep 28, 2022 at 05:52:26PM +0000, William McVicker wrote:
> On 09/28/2022, Serge Semin wrote:
> > On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
> > > Since not all devices require a 32-bit MSI address, add support to the
> > > PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> > > allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> > > buffering (swiotlb) without risking not being able to get a 32-bit address
> > > during DMA allocation.
> > 
> > What is the problem with having ZONE_DMA32 enabled anyway?
> 
> On Android most devices don't have a 32-bit limitation. Several Android OEMs
> have reported significant performance improvements after disabling ZONE_DMA32:
> reduced memory usage, less time spent in kswapd, faster direct reclaim, and
> improved app launch times.
> 
> So this patch series was introduced to remove the dependency on ZONE_DMA32 for
> the DW PCIe drivers.
> 
> > 
> > > 
> > > Basically, in the slim chance that there are no 32-bit allocations
> > > available, the current PCIe host driver will fail to allocate the msi_msg
> > > page due to a DMA address overflow (seen in [1]). With this patch, the
> > > PCIe host can retry the allocation with a 64-bit DMA mask if the current
> > > PCIe device advertises 64-bit support via its MSI capabilities.
> > > 
> > > [1] https://lore.kernel.org/all/Yo0soniFborDl7+C@google.com/
> > 
> > Note the reported error isn't caused by the allocation procedure, but
> > by the mapping procedure.
> > 
> > > 
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > > ---
> > >  .../pci/controller/dwc/pcie-designware-host.c | 38 ++++++++++++++-----
> > >  drivers/pci/controller/dwc/pcie-designware.c  |  8 ++++
> > >  drivers/pci/controller/dwc/pcie-designware.h  |  1 +
> > >  3 files changed, 38 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > index 39f3b37d4033..8928a9a29d58 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > >  	u64 *msi_vaddr;
> > >  	int ret;
> > >  	u32 ctrl, num_ctrls;
> > > +	bool msi_64bit = false;
> > > +	bool retry_64bit = false;
> > > +	u16 msi_capabilities;
> > >  
> > >  	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
> > >  		pp->irq_mask[ctrl] = ~0;
> > > @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > >  						    dw_chained_msi_isr, pp);
> > >  	}
> > >  
> > > -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > > -	if (ret)
> > > -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> > 
> > > +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> > > +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> > > +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
> > 
> > Note this capability flag has nothing to do with the DW PCIe iMSI-RX
> > engine, which is used here to detect and report MSI TLPs. By design
> > iMSI-RX always supports 64-bit addresses. If you mean having that flag
> > set by the DW PCIe platform drivers at the platform-specific probe
> > stage as an indication of the supported MSI address range, then ok.
> 
> Right. The DW PCIe device driver can set this flag during probe before calling
> dw_pcie_host_init() to ensure that we will always successfully allocate and map
> the MSI target address (as required to return successfully from
> dw_pcie_host_init()).
> 
> > 
> > >  
> > > -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > -					GFP_KERNEL);
> > > -	if (!msi_vaddr) {
> > > -		dev_err(dev, "Failed to alloc and map MSI data\n");
> > > -		dw_pcie_free_msi(pp);
> > > -		return -ENOMEM;
> > > +	while (true) {
> > > +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> > > +			retry_64bit ? "64" : "32");
> > 
> > > +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> > > +						DMA_BIT_MASK(64) :
> > > +						DMA_BIT_MASK(32));
> > 
> > I'd suggest just dropping this. No DMA is actually performed on getting
> > the MSI TLPs. So modifying the device DMA-mask due to something which
> > doesn't cause DMA, based on a flag which doesn't indicate the device's
> > DMA capability, is at least inappropriate.
> > 
> > > +		if (ret)
> > > +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> > > +				 retry_64bit ? "64" : "32");
> > > +
> > 
> > > +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > +						GFP_KERNEL);
> > 
> > As I noted earlier, DMA-coherent memory can be too expensive, so it's a
> > waste to allocate it with no intent of using it. Instead of this, just
> > bring back the alloc_page() method here and pass the GFP_DMA32 flag to
> > it if the MSI capability reports PCI_MSI_FLAGS_64BIT unset.
> 
> As mentioned above, we don't want to force this driver to require the kernel to
> enable ZONE_DMA32. Since no I/O happens to this buffer, could we use
> dma_alloc_attrs() with the DMA_ATTR_SKIP_CPU_SYNC and
> DMA_ATTR_NO_KERNEL_MAPPING attributes? Would that address the "too expensive"
> issues you're referring to?
> 
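> Something like this minimal sketch is what I have in mind (my
> assumption of how the call would look; with DMA_ATTR_NO_KERNEL_MAPPING
> the return value is an opaque cookie rather than a usable virtual
> address, so only the dma_addr_t would be kept):
>
> 	#include <linux/dma-mapping.h>
>
> 	void *cookie;
>
> 	cookie = dma_alloc_attrs(dev, sizeof(u64), &pp->msi_data,
> 				 GFP_KERNEL,
> 				 DMA_ATTR_SKIP_CPU_SYNC |
> 				 DMA_ATTR_NO_KERNEL_MAPPING);
> 	if (!cookie)
> 		return -ENOMEM;
>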
> With regards to the DMA mask, I'm okay with moving that out of the host
> controller and into the DW PCIe device driver. That would address all of my
> issues and we could just drop the logic for checking the PCI_MSI_FLAGS_64BIT.
> However, I'm not the one you need to convince to do that.

We are at late -rc7 and it does not look like we are converging in this
discussion - I will wait till tomorrow, but then I will have to drop

https://lore.kernel.org/linux-pci/20220825235404.4132818-1-willmcvicker@google.com

from the PCI queue for v6.1 so that we can restart from a clean slate.

Lorenzo

> Regards,
> Will
> 
> > 
> > -Sergey
> > 
> > > +		if (!msi_vaddr) {
> > > +			dev_err(dev, "Failed to alloc and map MSI data\n");
> > > +			if (msi_64bit && !retry_64bit) {
> > > +				retry_64bit = true;
> > > +				continue;
> > > +			}
> > > +
> > > +			dw_pcie_free_msi(pp);
> > > +			return -ENOMEM;
> > > +		}
> > > +		break;
> > >  	}
> > >  
> > >  	return 0;
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > > index c6725c519a47..650a7f22f9d0 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
> > >  }
> > >  EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
> > >  
> > > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> > > +{
> > > +	u8 offset;
> > > +
> > > +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> > > +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> > > +}
> > > +
> > >  static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
> > >  					    u8 cap)
> > >  {
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > > index a871ae7eb59e..45fcdfc8c035 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
> > >  
> > >  u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
> > >  u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> > > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
> > >  
> > >  int dw_pcie_read(void __iomem *addr, int size, u32 *val);
> > >  int dw_pcie_write(void __iomem *addr, int size, u32 val);
> > > -- 
> > > 2.37.2.672.g94769d06f0-goog
> > > 
> > > 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-09-28 11:41   ` Serge Semin
@ 2022-09-29 18:25     ` Robin Murphy
  2022-09-29 19:32       ` Serge Semin
  0 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2022-09-29 18:25 UTC (permalink / raw)
  To: Serge Semin, Will McVicker
  Cc: Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, linux-pci, linux-kernel,
	Isaac J . Manjarres

On 2022-09-28 12:41, Serge Semin wrote:
> On Thu, Aug 25, 2022 at 06:50:24PM +0000, Will McVicker wrote:
>> Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
>> uses the coherent DMA mask to try to return an allocation within the DMA
>> mask limits. With that, we now can drop the msi_page parameter in struct
>> dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
>> continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
>> device will fail to probe when ZONE_DMA32 is disabled.
> 
> As Rob already said here
> https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
> and I mentioned in this thread
> https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
> DW PCIe MSI doesn't cause any DMA due to the way the iMSI-RX engine is
> designed. So reserving any real system memory is a waste in
> this case. Reserving DMA-coherent memory is even more inappropriate, since it
> can be expensive on some platforms (see the note in Part Ia of
> Documentation/core-api/dma-api.rst), for instance on MIPS32 with
> non-coherent common DMA.

This has been discussed before - in general it is difficult to pick an 
arbitrary MSI address that is *guaranteed* not to overlap any valid DMA 
address that somebody may try to use later. However there is a very easy 
way to guarantee that the DMA API won't give anyone a particular DMA 
address, which is to get an address directly from the DMA API and keep 
it. Yes, that can technically be done with a streaming mapping *if* you 
already have some memory allocated in a suitable physical location, but 
coherent allocations are even more foolproof, simpler to clean up 
(particularly with devres), and unlikely to be an issue on relevant 
platforms (do any MIPS32 systems use this driver?)
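
To be concrete, the reservation trick amounts to nothing more than this
(a minimal sketch of the idea, error handling elided):

	#include <linux/dma-mapping.h>

	u64 *msi_vaddr;
	dma_addr_t msi_data;

	/*
	 * Whatever DMA address comes back here is now owned by this
	 * device: the DMA API cannot hand out an overlapping address
	 * for any later mapping, so it is safe to use as the MSI
	 * doorbell target even though nothing is ever written to it.
	 */
	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &msi_data,
					GFP_KERNEL);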

>> Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
>> Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
>> Signed-off-by: Will McVicker <willmcvicker@google.com>
>> Acked-by: Jingoo Han <jingoohan1@gmail.com>
>> Reviewed-by: Rob Herring <robh@kernel.org>
>> ---
>>   .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
>>   drivers/pci/controller/dwc/pcie-designware.h  |  1 -
>>   2 files changed, 7 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
>> index 7746f94a715f..39f3b37d4033 100644
>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>> @@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
>>   
>>   	irq_domain_remove(pp->msi_domain);
>>   	irq_domain_remove(pp->irq_domain);
>> -
>> -	if (pp->msi_data) {
>> -		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>> -		struct device *dev = pci->dev;
>> -
>> -		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
>> -		if (pp->msi_page)
>> -			__free_page(pp->msi_page);
>> -	}
>>   }
>>   
>>   static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
>> @@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>>   	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>>   	struct device *dev = pci->dev;
>>   	struct platform_device *pdev = to_platform_device(dev);
>> +	u64 *msi_vaddr;
>>   	int ret;
>>   	u32 ctrl, num_ctrls;
>>   
>> @@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>>   						    dw_chained_msi_isr, pp);
>>   	}
>>   
> 
>> -	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
>> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> 
> This has been redundant in the first place since none of the DW PCIe
> low-level drivers update the mask, and it's 32 bits wide by default
> anyway:
> https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L167

No, in general drivers should always explicitly set their mask(s) and 
check the return value to make sure DMA is possible at all before trying 
any other DMA API calls. There's no guarantee that the default mask is 
usable (e.g. some systems don't have any 32-bit addressable RAM), or 
that it's even always 32 bits (due to crufty reasons of something 
of_dma_configure() tried to do a long time ago).
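
For reference, the canonical pattern is simply this (an illustrative
sketch, not specific to this driver):

	#include <linux/dma-mapping.h>

	/*
	 * Declare what the device can address; if even this fails,
	 * DMA is not possible at all and no mapping may be attempted.
	 */
	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
	if (ret)
		return ret;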

Thanks,
Robin.

>>   	if (ret)
>>   		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
>>   
>> -	pp->msi_page = alloc_page(GFP_DMA32);
>> -	pp->msi_data = dma_map_page(dev, pp->msi_page, 0,
>> -				    PAGE_SIZE, DMA_FROM_DEVICE);
>> -	ret = dma_mapping_error(dev, pp->msi_data);
>> -	if (ret) {
>> -		dev_err(pci->dev, "Failed to map MSI data\n");
>> -		__free_page(pp->msi_page);
>> -		pp->msi_page = NULL;
>> -		pp->msi_data = 0;
>> +	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
>> +					GFP_KERNEL);
> 
> Changing the whole device DMA-mask due to something that doesn't
> perform DMA seems inappropriate. I'd suggest preserving the ZONE_DMA32
> usage here until something like what @Robin suggested in the last
> paragraph of
> https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com/
> is implemented. Especially seeing there are still common drivers in
> the kernel which rely on that zone.
> 
> -Sergey
> 
>> +	if (!msi_vaddr) {
>> +		dev_err(dev, "Failed to alloc and map MSI data\n");
>>   		dw_pcie_free_msi(pp);
>> -
>> -		return ret;
>> +		return -ENOMEM;
>>   	}
>>   
>>   	return 0;
>> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
>> index 09b887093a84..a871ae7eb59e 100644
>> --- a/drivers/pci/controller/dwc/pcie-designware.h
>> +++ b/drivers/pci/controller/dwc/pcie-designware.h
>> @@ -243,7 +243,6 @@ struct dw_pcie_rp {
>>   	struct irq_domain	*irq_domain;
>>   	struct irq_domain	*msi_domain;
>>   	dma_addr_t		msi_data;
>> -	struct page		*msi_page;
>>   	struct irq_chip		*msi_irq_chip;
>>   	u32			num_vectors;
>>   	u32			irq_mask[MAX_MSI_CTRLS];
>> -- 
>> 2.37.2.672.g94769d06f0-goog
>>
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-29  8:13       ` Lorenzo Pieralisi
@ 2022-09-29 18:50         ` William McVicker
  2022-09-29 19:00           ` Serge Semin
  0 siblings, 1 reply; 26+ messages in thread
From: William McVicker @ 2022-09-29 18:50 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Serge Semin, Jingoo Han, Gustavo Pimentel, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, Robin Murphy, linux-pci,
	linux-kernel, kernel test robot

On 09/29/2022, Lorenzo Pieralisi wrote:
> On Wed, Sep 28, 2022 at 05:52:26PM +0000, William McVicker wrote:
> > On 09/28/2022, Serge Semin wrote:
> > > On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
> > > > Since not all devices require a 32-bit MSI address, add support to the
> > > > PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> > > > allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> > > > buffering (swiotlb) without risking not being able to get a 32-bit address
> > > > during DMA allocation.
> > > 
> > > What is the problem with having ZONE_DMA32 enabled anyway?
> > 
> > On Android most devices don't have a 32-bit limitation. Several Android OEMs
> > have reported significant enough performance improvements after disabling
> > ZONE_DMA32. These include reduced memory usage, less time spent in
> > kswapd, faster direct reclaim, and faster app launch times.
> > 
> > So this patch series was introduced to remove the dependency on ZONE_DMA32 for
> > the DW PCIe drivers.
> > 
> > > 
> > > > 
> > > > Basically, in the slim chance that there are no 32-bit allocations
> > > > available, the current PCIe host driver will fail to allocate the msi_msg
> > > > page due to a DMA address overflow (seen in [1]). With this patch, the
> > > > PCIe host can retry the allocation with a 64-bit DMA mask if the current
> > > > PCIe device advertises 64-bit support via its MSI capabilities.
> > > > 
> > > > [1] https://lore.kernel.org/all/Yo0soniFborDl7+C@google.com/
> > > 
> > > Note the reported error isn't caused by the allocation procedure, but
> > > by the mapping procedure.
> > > 
> > > > 
> > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > > > ---
> > > >  .../pci/controller/dwc/pcie-designware-host.c | 38 ++++++++++++++-----
> > > >  drivers/pci/controller/dwc/pcie-designware.c  |  8 ++++
> > > >  drivers/pci/controller/dwc/pcie-designware.h  |  1 +
> > > >  3 files changed, 38 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > index 39f3b37d4033..8928a9a29d58 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > >  	u64 *msi_vaddr;
> > > >  	int ret;
> > > >  	u32 ctrl, num_ctrls;
> > > > +	bool msi_64bit = false;
> > > > +	bool retry_64bit = false;
> > > > +	u16 msi_capabilities;
> > > >  
> > > >  	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
> > > >  		pp->irq_mask[ctrl] = ~0;
> > > > @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > >  						    dw_chained_msi_isr, pp);
> > > >  	}
> > > >  
> > > > -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > > > -	if (ret)
> > > > -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> > > 
> > > > +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> > > > +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> > > > +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
> > > 
> > > Note this capability flag has nothing to do with the DW PCIe iMSI-RX
> > > engine, which is used here to detect and report MSI TLPs. By design
> > > iMSI-RX always supports 64-bit addresses. If you mean having that flag
> > > set by the DW PCIe platform drivers at the platform-specific probe
> > > stage as an indication of the MSI address range, then ok.
> > 
> > Right. The DW PCIe device driver can set this flag during probe before calling
> > dw_pcie_host_init() to ensure that we will always successfully allocate and map
> > the MSI target address (as required to return successfully from
> > dw_pcie_host_init()).
> > 
> > > 
> > > >  
> > > > -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > > -					GFP_KERNEL);
> > > > -	if (!msi_vaddr) {
> > > > -		dev_err(dev, "Failed to alloc and map MSI data\n");
> > > > -		dw_pcie_free_msi(pp);
> > > > -		return -ENOMEM;
> > > > +	while (true) {
> > > > +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> > > > +			retry_64bit ? "64" : "32");
> > > 
> > > > +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> > > > +						DMA_BIT_MASK(64) :
> > > > +						DMA_BIT_MASK(32));
> > > 
> > > I'd suggest just dropping this. No DMA is actually performed on getting the
> > > MSI TLPs. So modifying the device DMA-mask due to something which
> > > doesn't cause DMA, based on a flag which doesn't indicate the
> > > device DMA-capability, is at least inappropriate.
> > > 
> > > > +		if (ret)
> > > > +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> > > > +				 retry_64bit ? "64" : "32");
> > > > +
> > > 
> > > > +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > > +						GFP_KERNEL);
> > > 
> > > As I noted earlier, DMA-coherent memory can be too expensive, so
> > > allocating it with no intent of using it is a waste. Instead,
> > > bring back the alloc_page() method here and pass the GFP_DMA32 flag
> > > to that function if the MSI capability reports PCI_MSI_FLAGS_64BIT
> > > unset.
> > 
> > As mentioned above, we don't want to force this driver to require the kernel to
> > enable ZONE_DMA32. Since no I/O happens to this buffer, could we use
> > dma_alloc_attrs() with the DMA_ATTR_SKIP_CPU_SYNC and
> > DMA_ATTR_NO_KERNEL_MAPPING attributes? Would that address the "too expensive"
> > issues you're referring to?
> > 
> > With regards to the DMA mask, I'm okay with moving that out of the host
> > controller and into the DW PCIe device driver. That would address all of my
> > issues and we could just drop the logic for checking the PCI_MSI_FLAGS_64BIT.
> > However, I'm not the one you need to convince to do that.
> 
> We are at late -rc7 and it does not look like we are converging in this
> discussion - I will wait till tomorrow, but then I will have to drop
> 
> https://lore.kernel.org/linux-pci/20220825235404.4132818-1-willmcvicker@google.com
> 
> from the PCI queue for v6.1 so that we can restart from a clean slate.
> 
> Lorenzo
> 

Hi Lorenzo,

Based on Robin's response [1], I don't think we should change the
implementation based on MIPS32 until we have (1) someone showing MIPS32 is
using this driver and (2) that there's an actual perf regression when using
dmam_alloc_coherent(). My patch series addresses a real issue by removing the
dependency on ZONE_DMA32. Even if we dropped my patches, that wouldn't solve
Serge's DMA-mask issues, since the DW PCIe host driver would continue to
unconditionally set the mask to 32 bits.

[1] https://lore.kernel.org/linux-pci/4dc31a63-00b1-f379-c5ac-7dc9425937f4@arm.com/

Thanks,
Will

> > Regards,
> > Will
> > 
> > > 
> > > -Sergey
> > > 
> > > > +		if (!msi_vaddr) {
> > > > +			dev_err(dev, "Failed to alloc and map MSI data\n");
> > > > +			if (msi_64bit && !retry_64bit) {
> > > > +				retry_64bit = true;
> > > > +				continue;
> > > > +			}
> > > > +
> > > > +			dw_pcie_free_msi(pp);
> > > > +			return -ENOMEM;
> > > > +		}
> > > > +		break;
> > > >  	}
> > > >  
> > > >  	return 0;
> > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > > > index c6725c519a47..650a7f22f9d0 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > > @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
> > > >  
> > > > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> > > > +{
> > > > +	u8 offset;
> > > > +
> > > > +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> > > > +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> > > > +}
> > > > +
> > > >  static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
> > > >  					    u8 cap)
> > > >  {
> > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > > > index a871ae7eb59e..45fcdfc8c035 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > > @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
> > > >  
> > > >  u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
> > > >  u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> > > > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
> > > >  
> > > >  int dw_pcie_read(void __iomem *addr, int size, u32 *val);
> > > >  int dw_pcie_write(void __iomem *addr, int size, u32 val);
> > > > -- 
> > > > 2.37.2.672.g94769d06f0-goog
> > > > 
> > > > 
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-29 18:50         ` William McVicker
@ 2022-09-29 19:00           ` Serge Semin
  0 siblings, 0 replies; 26+ messages in thread
From: Serge Semin @ 2022-09-29 19:00 UTC (permalink / raw)
  To: William McVicker
  Cc: Lorenzo Pieralisi, Jingoo Han, Gustavo Pimentel, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, Robin Murphy, linux-pci,
	linux-kernel, kernel test robot

On Thu, Sep 29, 2022 at 06:50:01PM +0000, William McVicker wrote:
> On 09/29/2022, Lorenzo Pieralisi wrote:
> > On Wed, Sep 28, 2022 at 05:52:26PM +0000, William McVicker wrote:
> > > On 09/28/2022, Serge Semin wrote:
> > > > On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
> > > > > Since not all devices require a 32-bit MSI address, add support to the
> > > > > PCIe host driver to allow setting the DMA mask to 64-bits if the 32-bit
> > > > > allocation fails. This allows kernels to disable ZONE_DMA32 and bounce
> > > > > buffering (swiotlb) without risking not being able to get a 32-bit address
> > > > > during DMA allocation.
> > > > 
> > > > What is the problem with having ZONE_DMA32 enabled anyway?
> > > 
> > > On Android most devices don't have a 32-bit limitation. Several Android OEMs
> > > have reported significant enough performance improvements after disabling
> > > ZONE_DMA32. These include reduced memory usage, less time spent in
> > > kswapd, faster direct reclaim, and faster app launch times.
> > > 
> > > So this patch series was introduced to remove the dependency on ZONE_DMA32 for
> > > the DW PCIe drivers.
> > > 
> > > > 
> > > > > 
> > > > > Basically, in the slim chance that there are no 32-bit allocations
> > > > > available, the current PCIe host driver will fail to allocate the msi_msg
> > > > > page due to a DMA address overflow (seen in [1]). With this patch, the
> > > > > PCIe host can retry the allocation with a 64-bit DMA mask if the current
> > > > > PCIe device advertises 64-bit support via its MSI capabilities.
> > > > > 
> > > > > [1] https://lore.kernel.org/all/Yo0soniFborDl7+C@google.com/
> > > > 
> > > > Note the reported error isn't caused by the allocation procedure, but
> > > > by the mapping procedure.
> > > > 
> > > > > 
> > > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > > > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > > > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > > > > ---
> > > > >  .../pci/controller/dwc/pcie-designware-host.c | 38 ++++++++++++++-----
> > > > >  drivers/pci/controller/dwc/pcie-designware.c  |  8 ++++
> > > > >  drivers/pci/controller/dwc/pcie-designware.h  |  1 +
> > > > >  3 files changed, 38 insertions(+), 9 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > index 39f3b37d4033..8928a9a29d58 100644
> > > > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > > >  	u64 *msi_vaddr;
> > > > >  	int ret;
> > > > >  	u32 ctrl, num_ctrls;
> > > > > +	bool msi_64bit = false;
> > > > > +	bool retry_64bit = false;
> > > > > +	u16 msi_capabilities;
> > > > >  
> > > > >  	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
> > > > >  		pp->irq_mask[ctrl] = ~0;
> > > > > @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > > >  						    dw_chained_msi_isr, pp);
> > > > >  	}
> > > > >  
> > > > > -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > > > > -	if (ret)
> > > > > -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> > > > 
> > > > > +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> > > > > +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> > > > > +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
> > > > 
> > > > Note this capability flag has nothing to do with the DW PCIe iMSI-RX
> > > > engine, which is used here to detect and report MSI TLPs. By design
> > > > iMSI-RX always supports 64-bit addresses. If you mean having that flag
> > > > set by the DW PCIe platform drivers at the platform-specific probe
> > > > stage as an indication of the MSI address range, then ok.
> > > 
> > > Right. The DW PCIe device driver can set this flag during probe before calling
> > > dw_pcie_host_init() to ensure that we will always successfully allocate and map
> > > the MSI target address (as required to return successfully from
> > > dw_pcie_host_init()).
> > > 
> > > > 
> > > > >  
> > > > > -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > > > -					GFP_KERNEL);
> > > > > -	if (!msi_vaddr) {
> > > > > -		dev_err(dev, "Failed to alloc and map MSI data\n");
> > > > > -		dw_pcie_free_msi(pp);
> > > > > -		return -ENOMEM;
> > > > > +	while (true) {
> > > > > +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> > > > > +			retry_64bit ? "64" : "32");
> > > > 
> > > > > +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> > > > > +						DMA_BIT_MASK(64) :
> > > > > +						DMA_BIT_MASK(32));
> > > > 
> > > > I'd suggest just dropping this. No DMA is actually performed on getting the
> > > > MSI TLPs. So modifying the device DMA-mask due to something which
> > > > doesn't cause DMA, based on a flag which doesn't indicate the
> > > > device DMA-capability, is at least inappropriate.
> > > > 
> > > > > +		if (ret)
> > > > > +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> > > > > +				 retry_64bit ? "64" : "32");
> > > > > +
> > > > 
> > > > > +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > > > +						GFP_KERNEL);
> > > > 
> > > > As I noted earlier, DMA-coherent memory can be too expensive, so
> > > > allocating it with no intent of using it is a waste. Instead,
> > > > bring back the alloc_page() method here and pass the GFP_DMA32 flag
> > > > to that function if the MSI capability reports PCI_MSI_FLAGS_64BIT
> > > > unset.
> > > 
> > > As mentioned above, we don't want to force this driver to require the kernel to
> > > enable ZONE_DMA32. Since no I/O happens to this buffer, could we use
> > > dma_alloc_attrs() with the DMA_ATTR_SKIP_CPU_SYNC and
> > > DMA_ATTR_NO_KERNEL_MAPPING attributes? Would that address the "too expensive"
> > > issues you're referring to?
> > > 
> > > With regards to the DMA mask, I'm okay with moving that out of the host
> > > controller and into the DW PCIe device driver. That would address all of my
> > > issues and we could just drop the logic for checking the PCI_MSI_FLAGS_64BIT.
> > > However, I'm not the one you need to convince to do that.
> > 
> > We are at late -rc7 and it does not look like we are converging in this
> > discussion - I will wait till tomorrow, but then I will have to drop
> > 
> > https://lore.kernel.org/linux-pci/20220825235404.4132818-1-willmcvicker@google.com
> > 
> > from the PCI queue for v6.1 so that we can restart from a clean slate.
> > 
> > Lorenzo
> > 
> 
> Hi Lorenzo,
> 
> Based on Robin's response [1], I don't think we should change the
> implementation based on MIPS32 until we have (1) someone showing MIPS32 is
> using this driver and 

This patch adds support for a DW PCIe controller implemented on the
MIPS32 arch:
https://lore.kernel.org/linux-pci/20220822184701.25246-21-Sergey.Semin@baikalelectronics.ru/

> (2) that there's an actual perf regression when using
> dmam_alloc_coherent(). My patch series addresses a real issue by removing the
> dependency on ZONE_DMA32.

What about finding out the root cause of the performance
degradation instead of just dropping support for the whole standard zone?

> Even if we dropped my patches, that wouldn't solve
> Serge's DMA-mask issues, since the DW PCIe host driver would continue to
> unconditionally set the mask to 32 bits.

If you moved the DMA-mask setting to the platform driver, that would
solve my problems. I am pretty sure the generic code
shouldn't be altering the DMA-mask if it isn't aware of the actual
device capability. In the case of the DW PCIe controller, the AXI-bus
address width is a platform-specific parameter, and the generic DW PCIe
code doesn't know which width is valid.
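
For illustration, a rough sketch of what I mean (hypothetical baikal_*
naming; most of the probe setup is elided, and the 36-bit width is just
an example of the platform-specific knowledge):

	#include <linux/dma-mapping.h>
	#include <linux/platform_device.h>
	#include "pcie-designware.h"

	static int baikal_pcie_probe(struct platform_device *pdev)
	{
		struct device *dev = &pdev->dev;
		struct dw_pcie *pci;
		int ret;

		pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
		if (!pci)
			return -ENOMEM;

		pci->dev = dev;

		/* CSR mapping, clocks, resets, etc. elided */

		/*
		 * Only the platform driver knows the controller was
		 * synthesized with, say, a 36-bit AXI master address
		 * bus, so it - not the generic code - sets the mask.
		 */
		ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(36));
		if (ret)
			return ret;

		return dw_pcie_host_init(&pci->pp);
	}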

-Sergey

> 
> [1] https://lore.kernel.org/linux-pci/4dc31a63-00b1-f379-c5ac-7dc9425937f4@arm.com/
> 
> Thanks,
> Will
> 
> > > Regards,
> > > Will
> > > 
> > > > 
> > > > -Sergey
> > > > 
> > > > > +		if (!msi_vaddr) {
> > > > > +			dev_err(dev, "Failed to alloc and map MSI data\n");
> > > > > +			if (msi_64bit && !retry_64bit) {
> > > > > +				retry_64bit = true;
> > > > > +				continue;
> > > > > +			}
> > > > > +
> > > > > +			dw_pcie_free_msi(pp);
> > > > > +			return -ENOMEM;
> > > > > +		}
> > > > > +		break;
> > > > >  	}
> > > > >  
> > > > >  	return 0;
> > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > > > > index c6725c519a47..650a7f22f9d0 100644
> > > > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > > > @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
> > > > >  
> > > > > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> > > > > +{
> > > > > +	u8 offset;
> > > > > +
> > > > > +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> > > > > +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> > > > > +}
> > > > > +
> > > > >  static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
> > > > >  					    u8 cap)
> > > > >  {
> > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > > > > index a871ae7eb59e..45fcdfc8c035 100644
> > > > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > > > @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
> > > > >  
> > > > >  u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
> > > > >  u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> > > > > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
> > > > >  
> > > > >  int dw_pcie_read(void __iomem *addr, int size, u32 *val);
> > > > >  int dw_pcie_write(void __iomem *addr, int size, u32 val);
> > > > > -- 
> > > > > 2.37.2.672.g94769d06f0-goog
> > > > > 
> > > > > 
> > > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-09-29 18:25     ` Robin Murphy
@ 2022-09-29 19:32       ` Serge Semin
  2022-09-30 11:01         ` Robin Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Serge Semin @ 2022-09-29 19:32 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will McVicker, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, linux-pci,
	linux-kernel, Isaac J . Manjarres

On Thu, Sep 29, 2022 at 07:25:03PM +0100, Robin Murphy wrote:
> On 2022-09-28 12:41, Serge Semin wrote:
> > On Thu, Aug 25, 2022 at 06:50:24PM +0000, Will McVicker wrote:
> > > Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
> > > uses the coherent DMA mask to try to return an allocation within the DMA
> > > mask limits. With that, we now can drop the msi_page parameter in struct
> > > dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
> > > continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
> > > device will fail to probe when ZONE_DMA32 is disabled.
> > 
> > As Rob already said here
> > https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
> > and I mentioned in this thread
> > https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
> > DW PCIe MSI doesn't cause any DMA due to the way the iMSI-RX engine is
> > designed. So reserving any real system memory is a waste in
> > this case. Reserving DMA-coherent memory is even more inappropriate, since it
> > can be expensive on some platforms (see the note in Part Ia of
> > Documentation/core-api/dma-api.rst), for instance on MIPS32 with
> > non-coherent common DMA.
> 

> This has been discussed before - in general it is difficult to pick an
> arbitrary MSI address that is *guaranteed* not to overlap any valid DMA
> address that somebody may try to use later. However there is a very easy way
> to guarantee that the DMA API won't give anyone a particular DMA address,
> which is to get an address directly from the DMA API and keep it. Yes, that
> can technically be done with a streaming mapping *if* you already have some
> memory allocated in a suitable physical location, but coherent allocations
> are even more foolproof, simpler to clean up (particularly with devres), and
> unlikely to be an issue on relevant platforms (do any MIPS32 systems use
> this driver?)

My patchset adds DW PCIe RP controller support on the MIPS32 arch:
https://lore.kernel.org/linux-pci/20220822184701.25246-21-Sergey.Semin@baikalelectronics.ru/

> 
> > > Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
> > > Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> > > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > ---
> > >   .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
> > >   drivers/pci/controller/dwc/pcie-designware.h  |  1 -
> > >   2 files changed, 7 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > index 7746f94a715f..39f3b37d4033 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > @@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
> > >   	irq_domain_remove(pp->msi_domain);
> > >   	irq_domain_remove(pp->irq_domain);
> > > -
> > > -	if (pp->msi_data) {
> > > -		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > -		struct device *dev = pci->dev;
> > > -
> > > -		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
> > > -		if (pp->msi_page)
> > > -			__free_page(pp->msi_page);
> > > -	}
> > >   }
> > >   static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
> > > @@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > >   	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > >   	struct device *dev = pci->dev;
> > >   	struct platform_device *pdev = to_platform_device(dev);
> > > +	u64 *msi_vaddr;
> > >   	int ret;
> > >   	u32 ctrl, num_ctrls;
> > > @@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > >   						    dw_chained_msi_isr, pp);
> > >   	}
> > 
> > > -	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
> > > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > 
> > This has been redundant in the first place since none of the DW PCIe
> > low-level drivers update the mask, and it's 32 bits wide by default
> > anyway:
> > https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L167
> 

> No, in general drivers should always explicitly set their mask(s) and check
> the return value to make sure DMA is possible at all before trying any other
> DMA API calls. There's no guarantee that the default mask is usable (e.g.
> some systems don't have any 32-bit addressable RAM), or that it's even
> always 32 bits (due to crufty reasons of something of_dma_configure() tried
> to do a long time ago).

Suppose you are right and the DMA-mask should always be set before any
mapping. What do you suggest we do in this case? (1) The code above
overrides the real DMA-mask which could have been set by the platform
drivers, which in turn are normally aware of the device DMA
capabilities. But then, due to that override, mapping any buffer
above 4GB will end up using bounce buffers. (2) It's set
here for something which isn't actual DMA. So on one side there is
this patchset, which overrides the mask for something which isn't DMA,
and on the other there are the patchsets:
https://lore.kernel.org/linux-pci/20220822184701.25246-1-Sergey.Semin@baikalelectronics.ru/
and
https://lore.kernel.org/linux-pci/20220728142841.12305-1-Sergey.Semin@baikalelectronics.ru/
which add real DMA support to the DW PCIe driver and for which setting
the real DMA-mask is crucial. What do you suggest? Setting the mask
twice: before allocating the MSI buffer and again afterwards for the
sake of mapping the eDMA buffers? Or moving the DMA-mask setting from
the generic DW PCIe code to the platform drivers?

-Sergey

> 
> Thanks,
> Robin.
> 
> > >   	if (ret)
> > >   		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> > > -	pp->msi_page = alloc_page(GFP_DMA32);
> > > -	pp->msi_data = dma_map_page(dev, pp->msi_page, 0,
> > > -				    PAGE_SIZE, DMA_FROM_DEVICE);
> > > -	ret = dma_mapping_error(dev, pp->msi_data);
> > > -	if (ret) {
> > > -		dev_err(pci->dev, "Failed to map MSI data\n");
> > > -		__free_page(pp->msi_page);
> > > -		pp->msi_page = NULL;
> > > -		pp->msi_data = 0;
> > > +	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > +					GFP_KERNEL);
> > 
> > Changing the whole device DMA-mask due to something that doesn't
> > perform DMA seems inappropriate. I'd suggest preserving the ZONE_DMA32
> > usage here until something like what @Robin suggested in the last
> > paragraph of
> > https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com/
> > is implemented. Especially seeing there are still common drivers in
> > the kernel which rely on that zone.
> > 
> > -Sergey
> > 
> > > +	if (!msi_vaddr) {
> > > +		dev_err(dev, "Failed to alloc and map MSI data\n");
> > >   		dw_pcie_free_msi(pp);
> > > -
> > > -		return ret;
> > > +		return -ENOMEM;
> > >   	}
> > >   	return 0;
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > > index 09b887093a84..a871ae7eb59e 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > @@ -243,7 +243,6 @@ struct dw_pcie_rp {
> > >   	struct irq_domain	*irq_domain;
> > >   	struct irq_domain	*msi_domain;
> > >   	dma_addr_t		msi_data;
> > > -	struct page		*msi_page;
> > >   	struct irq_chip		*msi_irq_chip;
> > >   	u32			num_vectors;
> > >   	u32			irq_mask[MAX_MSI_CTRLS];
> > > -- 
> > > 2.37.2.672.g94769d06f0-goog
> > > 
> > > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-09-29 19:32       ` Serge Semin
@ 2022-09-30 11:01         ` Robin Murphy
  2022-09-30 12:57           ` Serge Semin
  0 siblings, 1 reply; 26+ messages in thread
From: Robin Murphy @ 2022-09-30 11:01 UTC (permalink / raw)
  To: Serge Semin
  Cc: Will McVicker, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, linux-pci,
	linux-kernel, Isaac J . Manjarres

On 2022-09-29 20:32, Serge Semin wrote:
> On Thu, Sep 29, 2022 at 07:25:03PM +0100, Robin Murphy wrote:
>> On 2022-09-28 12:41, Serge Semin wrote:
>>> On Thu, Aug 25, 2022 at 06:50:24PM +0000, Will McVicker wrote:
>>>> Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
>>>> uses the coherent DMA mask to try to return an allocation within the DMA
>>>> mask limits. With that, we now can drop the msi_page parameter in struct
>>>> dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
>>>> continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
>>>> device will fail to probe when ZONE_DMA32 is disabled.
>>>
>>> As Rob already said here
>>> https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
>>> and I mentioned in this thread
>>> https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
>>> DW PCIe MSI doesn't cause any DMA due to the way the iMSI-RX engine is
>>> designed. So reserving any real system memory is a waste in
>>> this case. Reserving DMA-coherent memory is even more inappropriate, since it
>>> can be expensive on some platforms (see the note in Part Ia of
>>> Documentation/core-api/dma-api.rst), for instance on MIPS32 with
>>> non-coherent common DMA.
>>
> 
>> This has been discussed before - in general it is difficult to pick an
>> arbitrary MSI address that is *guaranteed* not to overlap any valid DMA
>> address that somebody may try to use later. However there is a very easy way
>> to guarantee that the DMA API won't give anyone a particular DMA address,
>> which is to get an address directly from the DMA API and keep it. Yes, that
>> can technically be done with a streaming mapping *if* you already have some
>> memory allocated in a suitable physical location, but coherent allocations
>> are even more foolproof, simpler to clean up (particularly with devres), and
>> unlikely to be an issue on relevant platforms (do any MIPS32 systems use
>> this driver?)
> 
> My patchset adds DW PCIe RP controller support on the MIPS32 arch:
> https://lore.kernel.org/linux-pci/20220822184701.25246-21-Sergey.Semin@baikalelectronics.ru/
> 
>>
>>>> Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
>>>> Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
>>>> Signed-off-by: Will McVicker <willmcvicker@google.com>
>>>> Acked-by: Jingoo Han <jingoohan1@gmail.com>
>>>> Reviewed-by: Rob Herring <robh@kernel.org>
>>>> ---
>>>>    .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
>>>>    drivers/pci/controller/dwc/pcie-designware.h  |  1 -
>>>>    2 files changed, 7 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
>>>> index 7746f94a715f..39f3b37d4033 100644
>>>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>>>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>>>> @@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
>>>>    	irq_domain_remove(pp->msi_domain);
>>>>    	irq_domain_remove(pp->irq_domain);
>>>> -
>>>> -	if (pp->msi_data) {
>>>> -		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>>>> -		struct device *dev = pci->dev;
>>>> -
>>>> -		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
>>>> -		if (pp->msi_page)
>>>> -			__free_page(pp->msi_page);
>>>> -	}
>>>>    }
>>>>    static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
>>>> @@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>>>>    	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>>>>    	struct device *dev = pci->dev;
>>>>    	struct platform_device *pdev = to_platform_device(dev);
>>>> +	u64 *msi_vaddr;
>>>>    	int ret;
>>>>    	u32 ctrl, num_ctrls;
>>>> @@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>>>>    						    dw_chained_msi_isr, pp);
>>>>    	}
>>>
>>>> -	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
>>>> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>>>
>>> This has been redundant in the first place since none of the DW PCIe
>>> low-level drivers update the mask, and it's 32 bits wide by default
>>> anyway:
>>> https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L167
>>
> 
>> No, in general drivers should always explicitly set their mask(s) and check
>> the return value to make sure DMA is possible at all before trying any other
>> DMA API calls. There's no guarantee that the default mask is usable (e.g.
>> some systems don't have any 32-bit addressable RAM), or that it's even
>> always 32 bits (due to crufty reasons of something of_dma_configure() tried
>> to do a long time ago).
> 
> Suppose you are right and the DMA-mask should always be set before any
> mapping. What do you suggest we do in this case? (1) The code above
> overrides the real DMA-mask which could have been set by the platform
> drivers, which in turn are normally aware of the device DMA
> capabilities.

I am right. Appropriate DMA API usage as defined by the DMA API 
maintainers is not a matter of supposition. I literally just explained 
right there why drivers can't blindly assume the default mask is usable 
on modern systems (yes, it was different 20 years ago when system 
topologies were simpler).

However, having now gone and looked at the whole driver rather than 
unclear fragments of patch context, the code here *is* technically 
wrong. I've been mistakenly thinking all along that this was operating 
on the PCI device because I know that's what it *should* be doing, and 
seeing misleading things like "dev = pci->dev" falsely affirmed the 
assumption that it was correct because it's been around for ages. 
AFAIU the correct PCI device won't actually exist until we've got far 
enough through pci_host_probe(), so I'm not sure how to easily solve this :/

Of course *this* patch doesn't change any of that either, so it's no 
worse than the existing code and I don't see that dropping it helps you 
at all; the current driver is already trampling your 64-bit mask back to 
32 bits and reserving the doorbell address in the wrong DMA address 
space (modulo the other dma-ranges bug which also took far too long to 
figure out). At this point I'd rather keep it since getting rid of the 
__GFP_DMA32 abuse is objectively good. If losing one page of coherent 
memory is a measurably significant problem for T1 once the other issues 
are worked out and that series lands, then you're welcome to propose a 
change on top (but I would prefer that all the drivers using this trick 
are changed consistently).

Thanks,
Robin.

> But then, due to that override, mapping any buffer
> above 4GB will end up using bounce buffers. (2) It's set
> here for something which isn't actual DMA. So on one side there is
> this patchset, which overrides the mask for something which isn't DMA,
> and on the other there are the patchsets:
> https://lore.kernel.org/linux-pci/20220822184701.25246-1-Sergey.Semin@baikalelectronics.ru/
> and
> https://lore.kernel.org/linux-pci/20220728142841.12305-1-Sergey.Semin@baikalelectronics.ru/
> which add real DMA support to the DW PCIe driver and for which setting
> the real DMA-mask is crucial. What do you suggest? Setting the mask
> twice: before allocating the MSI buffer and again afterwards for the
> sake of mapping the eDMA buffers? Or moving the DMA-mask setting from
> the generic DW PCIe code to the platform drivers?
> 
> -Sergey
> 
>>
>> Thanks,
>> Robin.
>>
>>>>    	if (ret)
>>>>    		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
>>>> -	pp->msi_page = alloc_page(GFP_DMA32);
>>>> -	pp->msi_data = dma_map_page(dev, pp->msi_page, 0,
>>>> -				    PAGE_SIZE, DMA_FROM_DEVICE);
>>>> -	ret = dma_mapping_error(dev, pp->msi_data);
>>>> -	if (ret) {
>>>> -		dev_err(pci->dev, "Failed to map MSI data\n");
>>>> -		__free_page(pp->msi_page);
>>>> -		pp->msi_page = NULL;
>>>> -		pp->msi_data = 0;
>>>> +	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
>>>> +					GFP_KERNEL);
>>>
>>> Changing the whole device DMA-mask due to something that doesn't
>>> perform DMA seems inappropriate. I'd suggest preserving the ZONE_DMA32
>>> usage here until something like what @Robin suggested in the last
>>> paragraph of
>>> https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com/
>>> is implemented. Especially seeing there are still common drivers in
>>> the kernel which rely on that zone.
>>>
>>> -Sergey
>>>
>>>> +	if (!msi_vaddr) {
>>>> +		dev_err(dev, "Failed to alloc and map MSI data\n");
>>>>    		dw_pcie_free_msi(pp);
>>>> -
>>>> -		return ret;
>>>> +		return -ENOMEM;
>>>>    	}
>>>>    	return 0;
>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
>>>> index 09b887093a84..a871ae7eb59e 100644
>>>> --- a/drivers/pci/controller/dwc/pcie-designware.h
>>>> +++ b/drivers/pci/controller/dwc/pcie-designware.h
>>>> @@ -243,7 +243,6 @@ struct dw_pcie_rp {
>>>>    	struct irq_domain	*irq_domain;
>>>>    	struct irq_domain	*msi_domain;
>>>>    	dma_addr_t		msi_data;
>>>> -	struct page		*msi_page;
>>>>    	struct irq_chip		*msi_irq_chip;
>>>>    	u32			num_vectors;
>>>>    	u32			irq_mask[MAX_MSI_CTRLS];
>>>> -- 
>>>> 2.37.2.672.g94769d06f0-goog
>>>>
>>>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-09-30 11:01         ` Robin Murphy
@ 2022-09-30 12:57           ` Serge Semin
  2022-09-30 15:39             ` Robin Murphy
  0 siblings, 1 reply; 26+ messages in thread
From: Serge Semin @ 2022-09-30 12:57 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will McVicker, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, linux-pci,
	linux-kernel, Isaac J . Manjarres

On Fri, Sep 30, 2022 at 12:01:58PM +0100, Robin Murphy wrote:
> On 2022-09-29 20:32, Serge Semin wrote:
> > On Thu, Sep 29, 2022 at 07:25:03PM +0100, Robin Murphy wrote:
> > > On 2022-09-28 12:41, Serge Semin wrote:
> > > > On Thu, Aug 25, 2022 at 06:50:24PM +0000, Will McVicker wrote:
> > > > > Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
> > > > > uses the coherent DMA mask to try to return an allocation within the DMA
> > > > > mask limits. With that, we now can drop the msi_page parameter in struct
> > > > > dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
> > > > > continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
> > > > > device will fail to probe when ZONE_DMA32 is disabled.
> > > > 
> > > > As Rob already said here
> > > > https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
> > > > and I mentioned in this thread
> > > > https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
> > > > DW PCIe MSI doesn't cause any DMA due to the way the iMSI-RX engine is
> > > > designed. So reserving any real system memory is a waste in
> > > > this case. Reserving DMA-coherent memory is even more inappropriate, since it
> > > > can be expensive on some platforms (see the note in Part Ia of
> > > > Documentation/core-api/dma-api.rst), for instance on MIPS32 with
> > > > non-coherent common DMA.
> > > 
> > 
> > > This has been discussed before - in general it is difficult to pick an
> > > arbitrary MSI address that is *guaranteed* not to overlap any valid DMA
> > > address that somebody may try to use later. However there is a very easy way
> > > to guarantee that the DMA API won't give anyone a particular DMA address,
> > > which is to get an address directly from the DMA API and keep it. Yes, that
> > > can technically be done with a streaming mapping *if* you already have some
> > > memory allocated in a suitable physical location, but coherent allocations
> > > are even more foolproof, simpler to clean up (particularly with devres), and
> > > unlikely to be an issue on relevant platforms (do any MIPS32 systems use
> > > this driver?)
> > 
> > My patchset adds DW PCIe RP controller support on the MIPS32 arch:
> > https://lore.kernel.org/linux-pci/20220822184701.25246-21-Sergey.Semin@baikalelectronics.ru/
> > 
> > > 
> > > > > Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
> > > > > Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> > > > > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > > > > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > > > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > > > ---
> > > > >    .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
> > > > >    drivers/pci/controller/dwc/pcie-designware.h  |  1 -
> > > > >    2 files changed, 7 insertions(+), 22 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > index 7746f94a715f..39f3b37d4033 100644
> > > > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > @@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
> > > > >    	irq_domain_remove(pp->msi_domain);
> > > > >    	irq_domain_remove(pp->irq_domain);
> > > > > -
> > > > > -	if (pp->msi_data) {
> > > > > -		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > > -		struct device *dev = pci->dev;
> > > > > -
> > > > > -		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
> > > > > -		if (pp->msi_page)
> > > > > -			__free_page(pp->msi_page);
> > > > > -	}
> > > > >    }
> > > > >    static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
> > > > > @@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > > >    	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > >    	struct device *dev = pci->dev;
> > > > >    	struct platform_device *pdev = to_platform_device(dev);
> > > > > +	u64 *msi_vaddr;
> > > > >    	int ret;
> > > > >    	u32 ctrl, num_ctrls;
> > > > > @@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > > >    						    dw_chained_msi_isr, pp);
> > > > >    	}
> > > > 
> > > > > -	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
> > > > > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > > > 
> > > > This has been redundant in the first place since none of the DW PCIe
> > > > low-level drivers update the mask, and it's of 32-bits wide by default
> > > > anyway:
> > > > https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L167
> > > 
> > 
> > > No, in general drivers should always explicitly set their mask(s) and check
> > > the return value to make sure DMA is possible at all before trying any other
> > > DMA API calls. There's no guarantee that the default mask is usable (e.g.
> > > some systems don't have any 32-bit addressable RAM), or that it's even
> > > always 32 bits (due to crufty reasons of something of_dma_configure() tried
> > > to do a long time ago).
> > 
> > Suppose you are right and the DMA-mask should always be set before any
> > mapping. What do you suggest we do in this case? (1) The code above
> > overrides the real DMA-mask which could have been set by the platform
> > drivers, which in turn are normally aware of the device DMA
> > capabilities.
> 
> I am right. Appropriate DMA API usage as defined by the DMA API maintainers
> is not a matter of supposition. I literally just explained right there why
> drivers can't blindly assume the default mask is usable on modern systems
> (yes, it was different 20 years ago when system topologies were simpler).
> 

> However, having now gone and looked at the whole driver rather than unclear
> fragments of patch context, the code here *is* technically wrong. I've been
> mistakenly thinking all along that this was operating on the PCI device
> because I know that's what it *should* be doing, and seeing misleading
> things like "dev = pci->dev" falsely affirmed the assumption that it was
> correct because it's been around for ages.
> AFAIU the correct PCI device
> won't actually exist until we've got far enough through pci_host_probe(), so
> I'm not sure how to easily solve this :/

Right. The code affected by the subject patch has nothing to do with
the real PCI devices. The DMA-mask is set on the DW PCIe host controller
platform device in order to force the page to be allocated within the
32-bit address space. That's it.

Here is a log of the related changes:

0. Initially a GFP_KERNEL-based page was allocated for the MSI buffer.
That could result in a DMA/PCIe address above 4GB, which wouldn't work
for PCIe peripherals with only a 32-bit MSI capability. Though nobody
bothered about that back then.

1. 07940c369a6b ("PCI: dwc: Fix MSI page leakage in suspend/resume")
After this commit nothing really changed, except that instead of
being allocated the MSI-message pseudo-buffer ended up embedded into
the private data. It could thus live at any base address with no
actual limitation (because the private data is kmalloc'ed).

2. 660c486590aa ("PCI: dwc: Set 32-bit DMA mask for MSI target address allocation")
Someone found out that some devices failed to deliver MSIs to an
address above 4GB of the PCIe address space and fixed the problem by
force-setting the DMA-mask to 32 bits before mapping the MSI
buffer. That indeed fixed the problem, but the actual buffer was still
allocated from any address space; the mapping procedure
just bounced the buffer into the lowest 4GB. So basically the solution
was very clumsy, since it left a bounce buffer reserved forever.

3. 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
@William basically got things back to (0), but instead of GFP_KERNEL
the page was allocated with GFP_DMA32. At this stage he should have
dropped the DMA-mask setting too, since the buffer was already
guaranteed to be within the 4GB space, but he didn't.

So now we have what we have. The DMA-mask is pointlessly changed for
something that doesn't really involve any DMA. That's why I insisted on
dropping it, at the very least. Another reason I thought that was
appropriate was the default DMA-mask being set to 32 bits anyway.
But you said we shouldn't rely on the default DMA-mask setting, so
ok, that argument doesn't count. Still, it doesn't change the
uselessness of the DMA-mask change in the current driver.

> AFAIU the correct PCI device
> won't actually exist until we've got far enough through pci_host_probe(), so
> I'm not sure how to easily solve this :/

Walk over all PCIe devices detected on the PCIe bus and check their
MSI-capability flags. If any of them lacks the 64-bit MSI flag, make
sure the MSI base address is allocated within the 4GB memory region.
It isn't that easy to implement though...
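
Roughly something like this (a completely untested sketch; note that
pp->bridge->bus is only populated once pci_host_probe() has done the
enumeration, which is exactly the ordering problem):

static int dw_pcie_chk_msi_cap(struct pci_dev *pdev, void *data)
{
	bool *need_32bit = data;
	u16 flags;

	if (!pdev->msi_cap)
		return 0;

	pci_read_config_word(pdev, pdev->msi_cap + PCI_MSI_FLAGS, &flags);
	if (!(flags & PCI_MSI_FLAGS_64BIT)) {
		/* One 32-bit-only device is enough, stop the walk */
		*need_32bit = true;
		return 1;
	}

	return 0;
}

	/* Somewhere after the bus enumeration is done */
	bool need_32bit = false;

	pci_walk_bus(pp->bridge->bus, dw_pcie_chk_msi_cap, &need_32bit);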

> 
> Of course *this* patch doesn't change any of that either, so it's no worse
> than the existing code and I don't see that dropping it helps you at all;
> the current driver is already trampling your 64-bit mask back to 32 bits 

Yes, and with this patchset @William intends to fix the
DMA-mask-override behaviour by using the dma_alloc_coherent() method.
So any platform-specific DMA-mask setting will be overwritten, and the
DMA-mask setup won't be movable/droppable anymore due to the
dma_alloc_coherent() usage.

I have added DW eDMA-engine support to the DW PCIe driver (you've
already seen my patches), and the engine initialization is supposed to
be performed after the basic initializations like CSR mapping, data
allocations, MSI, etc. Since the DMA is performed by the controller
itself, a correct DMA-mask must be set on the DW PCIe platform device;
otherwise any subsequent mapping will be bounce-buffered into the
lowest 4GB even though the corresponding platform may support more
than 4GB of memory (even our MIPS32 can), with DW eDMA easily reaching
that memory. What would help me in this case is if the MSI-buffer
allocation procedure didn't change the device DMA-mask. As an
alternative to completely dropping the DMA-mask setting, the setup
could be moved to the low-level platform device drivers, as sketched
below. That would be even more justified since the platform-specific
device capabilities (like the DW PCIe AXI-interface address-bus width)
are unknown to the generic code.
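
For example, a minimal sketch of that alternative (hypothetical
platform driver, untested; DMA_BIT_MASK(40) just stands for whatever
AXI address-bus width the platform actually implements):

static int my_dw_pcie_probe(struct platform_device *pdev)
{
	struct dw_pcie *pci;
	int ret;

	pci = devm_kzalloc(&pdev->dev, sizeof(*pci), GFP_KERNEL);
	if (!pci)
		return -ENOMEM;

	pci->dev = &pdev->dev;

	/* ... CSRs mapping, clocks, resets, etc ... */

	/*
	 * The platform driver knows the real device DMA capability,
	 * so the actual DMA-mask gets set here, not in the generic
	 * code.
	 */
	ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
	if (ret)
		return ret;

	return dw_pcie_host_init(&pci->pp);
}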

As another alternative, I could override the DMA-mask within the DW
eDMA probe procedure, but that would make things more complicated than
relying on the low-level platform drivers to do it.
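
E.g. (sketch, untested):

	/*
	 * Hypothetical: override the mask from within the eDMA init
	 * path once the engine has actually been detected.
	 */
	ret = dma_set_mask_and_coherent(pci->dev, DMA_BIT_MASK(64));
	if (ret)
		dev_warn(pci->dev, "No suitable DMA available\n");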

> and
> reserving the doorbell address in the wrong DMA address space (modulo the
> other dma-ranges bug which also took far too long to figure out).

Actually the current driver (without William's patch) reserves the
doorbell address in the correct DMA address space (if we don't take
the dma-ranges settings into account). It works as expected when the
PCIe<->CPU space has a one-to-one mapping (which is true in most
cases). The only thing that is wrong is the pointless DMA-mask update.
I could have easily dropped it in my patchset, but after @William's
patchset is applied I won't be able to, due to the
dma_alloc_coherent() usage here.

> At this
> point I'd rather keep it since getting rid of the __GFP_DMA32 abuse is
> objectively good. If losing one page of coherent memory is a measurably
> significant problem for T1 once the other issues are worked out and that
> series lands, then you're welcome to propose a change on top (but I would
> prefer that all the drivers using this trick are changed consistently).

Regarding the DMA-coherent allocation: I am not happy with losing a
whole page of dma-coherent memory, but we can live with that. What
gives additional difficulty to our eDMA patches is the DMA-mask
override. If you still insist on preserving the @William patchset as
is, where do you suggest I update the DMA-mask if the low-level driver
won't be suitable for that anymore?

-Sergey

> 
> Thanks,
> Robin.
> 
> > But in this case due to override afterwards any buffers
> > above 4GB mapping will cause using the bounce buffers. (2) It's set
> > here for something which isn't actual DMA. So to speak on one side is
> > this patchset which overrides the mask for something which isn't DMA,
> > and there are another patchsets:
> > https://lore.kernel.org/linux-pci/20220822184701.25246-1-Sergey.Semin@baikalelectronics.ru/
> > and
> > https://lore.kernel.org/linux-pci/20220728142841.12305-1-Sergey.Semin@baikalelectronics.ru/
> > which add the real DMA support to DW PCIe driver and for which setting
> > the real DMA-mask is crucial. What do you suggest? Setting the mask
> > twice: before allocating MSI-buffer and afterwards for the sake of
> > eDMA buffers mapping? Moving DMA-mask setting from the generic DW PCIe
> > code to the platform drivers?
> > 
> > -Sergey
> > 
> > > 
> > > Thanks,
> > > Robin.
> > > 
> > > > >    	if (ret)
> > > > >    		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> > > > > -	pp->msi_page = alloc_page(GFP_DMA32);
> > > > > -	pp->msi_data = dma_map_page(dev, pp->msi_page, 0,
> > > > > -				    PAGE_SIZE, DMA_FROM_DEVICE);
> > > > > -	ret = dma_mapping_error(dev, pp->msi_data);
> > > > > -	if (ret) {
> > > > > -		dev_err(pci->dev, "Failed to map MSI data\n");
> > > > > -		__free_page(pp->msi_page);
> > > > > -		pp->msi_page = NULL;
> > > > > -		pp->msi_data = 0;
> > > > > +	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > > > +					GFP_KERNEL);
> > > > 
> > > > Changing the whole device DMA-mask due to something that doesn't
> > > > perform seems inappropriate. I'd suggest to preserve the ZONE_DMA32
> > > > here until there is something like suggested by @Robin
> > > > https://lore.kernel.org/linux-pci/1e63a581-14ae-b4b5-a5bf-ca8f09c33af6@arm.com/
> > > > in the last paragraph is implemented. Especially seeing there still
> > > > common drivers in kernel which still rely on that zone.
> > > > 
> > > > -Sergey
> > > > 
> > > > > +	if (!msi_vaddr) {
> > > > > +		dev_err(dev, "Failed to alloc and map MSI data\n");
> > > > >    		dw_pcie_free_msi(pp);
> > > > > -
> > > > > -		return ret;
> > > > > +		return -ENOMEM;
> > > > >    	}
> > > > >    	return 0;
> > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > > > > index 09b887093a84..a871ae7eb59e 100644
> > > > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > > > @@ -243,7 +243,6 @@ struct dw_pcie_rp {
> > > > >    	struct irq_domain	*irq_domain;
> > > > >    	struct irq_domain	*msi_domain;
> > > > >    	dma_addr_t		msi_data;
> > > > > -	struct page		*msi_page;
> > > > >    	struct irq_chip		*msi_irq_chip;
> > > > >    	u32			num_vectors;
> > > > >    	u32			irq_mask[MAX_MSI_CTRLS];
> > > > > -- 
> > > > > 2.37.2.672.g94769d06f0-goog
> > > > > 
> > > > > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-28 12:05   ` Serge Semin
  2022-09-28 17:52     ` William McVicker
@ 2022-09-30 13:46     ` Lorenzo Pieralisi
  2022-09-30 14:14       ` Serge Semin
  1 sibling, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2022-09-30 13:46 UTC (permalink / raw)
  To: Serge Semin, Will McVicker
  Cc: Jingoo Han, Gustavo Pimentel, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, Robin Murphy, linux-pci,
	linux-kernel, kernel test robot

On Wed, Sep 28, 2022 at 03:05:10PM +0300, Serge Semin wrote:
> On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:

[...]

> > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > index 39f3b37d4033..8928a9a29d58 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> >  	u64 *msi_vaddr;
> >  	int ret;
> >  	u32 ctrl, num_ctrls;
> > +	bool msi_64bit = false;
> > +	bool retry_64bit = false;
> > +	u16 msi_capabilities;
> >  
> >  	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
> >  		pp->irq_mask[ctrl] = ~0;
> > @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> >  						    dw_chained_msi_isr, pp);
> >  	}
> >  
> > -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > -	if (ret)
> > -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> 
> > +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> > +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> > +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
> 
> Note this capability flag has nothing to do with the DW PCIe iMSI-RX
> engine, which is used here to detect and report MSI TLPs. By design
> iMSI-RX always support 64-bit addresses. If you imply having that flag
> set by the DW PCIe platform drivers on the platform-specific probe
> stage as an indication of MSI address range, then ok.

The MSI cap shows that - AFAICS - the RP can be programmed with
a 64-bit MSI(DMA) address. It is fair to say that this is not
enough to guarantee that the DMA mask for the host bridge can be
inferred to be 64-bit though.

I am inclined to drop this patch (only) from the PCI queue because
it is unclear to me whether it really does the right thing.

Lorenzo

> > -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > -					GFP_KERNEL);
> > -	if (!msi_vaddr) {
> > -		dev_err(dev, "Failed to alloc and map MSI data\n");
> > -		dw_pcie_free_msi(pp);
> > -		return -ENOMEM;
> > +	while (true) {
> > +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> > +			retry_64bit ? "64" : "32");
> 
> > +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> > +						DMA_BIT_MASK(64) :
> > +						DMA_BIT_MASK(32));
> 
> I'd suggest to just drop this. No DMA actually performed on getting the
> MSI TLPs. So modifying the device DMA-mask due to something which
> doesn't cause DMA and based on the flag which doesn't indicates the
> device DMA-capability is at least inappropriate.
> 
> > +		if (ret)
> > +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> > +				 retry_64bit ? "64" : "32");
> > +
> 
> > +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > +						GFP_KERNEL);
> 
> As I noted earlier the DMA-coherent memory can be too expensive. So
> it's a waste of one allocating with no intent of usage. Instead of this
> just get back the alloc_page() method here and pass the flag GFP_DMA32
> to that function if MSI-capability reported the PCI_MSI_FLAGS_64BIT
> unset.
> 
> -Sergey
> 
> > +		if (!msi_vaddr) {
> > +			dev_err(dev, "Failed to alloc and map MSI data\n");
> > +			if (msi_64bit && !retry_64bit) {
> > +				retry_64bit = true;
> > +				continue;
> > +			}
> > +
> > +			dw_pcie_free_msi(pp);
> > +			return -ENOMEM;
> > +		}
> > +		break;
> >  	}
> >  
> >  	return 0;
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > index c6725c519a47..650a7f22f9d0 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
> >  }
> >  EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
> >  
> > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> > +{
> > +	u8 offset;
> > +
> > +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> > +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> > +}
> > +
> >  static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
> >  					    u8 cap)
> >  {
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > index a871ae7eb59e..45fcdfc8c035 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
> >  
> >  u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
> >  u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
> >  
> >  int dw_pcie_read(void __iomem *addr, int size, u32 *val);
> >  int dw_pcie_write(void __iomem *addr, int size, u32 val);
> > -- 
> > 2.37.2.672.g94769d06f0-goog
> > 
> > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address
  2022-09-30 13:46     ` Lorenzo Pieralisi
@ 2022-09-30 14:14       ` Serge Semin
  0 siblings, 0 replies; 26+ messages in thread
From: Serge Semin @ 2022-09-30 14:14 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Will McVicker, Jingoo Han, Gustavo Pimentel, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, Robin Murphy, linux-pci,
	linux-kernel, kernel test robot

On Fri, Sep 30, 2022 at 03:46:56PM +0200, Lorenzo Pieralisi wrote:
> On Wed, Sep 28, 2022 at 03:05:10PM +0300, Serge Semin wrote:
> > On Thu, Aug 25, 2022 at 06:50:25PM +0000, Will McVicker wrote:
> 
> [...]
> 
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > index 39f3b37d4033..8928a9a29d58 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > @@ -330,6 +330,9 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > >  	u64 *msi_vaddr;
> > >  	int ret;
> > >  	u32 ctrl, num_ctrls;
> > > +	bool msi_64bit = false;
> > > +	bool retry_64bit = false;
> > > +	u16 msi_capabilities;
> > >  
> > >  	for (ctrl = 0; ctrl < MAX_MSI_CTRLS; ctrl++)
> > >  		pp->irq_mask[ctrl] = ~0;
> > > @@ -367,16 +370,33 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > >  						    dw_chained_msi_isr, pp);
> > >  	}
> > >  
> > > -	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > > -	if (ret)
> > > -		dev_warn(dev, "Failed to set DMA mask to 32-bit. Devices with only 32-bit MSI support may not work properly\n");
> > 
> > > +	msi_capabilities = dw_pcie_msi_capabilities(pci);
> > > +	if (msi_capabilities & PCI_MSI_FLAGS_ENABLE)
> > > +		msi_64bit = msi_capabilities & PCI_MSI_FLAGS_64BIT;
> > 
> > Note this capability flag has nothing to do with the DW PCIe iMSI-RX
> > engine, which is used here to detect and report MSI TLPs. By design
> > iMSI-RX always support 64-bit addresses. If you imply having that flag
> > set by the DW PCIe platform drivers on the platform-specific probe
> > stage as an indication of MSI address range, then ok.
> 

> The MSI cap shows that - AFAICS - the RP can be programmed with
> a 64-bit MSI(DMA) address. It is fair to say that this is not
> enough to guarantee that the DMA mask for the host bridge can be
> inferred to be 64-bit though.

iMSI-RX always supports 64-bit bus addresses by design. The
MSI-control CSRs unconditionally permit a 64-bit address setup, so you
can't even synthesize the DW PCIe RP IP-core with only 32-bit MSI
support. AFAICS what @William is introducing here is the use of
MSI_FLAGS_64BIT as a flag which could be manually set by the platform
drivers to indicate that the platform permits 64-bit MSI TLPs. I guess
the platforms are supposed to know better which PCIe devices are going
to live on the bus and set that flag accordingly. That isn't true, of
course, without pre-scanning the bus.

> 
> I am inclined to drop this patch (only) from the PCI queue because
> it is unclear to me whether it really does the right thing.

Let's wait for Robin's response to my last comment regarding the fate
of patch 1 too.

-Sergey

> 
> Lorenzo
> 
> > > -	msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > -					GFP_KERNEL);
> > > -	if (!msi_vaddr) {
> > > -		dev_err(dev, "Failed to alloc and map MSI data\n");
> > > -		dw_pcie_free_msi(pp);
> > > -		return -ENOMEM;
> > > +	while (true) {
> > > +		dev_dbg(dev, "Setting MSI DMA mask to %s-bit.\n",
> > > +			retry_64bit ? "64" : "32");
> > 
> > > +		ret = dma_set_mask_and_coherent(dev, retry_64bit ?
> > > +						DMA_BIT_MASK(64) :
> > > +						DMA_BIT_MASK(32));
> > 
> > I'd suggest to just drop this. No DMA actually performed on getting the
> > MSI TLPs. So modifying the device DMA-mask due to something which
> > doesn't cause DMA and based on the flag which doesn't indicates the
> > device DMA-capability is at least inappropriate.
> > 
> > > +		if (ret)
> > > +			dev_warn(dev, "Failed to set DMA mask to %s-bit.\n",
> > > +				 retry_64bit ? "64" : "32");
> > > +
> > 
> > > +		msi_vaddr = dmam_alloc_coherent(dev, sizeof(u64), &pp->msi_data,
> > > +						GFP_KERNEL);
> > 
> > As I noted earlier the DMA-coherent memory can be too expensive. So
> > it's a waste of one allocating with no intent of usage. Instead of this
> > just get back the alloc_page() method here and pass the flag GFP_DMA32
> > to that function if MSI-capability reported the PCI_MSI_FLAGS_64BIT
> > unset.
> > 
> > -Sergey
> > 
> > > +		if (!msi_vaddr) {
> > > +			dev_err(dev, "Failed to alloc and map MSI data\n");
> > > +			if (msi_64bit && !retry_64bit) {
> > > +				retry_64bit = true;
> > > +				continue;
> > > +			}
> > > +
> > > +			dw_pcie_free_msi(pp);
> > > +			return -ENOMEM;
> > > +		}
> > > +		break;
> > >  	}
> > >  
> > >  	return 0;
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware.c b/drivers/pci/controller/dwc/pcie-designware.c
> > > index c6725c519a47..650a7f22f9d0 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware.c
> > > @@ -82,6 +82,14 @@ u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap)
> > >  }
> > >  EXPORT_SYMBOL_GPL(dw_pcie_find_capability);
> > >  
> > > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci)
> > > +{
> > > +	u8 offset;
> > > +
> > > +	offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> > > +	return dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> > > +}
> > > +
> > >  static u16 dw_pcie_find_next_ext_capability(struct dw_pcie *pci, u16 start,
> > >  					    u8 cap)
> > >  {
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> > > index a871ae7eb59e..45fcdfc8c035 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > @@ -332,6 +332,7 @@ void dw_pcie_version_detect(struct dw_pcie *pci);
> > >  
> > >  u8 dw_pcie_find_capability(struct dw_pcie *pci, u8 cap);
> > >  u16 dw_pcie_find_ext_capability(struct dw_pcie *pci, u8 cap);
> > > +u16 dw_pcie_msi_capabilities(struct dw_pcie *pci);
> > >  
> > >  int dw_pcie_read(void __iomem *addr, int size, u32 *val);
> > >  int dw_pcie_write(void __iomem *addr, int size, u32 val);
> > > -- 
> > > 2.37.2.672.g94769d06f0-goog
> > > 
> > > 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-09-30 12:57           ` Serge Semin
@ 2022-09-30 15:39             ` Robin Murphy
  2022-09-30 17:02               ` William McVicker
  2022-10-07 22:45               ` Serge Semin
  0 siblings, 2 replies; 26+ messages in thread
From: Robin Murphy @ 2022-09-30 15:39 UTC (permalink / raw)
  To: Serge Semin
  Cc: Will McVicker, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, linux-pci,
	linux-kernel, Isaac J . Manjarres

On 2022-09-30 13:57, Serge Semin wrote:
> On Fri, Sep 30, 2022 at 12:01:58PM +0100, Robin Murphy wrote:
>> On 2022-09-29 20:32, Serge Semin wrote:
>>> On Thu, Sep 29, 2022 at 07:25:03PM +0100, Robin Murphy wrote:
>>>> On 2022-09-28 12:41, Serge Semin wrote:
>>>>> On Thu, Aug 25, 2022 at 06:50:24PM +0000, Will McVicker wrote:
>>>>>> Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
>>>>>> uses the coherent DMA mask to try to return an allocation within the DMA
>>>>>> mask limits. With that, we now can drop the msi_page parameter in struct
>>>>>> dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
>>>>>> continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
>>>>>> device will fail to probe when ZONE_DMA32 is disabled.
>>>>>
>>>>> As Rob already said here
>>>>> https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
>>>>> and I mentioned in this thread
>>>>> https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
>>>>> DW PCIe MSI doesn't cause any DMA due to the way the iMSI-RX engine is
>>>>> designed. So reserving any real system memory is a waste of one in
>>>>> this case. Reserving DMA-coherent even more inappropriate since it
>>>>> can be expensive on some platforms (see note in Part Ia of
>>>>> Documentation/core-api/dma-api.rst). For instance on MIPS32 with
>>>>> non-corehent common DMA.
>>>>
>>>
>>>> This has been discussed before - in general it is difficult to pick an
>>>> arbitrary MSI address that is *guaranteed* not to overlap any valid DMA
>>>> address that somebody may try to use later. However there is a very easy way
>>>> to guarantee that the DMA API won't give anyone a particular DMA address,
>>>> which is to get an address directly from the DMA API and keep it. Yes, that
>>>> can technically be done with a streaming mapping *if* you already have some
>>>> memory allocated in a suitable physical location, but coherent allocations
>>>> are even more foolproof, simpler to clean up (particularly with devres), and
>>>> unlikely to be an issue on relevant platforms (do any MIPS32 systems use
>>>> this driver?)
>>>
>>> My patchset adds the DW PCIe RP controller support on MIPS32 arch:
>>> https://lore.kernel.org/linux-pci/20220822184701.25246-21-Sergey.Semin@baikalelectronics.ru/
>>>
>>>>
>>>>>> Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
>>>>>> Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
>>>>>> Signed-off-by: Will McVicker <willmcvicker@google.com>
>>>>>> Acked-by: Jingoo Han <jingoohan1@gmail.com>
>>>>>> Reviewed-by: Rob Herring <robh@kernel.org>
>>>>>> ---
>>>>>>     .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
>>>>>>     drivers/pci/controller/dwc/pcie-designware.h  |  1 -
>>>>>>     2 files changed, 7 insertions(+), 22 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
>>>>>> index 7746f94a715f..39f3b37d4033 100644
>>>>>> --- a/drivers/pci/controller/dwc/pcie-designware-host.c
>>>>>> +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
>>>>>> @@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
>>>>>>     	irq_domain_remove(pp->msi_domain);
>>>>>>     	irq_domain_remove(pp->irq_domain);
>>>>>> -
>>>>>> -	if (pp->msi_data) {
>>>>>> -		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>>>>>> -		struct device *dev = pci->dev;
>>>>>> -
>>>>>> -		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
>>>>>> -		if (pp->msi_page)
>>>>>> -			__free_page(pp->msi_page);
>>>>>> -	}
>>>>>>     }
>>>>>>     static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
>>>>>> @@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>>>>>>     	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
>>>>>>     	struct device *dev = pci->dev;
>>>>>>     	struct platform_device *pdev = to_platform_device(dev);
>>>>>> +	u64 *msi_vaddr;
>>>>>>     	int ret;
>>>>>>     	u32 ctrl, num_ctrls;
>>>>>> @@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
>>>>>>     						    dw_chained_msi_isr, pp);
>>>>>>     	}
>>>>>
>>>>>> -	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
>>>>>> +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>>>>>
>>>>> This has been redundant in the first place since none of the DW PCIe
>>>>> low-level drivers update the mask, and it's of 32-bits wide by default
>>>>> anyway:
>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L167
>>>>
>>>
>>>> No, in general drivers should always explicitly set their mask(s) and check
>>>> the return value to make sure DMA is possible at all before trying any other
>>>> DMA API calls. There's no guarantee that the default mask is usable (e.g.
>>>> some systems don't have any 32-bit addressable RAM), or that it's even
>>>> always 32 bits (due to crufty reasons of something of_dma_configure() tried
>>>> to do a long time ago).
>>>
>>> Suppose you are right and DMA-mask should be always set before any
>>> mapping. What do you suggest to do in this case? (1) The code above
>>> overrides the real DMA-mask which could be set by the platform
>>> drivers, which in its turn are normally aware of the device DMA
>>> capabilities.
>>
>> I am right. Appropriate DMA API usage as defined by the DMA API maintainers
>> is not a matter of supposition. I literally just explained right there why
>> drivers can't blindly assume the default mask is usable on modern systems
>> (yes, it was different 20 years ago when system topologies were simpler).
>>
> 
>> However, having now gone and looked at the whole driver rather than unclear
>> fragments of patch context, the code here *is* technically wrong. I've been
>> mistakenly thinking all along that this was operating on the PCI device
>> because I know that's what it *should* be doing, and seeing misleading
>> things like "dev = pci->dev" falsely affirmed that assumption that it would
>> be correct because it's been around for ages.
>> AFAIU the correct PCI device
>> won't actually exist until we've got far enough through pci_host_probe(), so
>> I'm not sure how to easily solve this :/
> 
> Right. The code affected by the subject patch has nothing to do with
> the real PCI devices. The DMA-mask is set to the DW PCIe Host controller
> platform device in order to force a page being allocated within 32-bit
> address space. That's it.
> 
> Here is a log of the related changes:
> 
> 0. Initially a GFP_KERNEL-based page was allocated for the MSI buffer.
> It may cause having the DMA/PCIe-address above 4GB, which wouldn't work
> for the PCIe peripherals with only 32-bit MSI capability. Though
> nobody bothered back then.
> 
> 1. 07940c369a6b ("PCI: dwc: Fix MSI page leakage in suspend/resume")
> After this commit nothing really has changed, but instead of
> allocating the MSI-message pseudo-buffer turned to be embedded into
> the private data. It could be allocated at any base address with no
> actual limitation (because private data is kmalloc'ed).
> 
> 2. 660c486590aa ("PCI: dwc: Set 32-bit DMA mask for MSI target address allocation")
> Someone found out that some devices failed to deliver MSI to the
> address above 4GB of PCIe address space and fixed the problem by
> force-setting the DMA-mask to being 32-bit before mapping the MSI
> buffer. It indeed fixed the problem, but the actual buffer still left
> being allocated from any address space. Instead, the mapping procedure
> just bounced the buffer to 4GB space. So basically the solution was
> very clumsy since turns a bounce buffer being reserved forever.
> 
> 3. 35797e672ff0 PCI: dwc: Fix MSI msi_msg DMA mapping
> @William basically got things back to (0) but instead of GFP_KERNEL
> the page was allocated from GFP_DMA32. At this stage he should have
> dropped the DMA-mask setting too since the buffer was already
> guaranteed to be within 4GB space, but he didn't.

I never saw that change, but frankly the justification in the commit 
message is wrong. I know that there are Android systems with memory 
above 32 bits that run with SWIOTLB disabled because they think they 
know what they're doing, which are almost certainly the same ones that 
also want to disable ZONE_DMA32 for similar reasons. That patch is 
really just another hack around an unexpected configuration, but without 
saying so.

> So now we have what we have. The DMA-mask is pointlessly changed for
> something not really implying any DMA. That's why I insisted on
> dropping it at the very least. Another reason I thought was also
> appropriate was the default DMA-mask being set to 32bits anyway.
> But you said we shouldn't rely on the default DMA-mask setting. So
> ok, it doesn't count then. But it doesn't change the uselessness of the
> DMA-mask change in the current driver.

As I keep saying, it *is* relevant to DMA. The MSI doorbell may not be 
accessing memory, but it is still a thing that occupies DMA address 
space like a mapping of memory does, and DMA masks are how we control 
how DMA address space is allocated. Unless and until we have an API for 
arbitrarily reserving DMA address space within a given range, the 
approach used here and in other drivers is the next best thing, however 
much you don't like it.
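
For comparison, the streaming variant of the same reservation trick
would look roughly like this (sketch only; pp->msi_msg is a
hypothetical u64 embedded in the private data, much as the driver once
had):

	/*
	 * Keeping this mapping forever reserves the DMA address just
	 * as well, but cleanup is manual and the buffer must already
	 * sit in a suitable physical location.
	 */
	pp->msi_data = dma_map_single(dev, &pp->msi_msg,
				      sizeof(pp->msi_msg),
				      DMA_FROM_DEVICE);
	ret = dma_mapping_error(dev, pp->msi_data);
	if (ret)
		return ret;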

>> AFAIU the correct PCI device
>> won't actually exist until we've got far enough through pci_host_probe(), so
>> I'm not sure how to easily solve this :/
> 
> Walk over all PCIe devices detected on the PCIe-bus. Check their
> MSI-capability flags. If any of them have no 64-bit MSI flag set, then
> make sure the MSI-base address is allocated within 4GB memory region.
> It isn't that easy to implement though...

It has nothing to do with capabilities (but also: consider hotplug). We 
simply need the host bridge PCI device to pass to the DMA API to ensure 
that the mapping/allocation is relative to PCI Mem space rather than 
system physical address space, because the two don't have to be 
identical. The challenge is how to reliably pick up that device and set 
up the doorbell *before* any other PCI devices also discovered by 
pci_host_probe() have a chance to start binding drivers and trying to 
request MSIs.

>> Of course *this* patch doesn't change any of that either, so it's no worse
>> than the existing code and I don't see that dropping it helps you at all;
>> the current driver is already trampling your 64-bit mask back to 32 bits
> 
> Yes, and by this pathset @William intend to fix the DMA-mask-override
> behaviour by using the dma_alloc_coherent() method.

No, that is effectively unchanged. Whether it's a streaming mapping with 
dma_mask or a coherent allocation with coherent_dma_mask, masks are 
getting set either way; it's the fact that it's on the wrong device 
that's the real problem.

If you expose the eDMA as a generic dmaengine device then there's every 
chance some consumer would make a streaming mapping with it, so even 
though the current code happens to not override the coherent mask it's 
still biting you in the streaming mask.

> So any
> platform-specific DMA-mask setting will be overwritten, and the
> DMA-mask setting won't be able to be moved/dropped due to the
> dma_alloc_coherent() method usage.
> 
> I have added a DW eDMA-engine support to the DW PCIe driver (you've
> already seen my patches) and the engine initialization is supposed to
> be performed after any basic initializations like CSRs mapping, data
> allocations, MSI, etc. Since DMA is performed by the controller itself
> it's required to have a correct DMA-mask set to the DW PCIe platform
> device otherwise any consequent mapping will be bounce buffered to the
> lowest 4GB even though the corresponding platform can support more
> than 4GB of memory (even our MIPS32 can) with DW eDMA easily reaching
> that memory. What would help me in this case if the MSI-buffer
> allocation procedure wouldn't change the device DMA-mask.  As an
> alternative to completely dropping the DMA-mask setting, the DMA-mask
> setup process could be moved to the low-level platform device drivers.
> It would be even more justified since the platform-specific device
> capabilities (like DW PCIe AXI-interface address-bus width) are
> unknown in the generic code.
> 
> As another alternative I could override the DMA-mask within the DW
> eDMA probe procedure. But that would make things more complicated than
> relying on the low-level platform drivers doing that.
> 
>> and
>> reserving the doorbell address in the wrong DMA address space (modulo the
>> other dma-ranges bug which also took far too long to figure out).
> 
> Actually current driver (without William patch) reserve the doorbell
> address in the correct DMA address space (if we don't take the
> dma-ranges settings into account).

No it does not. With or without this patch it is still passing the 
*platform device* to the DMA API, which means the mapping is relative to 
the platform address space, not PCI Mem space on the other side of the 
iATU. The fact that the iATU's dma-ranges translation is erroneously 
applied to the platform device at the moment is, as I have said, a bug.

> It works as expected in case if the
> PCIe<->CPU space has one-on-one mapping (which is true in the most of
> the cases). The only thing which is wrong is the pointless DMA-mask
> update. I could have easily dropped it in my patchset. But after the
> @William patchset is applied I won't be able to do that due to using
> the dma_alloc_coherent() here.

Once again, it is not pointless. There is no guarantee that __GFP_DMA32 
does anything, since ZONE_DMA32 may not exist (per this patch), and the 
zones may not be as expected anyway (look at what arm64 currently does 
if all RAM is above 32 bits, but save those complaints for another thread).

>> At this
>> point I'd rather keep it since getting rid of the __GFP_DMA32 abuse is
>> objectively good. If losing one page of coherent memory is a measurably
>> significant problem for T1 once the other issues are worked out and that
>> series lands, then you're welcome to propose a change on top (but I would
>> prefer that all the drivers using this trick are changed consistently).
> 
> Regarding DMA-coherent allocation. I am not happy with losing a whole
> page of the dma-coherent memory, but we can live with that. What give
> additional difficulty for our eDMA-patches is the DMA-mask override.
> If you still insist on preserving the @William patchset as it is,
> where do you suggest for me to update the DMA-mask if the low-level
> driver won't be suitable for that anymore?

I'm not insisting anything, it's just that this patch is already 
reviewed and queued, is a small step towards being less wrong overall, 
and dropping it wouldn't actually solve any of your problems anyway, so 
I just feel that being obstructive because it falls short of perfection 
isn't helpful.

Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-09-30 15:39             ` Robin Murphy
@ 2022-09-30 17:02               ` William McVicker
  2022-10-03  8:04                 ` Lorenzo Pieralisi
  2022-10-07 22:45               ` Serge Semin
  1 sibling, 1 reply; 26+ messages in thread
From: William McVicker @ 2022-09-30 17:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Serge Semin, Jingoo Han, Gustavo Pimentel, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, linux-pci,
	linux-kernel, Isaac J . Manjarres

On 09/30/2022, Robin Murphy wrote:
> On 2022-09-30 13:57, Serge Semin wrote:
> > On Fri, Sep 30, 2022 at 12:01:58PM +0100, Robin Murphy wrote:
> > > On 2022-09-29 20:32, Serge Semin wrote:
> > > > On Thu, Sep 29, 2022 at 07:25:03PM +0100, Robin Murphy wrote:
> > > > > On 2022-09-28 12:41, Serge Semin wrote:
> > > > > > On Thu, Aug 25, 2022 at 06:50:24PM +0000, Will McVicker wrote:
> > > > > > > Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
> > > > > > > uses the coherent DMA mask to try to return an allocation within the DMA
> > > > > > > mask limits. With that, we now can drop the msi_page parameter in struct
> > > > > > > dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
> > > > > > > continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
> > > > > > > device will fail to probe when ZONE_DMA32 is disabled.
> > > > > > 
> > > > > > As Rob already said here
> > > > > > https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
> > > > > > and I mentioned in this thread
> > > > > > https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
> > > > > > DW PCIe MSI doesn't cause any DMA due to the way the iMSI-RX engine is
> > > > > > designed. So reserving any real system memory is a waste of one in
> > > > > > this case. Reserving DMA-coherent even more inappropriate since it
> > > > > > can be expensive on some platforms (see note in Part Ia of
> > > > > > Documentation/core-api/dma-api.rst). For instance on MIPS32 with
> > > > > > non-corehent common DMA.
> > > > > 
> > > > 
> > > > > This has been discussed before - in general it is difficult to pick an
> > > > > arbitrary MSI address that is *guaranteed* not to overlap any valid DMA
> > > > > address that somebody may try to use later. However there is a very easy way
> > > > > to guarantee that the DMA API won't give anyone a particular DMA address,
> > > > > which is to get an address directly from the DMA API and keep it. Yes, that
> > > > > can technically be done with a streaming mapping *if* you already have some
> > > > > memory allocated in a suitable physical location, but coherent allocations
> > > > > are even more foolproof, simpler to clean up (particularly with devres), and
> > > > > unlikely to be an issue on relevant platforms (do any MIPS32 systems use
> > > > > this driver?)
> > > > 
> > > > My patchset adds the DW PCIe RP controller support on MIPS32 arch:
> > > > https://lore.kernel.org/linux-pci/20220822184701.25246-21-Sergey.Semin@baikalelectronics.ru/
> > > > 
> > > > > 
> > > > > > > Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
> > > > > > > Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> > > > > > > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > > > > > > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > > > > > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > > > > > ---
> > > > > > >     .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
> > > > > > >     drivers/pci/controller/dwc/pcie-designware.h  |  1 -
> > > > > > >     2 files changed, 7 insertions(+), 22 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > > > index 7746f94a715f..39f3b37d4033 100644
> > > > > > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > > > @@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
> > > > > > >     	irq_domain_remove(pp->msi_domain);
> > > > > > >     	irq_domain_remove(pp->irq_domain);
> > > > > > > -
> > > > > > > -	if (pp->msi_data) {
> > > > > > > -		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > > > > -		struct device *dev = pci->dev;
> > > > > > > -
> > > > > > > -		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
> > > > > > > -		if (pp->msi_page)
> > > > > > > -			__free_page(pp->msi_page);
> > > > > > > -	}
> > > > > > >     }
> > > > > > >     static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
> > > > > > > @@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > > > > >     	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > > > >     	struct device *dev = pci->dev;
> > > > > > >     	struct platform_device *pdev = to_platform_device(dev);
> > > > > > > +	u64 *msi_vaddr;
> > > > > > >     	int ret;
> > > > > > >     	u32 ctrl, num_ctrls;
> > > > > > > @@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > > > > >     						    dw_chained_msi_isr, pp);
> > > > > > >     	}
> > > > > > 
> > > > > > > -	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
> > > > > > > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > > > > > 
> > > > > > This has been redundant in the first place since none of the DW PCIe
> > > > > > low-level drivers update the mask, and it's of 32-bits wide by default
> > > > > > anyway:
> > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L167
> > > > > 
> > > > 
> > > > > No, in general drivers should always explicitly set their mask(s) and check
> > > > > the return value to make sure DMA is possible at all before trying any other
> > > > > DMA API calls. There's no guarantee that the default mask is usable (e.g.
> > > > > some systems don't have any 32-bit addressable RAM), or that it's even
> > > > > always 32 bits (due to crufty reasons of something of_dma_configure() tried
> > > > > to do a long time ago).
> > > > 
> > > > Suppose you are right and DMA-mask should be always set before any
> > > > mapping. What do you suggest to do in this case? (1) The code above
> > > > overrides the real DMA-mask which could be set by the platform
> > > > drivers, which in its turn are normally aware of the device DMA
> > > > capabilities.
> > > 
> > > I am right. Appropriate DMA API usage as defined by the DMA API maintainers
> > > is not a matter of supposition. I literally just explained right there why
> > > drivers can't blindly assume the default mask is usable on modern systems
> > > (yes, it was different 20 years ago when system topologies were simpler).
> > > 
> > 
> > > However, having now gone and looked at the whole driver rather than unclear
> > > fragments of patch context, the code here *is* technically wrong. I've been
> > > mistakenly thinking all along that this was operating on the PCI device
> > > because I know that's what it *should* be doing, and seeing misleading
> > > things like "dev = pci->dev" falsely affirmed that assumption that it would
> > > be correct because it's been around for ages.
> > > AFAIU the correct PCI device
> > > won't actually exist until we've got far enough through pci_host_probe(), so
> > > I'm not sure how to easily solve this :/
> > 
> > Right. The code affected by the subject patch has nothing to do with
> > the real PCI devices. The DMA-mask is set to the DW PCIe Host controller
> > platform device in order to force a page being allocated within 32-bit
> > address space. That's it.
> > 
> > Here is a log of the related changes:
> > 
> > 0. Initially a GFP_KERNEL-based page was allocated for the MSI buffer.
> > It may cause having the DMA/PCIe-address above 4GB, which wouldn't work
> > for the PCIe peripherals with only 32-bit MSI capability. Though
> > nobody bothered back then.
> > 
> > 1. 07940c369a6b ("PCI: dwc: Fix MSI page leakage in suspend/resume")
> > After this commit nothing really has changed, but instead of
> > allocating the MSI-message pseudo-buffer turned to be embedded into
> > the private data. It could be allocated at any base address with no
> > actual limitation (because private data is kmalloc'ed).
> > 
> > 2. 660c486590aa ("PCI: dwc: Set 32-bit DMA mask for MSI target address allocation")
> > Someone found out that some devices failed to deliver MSI to the
> > address above 4GB of PCIe address space and fixed the problem by
> > force-setting the DMA-mask to being 32-bit before mapping the MSI
> > buffer. It indeed fixed the problem, but the actual buffer still left
> > being allocated from any address space. Instead, the mapping procedure
> > just bounced the buffer to 4GB space. So basically the solution was
> > very clumsy since turns a bounce buffer being reserved forever.
> > 
> > 3. 35797e672ff0 PCI: dwc: Fix MSI msi_msg DMA mapping
> > @William basically got things back to (0) but instead of GFP_KERNEL
> > the page was allocated from GFP_DMA32. At this stage he should have
> > dropped the DMA-mask setting too since the buffer was already
> > guaranteed to be within 4GB space, but he didn't.
> 
> I never saw that change, but frankly the justification in the commit message
> is wrong. I know that there are Android systems with memory above 32 bits
> that run with SWIOTLB disabled because they think they know what they're
> doing, which are almost certainly the same ones that also want to disable
> ZONE_DMA32 for similar reasons. That patch is really just another hack
> around an unexpected configuration, but without saying so.

Whether or not the Android OEM knows what they're doing is up to them :). We
shouldn't force enabling/disabling configs based on our own opinions about who
is an expert. Phones at least have the advantage of a controlled environment,
which allows them to tune their drivers for the hardware they ship without
needing to worry about hotplugging unexpected hardware.

> 
> > So now we have what we have. The DMA-mask is pointlessly changed for
> > something not really implying any DMA. That's why I insisted on
> > dropping it at the very least. Another reason I thought was also
> > appropriate was the default DMA-mask being set to 32bits anyway.
> > But you said we shouldn't rely on the default DMA-mask setting. So
> > ok, it doesn't count then. But it doesn't change the uselessness of the
> > DMA-mask change in the current driver.
> 
> As I keep saying, it *is* relevant to DMA. The MSI doorbell may not be
> accessing memory, but it is still a thing that occupies DMA address space
> like a mapping of memory does, and DMA masks are how we control how DMA
> address space is allocated. Unless and until we have an API for arbitrarily
> reserving DMA address space within a given range, the approach used here and
> in other drivers is the next best thing, however much you don't like it.
> 
> > > AFAIU the correct PCI device
> > > won't actually exist until we've got far enough through pci_host_probe(), so
> > > I'm not sure how to easily solve this :/
> > 
> > Walk over all PCIe devices detected on the PCIe-bus. Check their
> > MSI-capability flags. If any of them have no 64-bit MSI flag set, then
> > make sure the MSI-base address is allocated within 4GB memory region.
> > It isn't that easy to implement though...
> 
> It has nothing to do with capabilities (but also: consider hotplug). We
> simply need the host bridge PCI device to pass to the DMA API to ensure that
> the mapping/allocation is relative to PCI Mem space rather than system
> physical address space, because the two don't have to be identical. The
> challenge is how to reliably pick up that device and set up the doorbell
> *before* any other PCI devices also discovered by pci_host_probe() have a
> chance to start binding drivers and trying to request MSIs.

Maybe I can provide some more insights on my patches that may help you guys
understand the idea behind the MSI capabilities flag. Basically, on a given
Android phone there are going to be multiple PCIe endpoints -- wifi and modem
are good examples. Some of these PCIe endpoints may only support a 32-bit MSI
capability structure, while others could support a 64-bit MSI capability
structure. So my intent was to have the PCIe RC driver detect the endpoint
device's MSI capabilities during the EP device probe and then set
PCI_MSI_FLAGS_64BIT accordingly before calling dw_pcie_host_init(). Since the
PCIe RC drivers are the ones to call dw_pcie_host_init(), we can dynamically
change the DMA mask and allocate the doorbell target address based on
PCI_MSI_FLAGS_64BIT; a rough sketch of that flow follows below.
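
Something like this (hypothetical RC driver code, untested;
ep_has_64bit_msi stands for whatever the RC driver learned about its
endpoint, and the write may additionally need dw_pcie_dbi_ro_wr_en()
if the flags field is read-only over the DBI):

	u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
	u16 flags = dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);

	if (ep_has_64bit_msi)
		flags |= PCI_MSI_FLAGS_64BIT;
	else
		flags &= ~PCI_MSI_FLAGS_64BIT;

	dw_pcie_writew_dbi(pci, offset + PCI_MSI_FLAGS, flags);

	ret = dw_pcie_host_init(&pci->pp);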

This all really gets a lot more complicated with the introduction of Serge's
eDMA device, so I fully expect dw_pcie_msi_host_init() to be refactored for
that.

> 
> > > Of course *this* patch doesn't change any of that either, so it's no worse
> > > than the existing code and I don't see that dropping it helps you at all;
> > > the current driver is already trampling your 64-bit mask back to 32 bits
> > 
> > Yes, and by this pathset @William intend to fix the DMA-mask-override
> > behaviour by using the dma_alloc_coherent() method.
> 
> No, that is effectively unchanged. Whether it's a streaming mapping with
> dma_mask or a coherent allocation with coherent_dma_mask, masks are getting
> set way, it's the fact that it's on the wrong device that's the real
> problem.
> 
> If you expose the eDMA as a generic dmaengine device then there's every
> chance some consumer would make a streaming mapping with it, so even though
> the current code happens to not override the coherent mask it's still biting
> you in the streaming mask.
> 
> > So any
> > platform-specific DMA-mask setting will be overwritten, and the
> > DMA-mask setting won't be able to be moved/dropped due to the
> > dma_alloc_coherent() method usage.
> > 
> > I have added a DW eDMA-engine support to the DW PCIe driver (you've
> > already seen my patches) and the engine initialization is supposed to
> > be performed after any basic initializations like CSRs mapping, data
> > allocations, MSI, etc. Since DMA is performed by the controller itself
> > it's required to have a correct DMA-mask set to the DW PCIe platform
> > device otherwise any consequent mapping will be bounce buffered to the
> > lowest 4GB even though the corresponding platform can support more
> > than 4GB of memory (even our MIPS32 can) with DW eDMA easily reaching
> > that memory. What would help me in this case if the MSI-buffer
> > allocation procedure wouldn't change the device DMA-mask.  As an
> > alternative to completely dropping the DMA-mask setting, the DMA-mask
> > setup process could be moved to the low-level platform device drivers.
> > It would be even more justified since the platform-specific device
> > capabilities (like DW PCIe AXI-interface address-bus width) are
> > unknown in the generic code.
> > 
> > As another alternative I could override the DMA-mask within the DW
> > eDMA probe procedure. But that would make things more complicated than
> > relying on the low-level platform drivers doing that.
> > 
> > > and
> > > reserving the doorbell address in the wrong DMA address space (modulo the
> > > other dma-ranges bug which also took far too long to figure out).
> > 
> > Actually current driver (without William patch) reserve the doorbell
> > address in the correct DMA address space (if we don't take the
> > dma-ranges settings into account).
> 
> No it does not. With or without this patch it is still passing the *platform
> device* to the DMA API, which means the mapping is relative to the platform
> address space, not PCI Mem space on the other side of the iATU. The fact
> that the iATU's dma-ranges translation is erroneously applied to the
> platform device at the moment is, as I have said, a bug.

Thanks for pointing this out. I agree this is a bug and I guess it hasn't
really been a problem because there isn't really any DMA'ing going on. With the
new eDMA device being introduced, this bug will likely need to be fixed.

> 
> > It works as expected in case if the
> > PCIe<->CPU space has one-on-one mapping (which is true in the most of
> > the cases). The only thing which is wrong is the pointless DMA-mask
> > update. I could have easily dropped it in my patchset. But after the
> > @William patchset is applied I won't be able to do that due to using
> > the dma_alloc_coherent() here.
> 
> Once again, it is not pointless. There is no guarantee that __GFP_DMA32 does
> anything, since ZONE_DMA32 may not exist (per this patch), and the zones may
> not be as expected anyway (look at what arm64 currently does if all RAM is
> above 32 bits, but save those complaints for another thread).
> 
> > > At this
> > > point I'd rather keep it since getting rid of the __GFP_DMA32 abuse is
> > > objectively good. If losing one page of coherent memory is a measurably
> > > significant problem for T1 once the other issues are worked out and that
> > > series lands, then you're welcome to propose a change on top (but I would
> > > prefer that all the drivers using this trick are changed consistently).
> > 
> > Regarding DMA-coherent allocation. I am not happy with losing a whole
> > page of the dma-coherent memory, but we can live with that. What give
> > additional difficulty for our eDMA-patches is the DMA-mask override.
> > If you still insist on preserving the @William patchset as it is,
> > where do you suggest for me to update the DMA-mask if the low-level
> > driver won't be suitable for that anymore?
> 
> I'm not insisting anything, it's just that this patch is already reviewed
> and queued, is a small step towards being less wrong overall, and dropping
> it wouldn't actually solve any of your problems anyway, so I just feel that
> being obstructive because it falls short of perfection isn't helpful.

Thanks for the responses, Robin! I agree that we don't have a complete
solution and need to fix this DMA address space bug, but I don't think that's
enough of a reason to drop this patch series. At the very least, I think patch
1/2, which removes the ZONE_DMA32 dependency, is a worthy patch to take for 6.1.

Thanks,
Will

> 
> Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-09-30 17:02               ` William McVicker
@ 2022-10-03  8:04                 ` Lorenzo Pieralisi
  2022-10-03 16:40                   ` William McVicker
  0 siblings, 1 reply; 26+ messages in thread
From: Lorenzo Pieralisi @ 2022-10-03  8:04 UTC (permalink / raw)
  To: William McVicker
  Cc: Robin Murphy, Serge Semin, Jingoo Han, Gustavo Pimentel,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, linux-pci,
	linux-kernel, Isaac J . Manjarres

On Fri, Sep 30, 2022 at 10:02:22AM -0700, William McVicker wrote:

[...]

> > > So now we have what we have. The DMA-mask is pointlessly changed for
> > > something not really implying any DMA. That's why I insisted on
> > > dropping it at the very least. Another reason I thought was also
> > > appropriate was the default DMA-mask being set to 32bits anyway.
> > > But you said we shouldn't rely on the default DMA-mask setting. So
> > > ok, it doesn't count then. But it doesn't change the uselessness of the
> > > DMA-mask change in the current driver.
> > 
> > As I keep saying, it *is* relevant to DMA. The MSI doorbell may not be
> > accessing memory, but it is still a thing that occupies DMA address space
> > like a mapping of memory does, and DMA masks are how we control how DMA
> > address space is allocated. Unless and until we have an API for arbitrarily
> > reserving DMA address space within a given range, the approach used here and
> > in other drivers is the next best thing, however much you don't like it.
> > 
> > > > AFAIU the correct PCI device
> > > > won't actually exist until we've got far enough through pci_host_probe(), so
> > > > I'm not sure how to easily solve this :/
> > > 
> > > Walk over all PCIe devices detected on the PCIe-bus. Check their
> > > MSI-capability flags. If any of them have no 64-bit MSI flag set, then
> > > make sure the MSI-base address is allocated within 4GB memory region.
> > > It isn't that easy to implement though...
> > 
> > It has nothing to do with capabilities (but also: consider hotplug). We
> > simply need the host bridge PCI device to pass to the DMA API to ensure that
> > the mapping/allocation is relative to PCI Mem space rather than system
> > physical address space, because the two don't have to be identical. The
> > challenge is how to reliably pick up that device and set up the doorbell
> > *before* any other PCI devices also discovered by pci_host_probe() have a
> > chance to start binding drivers and trying to request MSIs.
> 
> Maybe I can provide some more insights on my patches that may help you guys
> understand the idea behind the MSI capabilities flag. Basically, on a given
> Android phone there are going to be multiple PCIe endpoints -- wifi and modem
> are good examples. Some of these PCIe endpoints may only support a 32-bit MSI
> capability structure, but others could support a 64-bit MSI capability
> structure. So my intent was to have the PCIe RC driver detect the endpoint
> device's MSI capabilites during the EP device probe and then set the
> PCI_MSI_FLAGS_64BIT accordingly before calling dw_pcie_host_init(). Since the
> PCIe RC drivers are the ones to call dw_pcie_host_init(), we can dynamically
> change the DMA mask and allocate the doorbell target address based on the
> PCI_MSI_FLAGS_64BIT.

It seems to me we are all talking past each other to solve different
problems, so I am going to reset this discussion given that we are
in the merge window and I must finalise the PCI patch queue.

Patch (2/2) should be dropped IMO - I don't think the host bridge
platform device DMA mask should depend on the root port MSI cap
64/32-bit addressing; that's not the right thing to do.

We should keep this discussion going for the next cycle, I will
drop patch (2/2) for the time being, sorry.

Thanks,
Lorenzo

> This all really gets a lot more complicated with the introduction of Serge's
> eDMA device. So I full well expect dw_pcie_msi_host_init() to be re-factored
> for that.
> 
> > 
> > > > Of course *this* patch doesn't change any of that either, so it's no worse
> > > > than the existing code and I don't see that dropping it helps you at all;
> > > > the current driver is already trampling your 64-bit mask back to 32 bits
> > > 
> > > Yes, and with this patchset @William intends to fix the DMA-mask-override
> > > behaviour by using the dma_alloc_coherent() method.
> > 
> > No, that is effectively unchanged. Whether it's a streaming mapping with
> > dma_mask or a coherent allocation with coherent_dma_mask, masks are getting
> > set either way, it's the fact that it's on the wrong device that's the real
> > problem.
> > 
> > If you expose the eDMA as a generic dmaengine device then there's every
> > chance some consumer would make a streaming mapping with it, so even though
> > the current code happens to not override the coherent mask it's still biting
> > you in the streaming mask.
> > 
> > > So any
> > > platform-specific DMA-mask setting will be overwritten, and the
> > > DMA-mask setting won't be able to be moved/dropped due to the
> > > dma_alloc_coherent() method usage.
> > > 
> > > I have added DW eDMA-engine support to the DW PCIe driver (you've
> > > already seen my patches) and the engine initialization is supposed to
> > > be performed after any basic initializations like CSRs mapping, data
> > > allocations, MSI, etc. Since DMA is performed by the controller itself
> > > it's required to have a correct DMA-mask set to the DW PCIe platform
> > > device otherwise any consequent mapping will be bounce buffered to the
> > > lowest 4GB even though the corresponding platform can support more
> > > than 4GB of memory (even our MIPS32 can) with DW eDMA easily reaching
> > > that memory. What would help me in this case if the MSI-buffer
> > > allocation procedure wouldn't change the device DMA-mask.  As an
> > > alternative to completely dropping the DMA-mask setting, the DMA-mask
> > > setup process could be moved to the low-level platform device drivers.
> > > It would be even more justified since the platform-specific device
> > > capabilities (like DW PCIe AXI-interface address-bus width) are
> > > unknown in the generic code.
> > > 
> > > As another alternative I could override the DMA-mask within the DW
> > > eDMA probe procedure. But that would make things more complicated than
> > > relying on the low-level platform drivers doing that.
> > > 
> > > > and
> > > > reserving the doorbell address in the wrong DMA address space (modulo the
> > > > other dma-ranges bug which also took far too long to figure out).
> > > 
> > > Actually the current driver (without William's patch) reserves the doorbell
> > > address in the correct DMA address space (if we don't take the
> > > dma-ranges settings into account).
> > 
> > No it does not. With or without this patch it is still passing the *platform
> > device* to the DMA API, which means the mapping is relative to the platform
> > address space, not PCI Mem space on the other side of the iATU. The fact
> > that the iATU's dma-ranges translation is erroneously applied to the
> > platform device at the moment is, as I have said, a bug.
> 
> Thanks for pointing this out. I agree this is a bug and I guess it hasn't
> really been a problem because there isn't really any DMA'ing going on. With the
> new eDMA device being introduced, this bug will likely need to be fixed.
> 
> > 
> > > It works as expected if the PCIe<->CPU space has a one-to-one
> > > mapping (which is true in most cases). The only thing which is
> > > wrong is the pointless DMA-mask
> > > update. I could have easily dropped it in my patchset. But after the
> > > @William patchset is applied I won't be able to do that due to using
> > > the dma_alloc_coherent() here.
> > 
> > Once again, it is not pointless. There is no guarantee that __GFP_DMA32 does
> > anything, since ZONE_DMA32 may not exist (per this patch), and the zones may
> > not be as expected anyway (look at what arm64 currently does if all RAM is
> > above 32 bits, but save those complaints for another thread).
> > 
> > > > At this
> > > > point I'd rather keep it since getting rid of the __GFP_DMA32 abuse is
> > > > objectively good. If losing one page of coherent memory is a measurably
> > > > significant problem for T1 once the other issues are worked out and that
> > > > series lands, then you're welcome to propose a change on top (but I would
> > > > prefer that all the drivers using this trick are changed consistently).
> > > 
> > > Regarding DMA-coherent allocation. I am not happy with losing a whole
> > > page of the dma-coherent memory, but we can live with that. What gives
> > > additional difficulty to our eDMA patches is the DMA-mask override.
> > > If you still insist on preserving the @William patchset as it is,
> > > where do you suggest I update the DMA-mask if the low-level
> > > driver won't be suitable for that anymore?
> > 
> > I'm not insisting anything, it's just that this patch is already reviewed
> > and queued, is a small step towards being less wrong overall, and dropping
> > it wouldn't actually solve any of your problems anyway, so I just feel that
> > being obstructive because it falls short of perfection isn't helpful.
> 
> Thanks for the responses Robin! I agree that we don't have a complete solution
> and need to fix this DMA address space bug, but don't think that's enough of
> a reason to drop this patch series. At the very least I think patch 1/2 which
> removes the ZONE_DMA32 dependency is a worthy patch to take for 6.1.
> 
> Thanks,
> Will
> 
> > 
> > Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-10-03  8:04                 ` Lorenzo Pieralisi
@ 2022-10-03 16:40                   ` William McVicker
  0 siblings, 0 replies; 26+ messages in thread
From: William McVicker @ 2022-10-03 16:40 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Robin Murphy, Serge Semin, Jingoo Han, Gustavo Pimentel,
	Rob Herring, Krzysztof Wilczyński, Bjorn Helgaas,
	kernel-team, Vidya Sagar, Christoph Hellwig, linux-pci,
	linux-kernel, Isaac J . Manjarres

On 10/03/2022, Lorenzo Pieralisi wrote:
> On Fri, Sep 30, 2022 at 10:02:22AM -0700, William McVicker wrote:
> 
> [...]
> 
> > > > So now we have what we have. The DMA-mask is pointlessly changed for
> > > > something not really implying any DMA. That's why I insisted on
> > > > dropping it at the very least. Another reason I thought was also
> > > > appropriate was the default DMA-mask being set to 32bits anyway.
> > > > But you said we shouldn't rely on the default DMA-mask setting. So
> > > > ok, it doesn't count then. But it doesn't change the uselessness of the
> > > > DMA-mask change in the current driver.
> > > 
> > > As I keep saying, it *is* relevant to DMA. The MSI doorbell may not be
> > > accessing memory, but it is still a thing that occupies DMA address space
> > > like a mapping of memory does, and DMA masks are how we control how DMA
> > > address space is allocated. Unless and until we have an API for arbitrarily
> > > reserving DMA address space within a given range, the approach used here and
> > > in other drivers is the next best thing, however much you don't like it.
> > > 
> > > > > AFAIU the correct PCI device
> > > > > won't actually exist until we've got far enough through pci_host_probe(), so
> > > > > I'm not sure how to easily solve this :/
> > > > 
> > > > Walk over all PCIe devices detected on the PCIe-bus. Check their
> > > > MSI-capability flags. If any of them doesn't have the 64-bit MSI
> > > > flag set, then make sure the MSI-base address is allocated within
> > > > the 4GB memory region.
> > > > It isn't that easy to implement though...
> > > 
> > > It has nothing to do with capabilities (but also: consider hotplug). We
> > > simply need the host bridge PCI device to pass to the DMA API to ensure that
> > > the mapping/allocation is relative to PCI Mem space rather than system
> > > physical address space, because the two don't have to be identical. The
> > > challenge is how to reliably pick up that device and set up the doorbell
> > > *before* any other PCI devices also discovered by pci_host_probe() have a
> > > chance to start binding drivers and trying to request MSIs.
> > 
> > Maybe I can provide some more insights on my patches that may help you guys
> > understand the idea behind the MSI capabilities flag. Basically, on a given
> > Android phone there are going to be multiple PCIe endpoints -- wifi and modem
> > are good examples. Some of these PCIe endpoints may only support a 32-bit MSI
> > capability structure, but others could support a 64-bit MSI capability
> > structure. So my intent was to have the PCIe RC driver detect the endpoint
> > device's MSI capabilities during the EP device probe and then set the
> > PCI_MSI_FLAGS_64BIT accordingly before calling dw_pcie_host_init(). Since the
> > PCIe RC drivers are the ones to call dw_pcie_host_init(), we can dynamically
> > change the DMA mask and allocate the doorbell target address based on the
> > PCI_MSI_FLAGS_64BIT.
> 
> It seems to me we are all talking past each other to solve different
> problems, so I am going to reset this discussion given that we are
> in the merge window and I must finalise the PCI patch queue.
> 
> Patch (2/2) should be dropped IMO - I don't think the host bridge
> platform device DMA mask should depend on the root port MSI cap
> 64/32 bit addressing - I don't think that's the right thing to do.
> 
> We should keep this discussion going for the next cycle, I will
> drop patch (2/2) for the time being, sorry.

Thanks Lorenzo for taking patch 1/2. That solves my immediate issues. Let's
solve Serge's eDMA problem first. Then we can revisit the 64-bit MSI target
address issue.

Regards,
Will

> 
> Thanks,
> Lorenzo
> 
> > This all really gets a lot more complicated with the introduction of Serge's
> > eDMA device. So I fully expect dw_pcie_msi_host_init() to be re-factored
> > for that.
> > 
> > > 
> > > > > Of course *this* patch doesn't change any of that either, so it's no worse
> > > > > than the existing code and I don't see that dropping it helps you at all;
> > > > > the current driver is already trampling your 64-bit mask back to 32 bits
> > > > 
> > > > Yes, and with this patchset @William intends to fix the DMA-mask-override
> > > > behaviour by using the dma_alloc_coherent() method.
> > > 
> > > No, that is effectively unchanged. Whether it's a streaming mapping with
> > > dma_mask or a coherent allocation with coherent_dma_mask, masks are getting
> > > set either way, it's the fact that it's on the wrong device that's the real
> > > problem.
> > > 
> > > If you expose the eDMA as a generic dmaengine device then there's every
> > > chance some consumer would make a streaming mapping with it, so even though
> > > the current code happens to not override the coherent mask it's still biting
> > > you in the streaming mask.
> > > 
> > > > So any
> > > > platform-specific DMA-mask setting will be overwritten, and the
> > > > DMA-mask setting won't be able to be moved/dropped due to the
> > > > dma_alloc_coherent() method usage.
> > > > 
> > > > I have added DW eDMA-engine support to the DW PCIe driver (you've
> > > > already seen my patches) and the engine initialization is supposed to
> > > > be performed after any basic initializations like CSRs mapping, data
> > > > allocations, MSI, etc. Since DMA is performed by the controller itself
> > > > it's required to have a correct DMA-mask set to the DW PCIe platform
> > > > device otherwise any consequent mapping will be bounce buffered to the
> > > > lowest 4GB even though the corresponding platform can support more
> > > > than 4GB of memory (even our MIPS32 can) with DW eDMA easily reaching
> > > > that memory. What would help me in this case if the MSI-buffer
> > > > allocation procedure wouldn't change the device DMA-mask.  As an
> > > > alternative to completely dropping the DMA-mask setting, the DMA-mask
> > > > setup process could be moved to the low-level platform device drivers.
> > > > It would be even more justified since the platform-specific device
> > > > capabilities (like DW PCIe AXI-interface address-bus width) are
> > > > unknown in the generic code.
> > > > 
> > > > As another alternative I could override the DMA-mask within the DW
> > > > eDMA probe procedure. But that would make things more complicated than
> > > > relying on the low-level platform drivers doing that.
> > > > 
> > > > > and
> > > > > reserving the doorbell address in the wrong DMA address space (modulo the
> > > > > other dma-ranges bug which also took far too long to figure out).
> > > > 
> > > > Actually the current driver (without William's patch) reserves the doorbell
> > > > address in the correct DMA address space (if we don't take the
> > > > dma-ranges settings into account).
> > > 
> > > No it does not. With or without this patch it is still passing the *platform
> > > device* to the DMA API, which means the mapping is relative to the platform
> > > address space, not PCI Mem space on the other side of the iATU. The fact
> > > that the iATU's dma-ranges translation is erroneously applied to the
> > > platform device at the moment is, as I have said, a bug.
> > 
> > Thanks for pointing this out. I agree this is a bug and I guess it hasn't
> > really been a problem because there isn't really any DMA'ing going on. With the
> > new eDMA device being introduced, this bug will likely need to be fixed.
> > 
> > > 
> > > > It works as expected if the PCIe<->CPU space has a one-to-one
> > > > mapping (which is true in most cases). The only thing which is
> > > > wrong is the pointless DMA-mask
> > > > update. I could have easily dropped it in my patchset. But after the
> > > > @William patchset is applied I won't be able to do that due to using
> > > > the dma_alloc_coherent() here.
> > > 
> > > Once again, it is not pointless. There is no guarantee that __GFP_DMA32 does
> > > anything, since ZONE_DMA32 may not exist (per this patch), and the zones may
> > > not be as expected anyway (look at what arm64 currently does if all RAM is
> > > above 32 bits, but save those complaints for another thread).
> > > 
> > > > > At this
> > > > > point I'd rather keep it since getting rid of the __GFP_DMA32 abuse is
> > > > > objectively good. If losing one page of coherent memory is a measurably
> > > > > significant problem for T1 once the other issues are worked out and that
> > > > > series lands, then you're welcome to propose a change on top (but I would
> > > > > prefer that all the drivers using this trick are changed consistently).
> > > > 
> > > > Regarding DMA-coherent allocation. I am not happy with losing a whole
> > > > page of the dma-coherent memory, but we can live with that. What gives
> > > > additional difficulty to our eDMA patches is the DMA-mask override.
> > > > If you still insist on preserving the @William patchset as it is,
> > > > where do you suggest I update the DMA-mask if the low-level
> > > > driver won't be suitable for that anymore?
> > > 
> > > I'm not insisting anything, it's just that this patch is already reviewed
> > > and queued, is a small step towards being less wrong overall, and dropping
> > > it wouldn't actually solve any of your problems anyway, so I just feel that
> > > being obstructive because it falls short of perfection isn't helpful.
> > 
> > Thanks for the responses Robin! I agree that we don't have a complete solution
> > and need to fix this DMA address space bug, but don't think that's enough of
> > a reason to drop this patch series. At the very least I think patch 1/2 which
> > removes the ZONE_DMA32 dependency is a worthy patch to take for 6.1.
> > 
> > Thanks,
> > Will
> > 
> > > 
> > > Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32
  2022-09-30 15:39             ` Robin Murphy
  2022-09-30 17:02               ` William McVicker
@ 2022-10-07 22:45               ` Serge Semin
  1 sibling, 0 replies; 26+ messages in thread
From: Serge Semin @ 2022-10-07 22:45 UTC (permalink / raw)
  To: Robin Murphy, Will McVicker, Lorenzo Pieralisi
  Cc: Jingoo Han, Gustavo Pimentel, Rob Herring,
	Krzysztof Wilczyński, Bjorn Helgaas, kernel-team,
	Vidya Sagar, Christoph Hellwig, linux-pci, linux-kernel,
	Isaac J . Manjarres

On Fri, Sep 30, 2022 at 04:39:05PM +0100, Robin Murphy wrote:
> On 2022-09-30 13:57, Serge Semin wrote:
> > On Fri, Sep 30, 2022 at 12:01:58PM +0100, Robin Murphy wrote:
> > > On 2022-09-29 20:32, Serge Semin wrote:
> > > > On Thu, Sep 29, 2022 at 07:25:03PM +0100, Robin Murphy wrote:
> > > > > On 2022-09-28 12:41, Serge Semin wrote:
> > > > > > On Thu, Aug 25, 2022 at 06:50:24PM +0000, Will McVicker wrote:
> > > > > > > Re-work the msi_msg DMA allocation logic to use dmam_alloc_coherent() which
> > > > > > > uses the coherent DMA mask to try to return an allocation within the DMA
> > > > > > > mask limits. With that, we now can drop the msi_page parameter in struct
> > > > > > > dw_pcie_rp. This allows kernel configurations that disable ZONE_DMA32 to
> > > > > > > continue supporting a 32-bit DMA mask. Without this patch, the PCIe host
> > > > > > > device will fail to probe when ZONE_DMA32 is disabled.
> > > > > > 
> > > > > > As Rob already said here
> > > > > > https://lore.kernel.org/all/CAL_JsqJh=d-B51b6yPBRq0tOwbChN=AFPr-a19U1QdQZAE7c1A@mail.gmail.com/
> > > > > > and I mentioned in this thread
> > > > > > https://lore.kernel.org/linux-pci/20220912000211.ct6asuhhmnatje5e@mobilestation/
> > > > > > DW PCIe MSI doesn't cause any DMA due to the way the iMSI-RX engine is
> > > > > > designed. So reserving any real system memory is a waste in
> > > > > > this case. Reserving DMA-coherent memory is even more inappropriate since it
> > > > > > can be expensive on some platforms (see note in Part Ia of
> > > > > > Documentation/core-api/dma-api.rst). For instance on MIPS32 with
> > > > > > non-coherent common DMA.
> > > > > 
> > > > 
> > > > > This has been discussed before - in general it is difficult to pick an
> > > > > arbitrary MSI address that is *guaranteed* not to overlap any valid DMA
> > > > > address that somebody may try to use later. However there is a very easy way
> > > > > to guarantee that the DMA API won't give anyone a particular DMA address,
> > > > > which is to get an address directly from the DMA API and keep it. Yes, that
> > > > > can technically be done with a streaming mapping *if* you already have some
> > > > > memory allocated in a suitable physical location, but coherent allocations
> > > > > are even more foolproof, simpler to clean up (particularly with devres), and
> > > > > unlikely to be an issue on relevant platforms (do any MIPS32 systems use
> > > > > this driver?)
> > > > 
> > > > My patchset adds DW PCIe RP controller support on the MIPS32 arch:
> > > > https://lore.kernel.org/linux-pci/20220822184701.25246-21-Sergey.Semin@baikalelectronics.ru/
> > > > 
> > > > > 
> > > > > > > Fixes: 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
> > > > > > > Reported-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> > > > > > > Signed-off-by: Will McVicker <willmcvicker@google.com>
> > > > > > > Acked-by: Jingoo Han <jingoohan1@gmail.com>
> > > > > > > Reviewed-by: Rob Herring <robh@kernel.org>
> > > > > > > ---
> > > > > > >     .../pci/controller/dwc/pcie-designware-host.c | 28 +++++--------------
> > > > > > >     drivers/pci/controller/dwc/pcie-designware.h  |  1 -
> > > > > > >     2 files changed, 7 insertions(+), 22 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > > > index 7746f94a715f..39f3b37d4033 100644
> > > > > > > --- a/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-host.c
> > > > > > > @@ -267,15 +267,6 @@ static void dw_pcie_free_msi(struct dw_pcie_rp *pp)
> > > > > > >     	irq_domain_remove(pp->msi_domain);
> > > > > > >     	irq_domain_remove(pp->irq_domain);
> > > > > > > -
> > > > > > > -	if (pp->msi_data) {
> > > > > > > -		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > > > > -		struct device *dev = pci->dev;
> > > > > > > -
> > > > > > > -		dma_unmap_page(dev, pp->msi_data, PAGE_SIZE, DMA_FROM_DEVICE);
> > > > > > > -		if (pp->msi_page)
> > > > > > > -			__free_page(pp->msi_page);
> > > > > > > -	}
> > > > > > >     }
> > > > > > >     static void dw_pcie_msi_init(struct dw_pcie_rp *pp)
> > > > > > > @@ -336,6 +327,7 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > > > > >     	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> > > > > > >     	struct device *dev = pci->dev;
> > > > > > >     	struct platform_device *pdev = to_platform_device(dev);
> > > > > > > +	u64 *msi_vaddr;
> > > > > > >     	int ret;
> > > > > > >     	u32 ctrl, num_ctrls;
> > > > > > > @@ -375,22 +367,16 @@ static int dw_pcie_msi_host_init(struct dw_pcie_rp *pp)
> > > > > > >     						    dw_chained_msi_isr, pp);
> > > > > > >     	}
> > > > > > 
> > > > > > > -	ret = dma_set_mask(dev, DMA_BIT_MASK(32));
> > > > > > > +	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> > > > > > 
> > > > > > This has been redundant in the first place since none of the DW PCIe
> > > > > > low-level drivers update the mask, and it's 32 bits wide by default
> > > > > > anyway:
> > > > > > https://elixir.bootlin.com/linux/latest/source/drivers/of/platform.c#L167
> > > > > 
> > > > 
> > > > > No, in general drivers should always explicitly set their mask(s) and check
> > > > > the return value to make sure DMA is possible at all before trying any other
> > > > > DMA API calls. There's no guarantee that the default mask is usable (e.g.
> > > > > some systems don't have any 32-bit addressable RAM), or that it's even
> > > > > always 32 bits (due to crufty reasons of something of_dma_configure() tried
> > > > > to do a long time ago).
> > > > 
> > > > Suppose you are right and the DMA-mask should always be set before any
> > > > mapping. What do you suggest doing in this case? (1) The code above
> > > > overrides the real DMA-mask which could be set by the platform
> > > > drivers, which in their turn are normally aware of the device DMA
> > > > capabilities.
> > > 
> > > I am right. Appropriate DMA API usage as defined by the DMA API maintainers
> > > is not a matter of supposition. I literally just explained right there why
> > > drivers can't blindly assume the default mask is usable on modern systems
> > > (yes, it was different 20 years ago when system topologies were simpler).
> > > 
> > 
> > > However, having now gone and looked at the whole driver rather than unclear
> > > fragments of patch context, the code here *is* technically wrong. I've been
> > > mistakenly thinking all along that this was operating on the PCI device
> > > because I know that's what it *should* be doing, and seeing misleading
> > > things like "dev = pci->dev" falsely affirmed the assumption that it would
> > > be correct because it's been around for ages.
> > > AFAIU the correct PCI device
> > > won't actually exist until we've got far enough through pci_host_probe(), so
> > > I'm not sure how to easily solve this :/
> > 
> > Right. The code affected by the subject patch has nothing to do with
> > the real PCI devices. The DMA-mask is set to the DW PCIe Host controller
> > platform device in order to force a page to be allocated within 32-bit
> > address space. That's it.
> > 
> > Here is a log of the related changes:
> > 
> > 0. Initially a GFP_KERNEL-based page was allocated for the MSI buffer.
> > It could result in a DMA/PCIe address above 4GB, which wouldn't work
> > for PCIe peripherals with only the 32-bit MSI capability. Though
> > nobody bothered about it back then.
> > 
> > 1. 07940c369a6b ("PCI: dwc: Fix MSI page leakage in suspend/resume")
> > After this commit nothing really changed, except that instead of
> > being allocated, the MSI-message pseudo-buffer turned out to be
> > embedded into the private data. It could be allocated at any base
> > address with no
> > actual limitation (because private data is kmalloc'ed).
> > 
> > 2. 660c486590aa ("PCI: dwc: Set 32-bit DMA mask for MSI target address allocation")
> > Someone found out that some devices failed to deliver MSI to the
> > address above 4GB of PCIe address space and fixed the problem by
> > force-setting the DMA-mask to being 32-bit before mapping the MSI
> > buffer. It indeed fixed the problem, but the actual buffer still left
> > being allocated from any address space. Instead, the mapping procedure
> > just bounced the buffer to 4GB space. So basically the solution was
> > very clumsy since turns a bounce buffer being reserved forever.
> > 
> > 3. 35797e672ff0 ("PCI: dwc: Fix MSI msi_msg DMA mapping")
> > @William basically got things back to (0) but instead of GFP_KERNEL
> > the page was allocated from GFP_DMA32. At this stage he should have
> > dropped the DMA-mask setting too since the buffer was already
> > guaranteed to be within 4GB space, but he didn't.
> 
> I never saw that change, but frankly the justification in the commit message
> is wrong. I know that there are Android systems with memory above 32 bits
> that run with SWIOTLB disabled because they think they know what they're
> doing, which are almost certainly the same ones that also want to disable
> ZONE_DMA32 for similar reasons. That patch is really just another hack
> around an unexpected configuration, but without saying so.
> 
> > So now we have what we have. The DMA-mask is pointlessly changed for
> > something not really implying any DMA. That's why I insisted on
> > dropping it at the very least. Another reason I thought was also
> > appropriate was the default DMA-mask being set to 32bits anyway.
> > But you said we shouldn't rely on the default DMA-mask setting. So
> > ok, it doesn't count then. But it doesn't change the uselessness of the
> > DMA-mask change in the current driver.
> 

> As I keep saying, it *is* relevant to DMA. The MSI doorbell may not be
> accessing memory, but it is still a thing that occupies DMA address space
> like a mapping of memory does, and DMA masks are how we control how DMA
> address space is allocated. Unless and until we have an API for arbitrarily
> reserving DMA address space within a given range, the approach used here and
> in other drivers is the next best thing, however much you don't like it.

It's not that I don't like the approach. Now that @William's patch is
merged in, I don't really see a well-suited place for setting the real
DMA-mask of the DW PCIe controller for the eDMA operations. I didn't
want to set the mask in the DW eDMA probe function since it is common
for both an eDMA embedded into a remote PCIe function and one installed
into the local Root Port/End-point device. In the first case the
DMA-mask is already set in the dw-edma-pcie.c driver. In order to cover
the latter case I was going to use the low-level platform drivers. But
since I can't get rid of the DMA-mask override performed in the
MSI-setup procedure, I will have no choice but to hack the eDMA driver
somehow (see the last comment for details).

> 
> > > AFAIU the correct PCI device
> > > won't actually exist until we've got far enough through pci_host_probe(), so
> > > I'm not sure how to easily solve this :/
> > 
> > Walk over all PCIe devices detected on the PCIe-bus. Check their
> > MSI-capability flags. If any of them doesn't have the 64-bit MSI
> > flag set, then make sure the MSI-base address is allocated within
> > the 4GB memory region.
> > It isn't that easy to implement though...
> 

> It has nothing to do with capabilities (but also: consider hotplug).

Actually it does. Based on the 64-bit MSI flag state we would need to
choose either the pure 32-bit or the 64/32-bit mask. Of course hotplug
would be a problem, but that's not the point in this case.
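
Just to illustrate the idea, a rough and untested sketch (the helper
name is made up; dw_pcie_find_capability() and dw_pcie_readw_dbi() are
the existing DBI accessors):

	static int dw_pcie_msi_set_dma_mask(struct dw_pcie_rp *pp)
	{
		struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
		u16 flags;
		u8 cap;

		/* Read the Root Port MSI capability flags from the DBI space */
		cap = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
		flags = dw_pcie_readw_dbi(pci, cap + PCI_MSI_FLAGS);

		/* Try a 64-bit mask only when the RP advertises 64-bit MSI */
		if ((flags & PCI_MSI_FLAGS_64BIT) &&
		    !dma_set_mask_and_coherent(pci->dev, DMA_BIT_MASK(64)))
			return 0;

		return dma_set_mask_and_coherent(pci->dev, DMA_BIT_MASK(32));
	}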

> We
> simply need the host bridge PCI device to pass to the DMA API to ensure that
> the mapping/allocation is relative to PCI Mem space rather than system
> physical address space, because the two don't have to be identical.

Right, that would have solved the problem with setting the DMA-mask to
a correct device. But the PCIe-space-related "dma-ranges" would still
be left applied to the DW PCIe platform device. Moreover, in general
the PCIe host interface can have its own "dma-ranges". So we'd need
two types of "dma-ranges": one for the PCIe bus to initialize the
inbound iATU, and one for the DW PCIe controller itself. Furthermore,
the latter translations are applicable to the former addresses too...

> The
> challenge is how to reliably pick up that device and set up the doorbell
> *before* any other PCI devices also discovered by pci_host_probe() have a
> chance to start binding drivers and trying to request MSIs.

There is no way with the current pci_host_probe() implementation,
because it performs both the host bridge device registration and the
PCIe-bus scanning. There would either have to be a hook called after
the bridge is registered, or pci_host_probe() would have to be split
up into two sub-methods: bridge registration and child-bus scanning
(see the sketch below). I have doubts that either option would be
happily welcomed by @Bjorn.
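
Purely hypothetical sketch of the second option (neither function
exists today; both names are made up):

	/* Phase 1 (hypothetical): register the host bridge device only */
	ret = pci_host_register(bridge);
	if (ret)
		return ret;

	/*
	 * At this point the host bridge device would exist, so the MSI
	 * doorbell could be allocated/mapped against it right here,
	 * before any endpoint driver gets a chance to request MSIs.
	 */

	/* Phase 2 (hypothetical): scan the child buses and bind drivers */
	ret = pci_host_scan_buses(bridge);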

> 
> > > Of course *this* patch doesn't change any of that either, so it's no worse
> > > than the existing code and I don't see that dropping it helps you at all;
> > > the current driver is already trampling your 64-bit mask back to 32 bits
> > 
> > Yes, and with this patchset @William intends to fix the DMA-mask-override
> > behaviour by using the dma_alloc_coherent() method.
> 

> No, that is effectively unchanged.

I meant the same. It's unchanged, but whereas before @William's patch
we could have dropped the DMA-mask change on the assumption that the
CPU and PCIe spaces are identical, relying on the DMA32 zone
availability, now we can't because of the dma_alloc_coherent() method
utilization.

> Whether it's a streaming mapping with
> dma_mask or a coherent allocation with coherent_dma_mask, masks are getting
> set either way, it's the fact that it's on the wrong device that's the real
> problem.
> 
> If you expose the eDMA as a generic dmaengine device then there's every
> chance some consumer would make a streaming mapping with it, so even though
> the current code happens to not override the coherent mask it's still biting
> you in the streaming mask.

Right. It is. That's why I was trying to come up with an approach for
@William to drop the DMA-mask override altogether, i.e. either rely on the
default DMA-mask state or just move the mask setting to the low-level
DW PCIe platform drivers.

> 
> > So any
> > platform-specific DMA-mask setting will be overwritten, and the
> > DMA-mask setting won't be able to be moved/dropped due to the
> > dma_alloc_coherent() method usage.
> > 
> > I have added DW eDMA-engine support to the DW PCIe driver (you've
> > already seen my patches) and the engine initialization is supposed to
> > be performed after any basic initializations like CSRs mapping, data
> > allocations, MSI, etc. Since DMA is performed by the controller itself
> > it's required to have a correct DMA-mask set to the DW PCIe platform
> > device, otherwise any subsequent mapping will be bounce-buffered to the
> > lowest 4GB even though the corresponding platform can support more
> > than 4GB of memory (even our MIPS32 can) with DW eDMA easily reaching
> > that memory. What would help me in this case is if the MSI-buffer
> > allocation procedure didn't change the device DMA-mask. As an
> > alternative to completely dropping the DMA-mask setting, the DMA-mask
> > setup process could be moved to the low-level platform device drivers.
> > It would be even more justified since the platform-specific device
> > capabilities (like DW PCIe AXI-interface address-bus width) are
> > unknown in the generic code.
> > 
> > As another alternative I could override the DMA-mask within the DW
> > eDMA probe procedure. But that would make things more complicated than
> > relying on the low-level platform drivers doing that.
> > 
> > > and
> > > reserving the doorbell address in the wrong DMA address space (modulo the
> > > other dma-ranges bug which also took far too long to figure out).
> > 
> > Actually the current driver (without William's patch) reserves the doorbell
> > address in the correct DMA address space (if we don't take the
> > dma-ranges settings into account).
> 

> No it does not. With or without this patch it is still passing the *platform
> device* to the DMA API, which means the mapping is relative to the platform
> address space, not PCI Mem space on the other side of the iATU. The fact
> that the iATU's dma-ranges translation is erroneously applied to the
> platform device at the moment is, as I have said, a bug.

That's why I noted "if we don't take the dma-ranges settings into
account". Taking out a wrong device utilization and if the CPU and PCIe
spaces are identical then passing the *platform device* to the DMA API
will do what is expected by the current code.

> 
> > It works as expected if the PCIe<->CPU space has a one-to-one
> > mapping (which is true in most cases). The only thing which is
> > wrong is the pointless DMA-mask
> > update. I could have easily dropped it in my patchset. But after the
> > @William patchset is applied I won't be able to do that due to using
> > the dma_alloc_coherent() here.
> 

> Once again, it is not pointless. There is no guarantee that __GFP_DMA32 does
> anything, since ZONE_DMA32 may not exist (per this patch), and the zones may
> not be as expected anyway (look at what arm64 currently does if all RAM is
> above 32 bits, but save those complaints for another thread).

OK. It is indeed not pointless given that __GFP_DMA32 may do nothing.
(I would have renamed the __GFP_DMA32 flag to something like
__GFP_DMA32_NOT_GUARANTEED then.)

> 
> > > At this
> > > point I'd rather keep it since getting rid of the __GFP_DMA32 abuse is
> > > objectively good. If losing one page of coherent memory is a measurably
> > > significant problem for T1 once the other issues are worked out and that
> > > series lands, then you're welcome to propose a change on top (but I would
> > > prefer that all the drivers using this trick are changed consistently).
> > 
> > Regarding DMA-coherent allocation. I am not happy with losing a whole
> > page of the dma-coherent memory, but we can live with that. What gives
> > additional difficulty to our eDMA patches is the DMA-mask override.
> > If you still insist on preserving the @William patchset as it is,
> > where do you suggest I update the DMA-mask if the low-level
> > driver won't be suitable for that anymore?
> 

> I'm not insisting anything, it's just that this patch is already reviewed
> and queued, is a small step towards being less wrong overall, and dropping
> it wouldn't actually solve any of your problems anyway, so I just feel that
> being obstructive because it falls short of perfection isn't helpful.

I am starting to get confused because the problem has turned out to be
multi-dimensional. We started discussing the DMA-mask override, while
now we are talking about wrong device usage and the dma-ranges bug.
Let me sum up so we can come to some conclusion on each problem.

1. This patch is already merged in by @Lorenzo.
The streaming and coherent DMA-masks are still forced to 32 bits
on the DW PCIe platform device. The low-level DW PCIe platform drivers
can't be used to set a correct mask now since it will be overwritten
by the MSI-setup procedure.

* Temporary solution: I'll override the DMA-mask in the eDMA driver:
try a 64-bit mask; if that fails, fall back to a 32-bit mask. In fact
I'll set the DMA-mask on the DMAengine backing device since it will be
used for the correct DMA mappings anyway (see problem 3 and the first
sketch below).

2. The DMA-mask is set on a wrong device in the framework of the
MSI-setup procedure.
Indeed it is supposed to be set on the PCI bridge device. But we can't
do that due to the pci_host_probe() method implementation: the device
is not ready for the DMA-mask update and the memory allocation/mapping
at the stage when the MSI-setup procedure is executed, is it? (Note
that an empty instance is available at that stage.)

3. PCIe-bus-related DMA-ranges are applied to the PCIe host interface.
This is a bug. It can't be easily fixed due to the PCIe-specific
semantics of the "dma-ranges" OF property. @Robin promised to think of
a proper solution for this.

* Temporary solution: I'll set the dma_chan.dev->chan_dma_dev flag to
true, just initialize the DMA-parameters of the DMA-channel backing
device instance by using the {of,acpi}_dma_configure() methods and drop the
dma_range_map assigned to it. In addition I'll set the DMA-mask of the
DMA-channel backing device too, to solve problem 1 (see the second
sketch below).
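
For problem 1, a rough and untested sketch of the mask fallback, where
'dma_dev' stands for whichever device ends up backing the DMAengine
channels (see problem 3):

	err = dma_set_mask_and_coherent(dma_dev, DMA_BIT_MASK(64));
	if (err)
		err = dma_set_mask_and_coherent(dma_dev, DMA_BIT_MASK(32));
	if (err)
		return err;

For problem 3, something along these lines per channel (also untested;
'dchan' is the struct dma_chan in question, 'dev' is the DW PCIe
platform device, error handling omitted):

	/* Make consumers map DMA against the channel's backing device */
	dchan->dev->chan_dma_dev = true;

	/* Initialize its DMA parameters from the controller's DT node */
	ret = of_dma_configure(&dchan->dev->device, dev->of_node, true);

	/* Drop the erroneously inherited PCIe "dma-ranges" mapping */
	dchan->dev->device.dma_range_map = NULL;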

Sounds like a doable plan. What do you think?

-Sergey

> 
> Robin.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-10-07 22:45 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-25 18:50 [PATCH v5 0/2] PCI: dwc: Add support for 64-bit MSI target addresses Will McVicker
2022-08-25 18:50 ` [PATCH v5 1/2] PCI: dwc: Drop dependency on ZONE_DMA32 Will McVicker
2022-09-28 11:41   ` Serge Semin
2022-09-29 18:25     ` Robin Murphy
2022-09-29 19:32       ` Serge Semin
2022-09-30 11:01         ` Robin Murphy
2022-09-30 12:57           ` Serge Semin
2022-09-30 15:39             ` Robin Murphy
2022-09-30 17:02               ` William McVicker
2022-10-03  8:04                 ` Lorenzo Pieralisi
2022-10-03 16:40                   ` William McVicker
2022-10-07 22:45               ` Serge Semin
2022-08-25 18:50 ` [PATCH v5 2/2] PCI: dwc: Add support for 64-bit MSI target address Will McVicker
2022-08-25 20:59   ` Robin Murphy
2022-08-25 21:22     ` William McVicker
2022-09-09 13:29   ` Christoph Hellwig
2022-09-09 13:47     ` Robin Murphy
2022-09-09 14:47       ` Christoph Hellwig
2022-09-09 15:00         ` Robin Murphy
2022-09-28 12:05   ` Serge Semin
2022-09-28 17:52     ` William McVicker
2022-09-29  8:13       ` Lorenzo Pieralisi
2022-09-29 18:50         ` William McVicker
2022-09-29 19:00           ` Serge Semin
2022-09-30 13:46     ` Lorenzo Pieralisi
2022-09-30 14:14       ` Serge Semin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).