* [PATCH] PCI: tegra: limit MSI target address to 32-bit
@ 2017-11-06 18:03 ` Vidya Sagar
  0 siblings, 0 replies; 30+ messages in thread
From: Vidya Sagar @ 2017-11-06 18:03 UTC (permalink / raw)
  To: treding, bhelgaas
  Cc: linux-tegra, linux-pci, kthota, swarren, mmaddireddy, vidyas

Limit the MSI target address to the 32-bit region so that PCIe
endpoints that support only 32-bit MSI addresses work properly.
One example is the Marvell SATA controller.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
 drivers/pci/host/pci-tegra.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
index 1987fec1f126..03d3dcdd06c2 100644
--- a/drivers/pci/host/pci-tegra.c
+++ b/drivers/pci/host/pci-tegra.c
@@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
 	}
 
 	/* setup AFI/FPCI range */
-	msi->pages = __get_free_pages(GFP_KERNEL, 0);
+	msi->pages = __get_free_pages(GFP_DMA, 0);
 	msi->phys = virt_to_phys((void *)msi->pages);
 
 	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-06 18:03 ` Vidya Sagar
@ 2017-11-08 21:25     ` Bjorn Helgaas
  -1 siblings, 0 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2017-11-08 21:25 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: treding, bhelgaas, linux-tegra, linux-pci, kthota, swarren,
	mmaddireddy, Michal Simek, Sören Brinkmann, Simon Horman

[+cc Michal, Sören, Simon]

On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> Limit the MSI target address to the 32-bit region so that PCIe
> endpoints that support only 32-bit MSI addresses work properly.
> One example is the Marvell SATA controller.
> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  drivers/pci/host/pci-tegra.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> index 1987fec1f126..03d3dcdd06c2 100644
> --- a/drivers/pci/host/pci-tegra.c
> +++ b/drivers/pci/host/pci-tegra.c
> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>  	}
>  
>  	/* setup AFI/FPCI range */
> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>  	msi->phys = virt_to_phys((void *)msi->pages);

Should this be GFP_DMA32?  See the comment above the GFP_DMA
definition.

Should we be using virt_to_phys() here?  Where exactly is the result
("msi->phys") used, i.e., what bus will that address appear on?  If it
appears on the PCI side, this should probably use something like
pcibios_resource_to_bus().

Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a similar
problem?  They both use GFP_KERNEL, then virt_to_phys(), then write the
result of virt_to_phys() using a 32-bit register write.

>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> -- 
> 2.7.4
> 


* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-08 21:25     ` Bjorn Helgaas
@ 2017-11-09  7:18         ` Vidya Sagar
  -1 siblings, 0 replies; 30+ messages in thread
From: Vidya Sagar @ 2017-11-09  7:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: treding, bhelgaas, linux-tegra, linux-pci, kthota, swarren,
	mmaddireddy, Michal Simek, Sören Brinkmann, Simon Horman



On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> [+cc Michal, Sören, Simon]
>
> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>> Limit the MSI target address to the 32-bit region so that PCIe
>> endpoints that support only 32-bit MSI addresses work properly.
>> One example is the Marvell SATA controller.
>>
>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>> ---
>>   drivers/pci/host/pci-tegra.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>> index 1987fec1f126..03d3dcdd06c2 100644
>> --- a/drivers/pci/host/pci-tegra.c
>> +++ b/drivers/pci/host/pci-tegra.c
>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>   	}
>>   
>>   	/* setup AFI/FPCI range */
>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>   	msi->phys = virt_to_phys((void *)msi->pages);
> Should this be GFP_DMA32?  See the comment above the GFP_DMA
> definition.
Looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
was the correct one to use, but even with that I got >32-bit addresses.
GFP_DMA always gives addresses below the 4GB boundary (i.e. 32-bit).
I haven't dug into why that is the case.
> Should we be using virt_to_phys() here?  Where exactly is the result
> ("msi->phys") used, i.e., what bus will that address appear on?  If it
> appears on the PCI side, this should probably use something like
> pcibios_resource_to_bus().
This address is written to two places.
First, into the host's internal register, to tell it that when an
incoming memory write arrives at this address it should raise an MSI
interrupt instead of forwarding the write to the memory subsystem.
Second, into the 'Message Address' field of the endpoint's 'Message
Address Register for MSI' in its configuration space (this is done by
the MSI framework), so the endpoint knows which address to use to
generate an MSI interrupt.
> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a similar
> problem?  They both use GFP_KERNEL, then virt_to_phys(), then write the
> result of virt_to_phys() using a 32-bit register write.
Well, if those systems deal with 64-bit addresses, this problem will
surface when an endpoint that supports only 32-bit MSI addresses is
connected and __get_free_pages() returns an address that translates
to a >32-bit address after the virt_to_phys() call.
>
>>   	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
>> -- 
>> 2.7.4
>>


* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-09  7:18         ` Vidya Sagar
@ 2017-11-09 18:14             ` Bjorn Helgaas
  -1 siblings, 0 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2017-11-09 18:14 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: treding, bhelgaas, linux-tegra, linux-pci, kthota, swarren,
	mmaddireddy, Michal Simek, Sören Brinkmann, Simon Horman,
	Lorenzo Pieralisi

[+cc Lorenzo]

On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> >On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> >>Limit the MSI target address to the 32-bit region so that PCIe
> >>endpoints that support only 32-bit MSI addresses work properly.
> >>One example is the Marvell SATA controller.
> >>
> >>Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> >>---
> >>  drivers/pci/host/pci-tegra.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >>diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> >>index 1987fec1f126..03d3dcdd06c2 100644
> >>--- a/drivers/pci/host/pci-tegra.c
> >>+++ b/drivers/pci/host/pci-tegra.c
> >>@@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> >>  	}
> >>  	/* setup AFI/FPCI range */
> >>-	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> >>+	msi->pages = __get_free_pages(GFP_DMA, 0);
> >>  	msi->phys = virt_to_phys((void *)msi->pages);
>
> >Should this be GFP_DMA32?  See the comment above the GFP_DMA
> >definition.
>
> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> is the correct one to use, but, even with that I got >32-bit addresses.
> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> I didn't dig into it to find out why is this the case.

This sounds worth looking into (but maybe we don't need the
__get_free_pages() at all; see below).  Maybe there's some underlying
bug.  My laptop shows this, which looks like it might be related:

  Zone ranges:
    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
    Normal   [mem 0x0000000100000000-0x00000004217fffff]
    Device   empty

What does your machine show?

> >Should we be using virt_to_phys() here?  Where exactly is the
> >result ("msi->phys") used, i.e., what bus will that address appear
> >on?  If it appears on the PCI side, this should probably use
> >something like pcibios_resource_to_bus().
>
> This address is written to two places.  First, into host's internal
> register to let it know that when an incoming memory write comes
> with this address, raise an MSI interrupt instead of forwarding it
> to memory subsystem.  Second, into 'Message Address' field of
> 'Message Address Register for MSI' register in end point's
> configuration space (this is done by MSI framework) for end point to
> know which address to be used to generate MSI interrupt.

Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
And this commit [3] sounds like it describes a similar hardware
situation with Tegra where the host bridge intercepts the MSI target
address, so writes to it never reach system memory.  That means that
Tegra doesn't need to allocate system memory at all.

Is your system similar?  Can you just statically allocate a little bus
address space, use that for the MSI target address, and skip the
__get_free_pages()?

> >Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> >similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> >then write the result of virt_to_phys() using a 32-bit register
> >write.
>
> Well, if those systems deal with 64-bit addresses and when an end
> point is connected which supports only 32-bit MSI addresses, this
> problem will surface when __get_free_pages() returns an address that
> translates to a >32-bit address after virt_to_phys() call on it.

I'd like to hear from the R-Car and Xilinx folks about (1) whether
there's a potential issue with truncating a 64-bit address, and
(2) whether that hardware works like Tegra, where the MSI write never
reaches memory so we don't actually need to allocate a page.

If all we need is to allocate a little bus address space for the MSI
target, I'd like to figure out a better way to do that than
__get_free_pages().  The current code seems a little buggy, and
it's getting propagated through several drivers.

> >>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);

[1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
[2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
[3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9


* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-09 18:14             ` Bjorn Helgaas
@ 2017-11-09 18:41               ` Vidya Sagar
  -1 siblings, 0 replies; 30+ messages in thread
From: Vidya Sagar @ 2017-11-09 18:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: treding, bhelgaas, linux-tegra, linux-pci, kthota, swarren,
	mmaddireddy, Michal Simek, Sören Brinkmann, Simon Horman,
	Lorenzo Pieralisi



On Thursday 09 November 2017 11:44 PM, Bjorn Helgaas wrote:
> [+cc Lorenzo]
>
> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>>> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>>>> Limit the MSI target address to the 32-bit region so that PCIe
>>>> endpoints that support only 32-bit MSI addresses work properly.
>>>> One example is the Marvell SATA controller.
>>>>
>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>> ---
>>>>   drivers/pci/host/pci-tegra.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>>>> index 1987fec1f126..03d3dcdd06c2 100644
>>>> --- a/drivers/pci/host/pci-tegra.c
>>>> +++ b/drivers/pci/host/pci-tegra.c
>>>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>>>   	}
>>>>   	/* setup AFI/FPCI range */
>>>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>>>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>>>   	msi->phys = virt_to_phys((void *)msi->pages);
>>> Should this be GFP_DMA32?  See the comment above the GFP_DMA
>>> definition.
>> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>> is the correct one to use, but, even with that I got >32-bit addresses.
>> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>> I didn't dig into it to find out why is this the case.
> This sounds worth looking into (but maybe we don't need the
> __get_free_pages() at all; see below).  Maybe there's some underlying
> bug.  My laptop shows this, which looks like it might be related:
>
>    Zone ranges:
>      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>      Normal   [mem 0x0000000100000000-0x00000004217fffff]
>      Device   empty
>
> What does your machine show?
I see the following on my Linux box:
  Zone ranges:
    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
    Normal   [mem 0x0000000100000000-0x000000106effffff]
    Device   empty

and the following on my T210 Tegra platform:
  Zone ranges:
    DMA      [mem 0x0000000080000000-0x00000000ffffffff]
    Normal   [mem 0x0000000100000000-0x000000017fffffff]

>
>>> Should we be using virt_to_phys() here?  Where exactly is the
>>> result ("msi->phys") used, i.e., what bus will that address appear
>>> on?  If it appears on the PCI side, this should probably use
>>> something like pcibios_resource_to_bus().
>> This address is written to two places.  First, into host's internal
>> register to let it know that when an incoming memory write comes
>> with this address, raise an MSI interrupt instead of forwarding it
>> to memory subsystem.  Second, into 'Message Address' field of
>> 'Message Address Register for MSI' register in end point's
>> configuration space (this is done by MSI framework) for end point to
>> know which address to be used to generate MSI interrupt.
> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> And this commit [3] sounds like it describes a similar hardware
> situation with Tegra where the host bridge intercepts the MSI target
> address, so writes to it never reach system memory.  That means that
> Tegra doesn't need to allocate system memory at all.
>
> Is your system similar?  Can you just statically allocate a little bus
> address space, use that for the MSI target address, and skip the
> __get_free_pages()?
It is the same system, where MSI memory writes don't really make it to
system memory. But the addresses mentioned in that patch don't work;
we are working with the hardware folks to understand the issue better.
Meanwhile, using GFP_DMA gives consistent results, and we thought it
could serve as a stopgap solution until we figure out a better bus
address to use as the MSI target address.
>>> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
>>> similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
>>> then write the result of virt_to_phys() using a 32-bit register
>>> write.
>> Well, if those systems deal with 64-bit addresses and when an end
>> point is connected which supports only 32-bit MSI addresses, this
>> problem will surface when __get_free_pages() returns an address that
>> translates to a >32-bit address after virt_to_phys() call on it.
> I'd like to hear from the R-Car and Xilinx folks about (1) whether
> there's a potential issue with truncating a 64-bit address, and
> (2) whether that hardware works like Tegra, where the MSI write never
> reaches memory so we don't actually need to allocate a page.
>
> If all we need is to allocate a little bus address space for the MSI
> target, I'd like to figure out a better way to do that than
> __get_free_pages().  The current code seems a little buggy, and
> it's getting propagated through several drivers.
>
>>>>   	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> [1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
> [2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9


* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
@ 2017-11-09 18:41               ` Vidya Sagar
  0 siblings, 0 replies; 30+ messages in thread
From: Vidya Sagar @ 2017-11-09 18:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: treding, bhelgaas, linux-tegra, linux-pci, kthota, swarren,
	mmaddireddy, Michal Simek, Sören Brinkmann, Simon Horman,
	Lorenzo Pieralisi



On Thursday 09 November 2017 11:44 PM, Bjorn Helgaas wrote:
> [+cc Lorenzo]
>
> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>>> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>>>> limits MSI target address to only 32-bit region to enable
>>>> some of the PCIe end points where only 32-bit MSIs
>>>> are supported work properly.
>>>> One example being Marvel SATA controller
>>>>
>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>> ---
>>>>   drivers/pci/host/pci-tegra.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra=
.c
>>>> index 1987fec1f126..03d3dcdd06c2 100644
>>>> --- a/drivers/pci/host/pci-tegra.c
>>>> +++ b/drivers/pci/host/pci-tegra.c
>>>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pc=
ie *pcie)
>>>>   	}
>>>>   	/* setup AFI/FPCI range */
>>>> -	msi->pages =3D __get_free_pages(GFP_KERNEL, 0);
>>>> +	msi->pages =3D __get_free_pages(GFP_DMA, 0);
>>>>   	msi->phys =3D virt_to_phys((void *)msi->pages);
>>> Should this be GFP_DMA32?  See the comment above the GFP_DMA
>>> definition.
>> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DM=
A32
>> is the correct one to use, but, even with that I got >32-bit addresses.
>> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>> I didn't dig into it to find out why is this the case.
> This sounds worth looking into (but maybe we don't need the
> __get_free_pages() at all; see below).  Maybe there's some underlying
> bug.  My laptop shows this, which looks like it might be related:
>
>    Zone ranges:
>      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>      Normal   [mem 0x0000000100000000-0x00000004217fffff]
>      Device   empty
>
> What does your machine show?
I see following in my linux box
 =C2=A0Zone ranges:
 =C2=A0=C2=A0 DMA=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 [mem 0x0000000000001000-0x0=
000000000ffffff]
 =C2=A0=C2=A0 DMA32=C2=A0=C2=A0=C2=A0 [mem 0x0000000001000000-0x00000000fff=
fffff]
 =C2=A0=C2=A0 Normal=C2=A0=C2=A0 [mem 0x0000000100000000-0x000000106effffff=
]
 =C2=A0=C2=A0 Device=C2=A0=C2=A0 empty

and following in my T210 Tegra platform
Zone ranges:
 =C2=A0 DMA=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 [mem 0x0000000080000000-0x0000000=
0ffffffff]
 =C2=A0 Normal=C2=A0=C2=A0 [mem 0x0000000100000000-0x000000017fffffff]

>
>>> Should we be using virt_to_phys() here?  Where exactly is the
>>> result ("msi->phys") used, i.e., what bus will that address appear
>>> on?  If it appears on the PCI side, this should probably use
>>> something like pcibios_resource_to_bus().
>> This address is written to two places.  First, into host's internal
>> register to let it know that when an incoming memory write comes
>> with this address, raise an MSI interrupt instead of forwarding it
>> to memory subsystem.  Second, into 'Message Address' field of
>> 'Message Address Register for MSI' register in end point's
>> configuration space (this is done by MSI framework) for end point to
>> know which address to be used to generate MSI interrupt.
> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> And this commit [3] sounds like it describes a similar hardware
> situation with Tegra where the host bridge intercepts the MSI target
> address, so writes to it never reach system memory.  That means that
> Tegra doesn't need to allocate system memory at all.
>
> Is your system similar?  Can you just statically allocate a little bus
> address space, use that for the MSI target address, and skip the
> __get_free_pages()?
It is the same system, where MSI memory writes don't really make it to
system memory. But, the addresses mentioned in that patch don't work.
We are working with hardware folks on understanding the issue better.
Meanwhile, using GFP_DMA is giving consistent results, and we thought this
could be used as a stop-gap solution until we figure out a better bus
address to use as the MSI target address.
>>> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
>>> similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
>>> then write the result of virt_to_phys() using a 32-bit register
>>> write.
>> Well, if those systems deal with 64-bit addresses and when an end
>> point is connected which supports only 32-bit MSI addresses, this
>> problem will surface when __get_free_pages() returns an address that
>> translates to a >32-bit address after virt_to_phys() call on it.
> I'd like to hear from the R-Car and Xilinx folks about (1) whether
> there's a potential issue with truncating a 64-bit address, and
> (2) whether that hardware works like Tegra, where the MSI write never
> reaches memory so we don't actually need to allocate a page.
>
> If all we need is to allocate a little bus address space for the MSI
> target, I'd like to figure out a better way to do that than
> __get_free_pages().  The current code seems a little buggy, and
> it's getting propagated through several drivers.
>
>>>>   	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> [1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
> [2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-09 18:14             ` Bjorn Helgaas
@ 2017-11-10  0:47                 ` subrahmanya_lingappa
  -1 siblings, 0 replies; 30+ messages in thread
From: subrahmanya_lingappa @ 2017-11-10  0:47 UTC (permalink / raw)
  To: Bjorn Helgaas, Vidya Sagar
  Cc: treding-DDmLM1+adcrQT0dZR+AlfA, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, kthota-DDmLM1+adcrQT0dZR+AlfA,
	swarren-DDmLM1+adcrQT0dZR+AlfA,
	mmaddireddy-DDmLM1+adcrQT0dZR+AlfA, Michal Simek,
	Sören Brinkmann, Simon Horman, Lorenzo Pieralisi

Bjorn,

On 11/9/2017 11:44 PM, Bjorn Helgaas wrote:
> [+cc Lorenzo]
>
> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>>> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>>>> limits MSI target address to only 32-bit region to enable
>>>> some of the PCIe end points where only 32-bit MSIs
>>>> are supported work properly.
>>>> One example being Marvel SATA controller
>>>>
>>>> Signed-off-by: Vidya Sagar <vidyas-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>>> ---
>>>>  drivers/pci/host/pci-tegra.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>>>> index 1987fec1f126..03d3dcdd06c2 100644
>>>> --- a/drivers/pci/host/pci-tegra.c
>>>> +++ b/drivers/pci/host/pci-tegra.c
>>>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>>>  	}
>>>>  	/* setup AFI/FPCI range */
>>>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>>>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>>>  	msi->phys = virt_to_phys((void *)msi->pages);
>>
>>> Should this be GFP_DMA32?  See the comment above the GFP_DMA
>>> definition.
>>
>> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>> is the correct one to use, but, even with that I got >32-bit addresses.
>> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>> I didn't dig into it to find out why is this the case.
>
> This sounds worth looking into (but maybe we don't need the
> __get_free_pages() at all; see below).  Maybe there's some underlying
> bug.  My laptop shows this, which looks like it might be related:
>
>   Zone ranges:
>     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>     Normal   [mem 0x0000000100000000-0x00000004217fffff]
>     Device   empty
>
> What does your machine show?
>
>>> Should we be using virt_to_phys() here?  Where exactly is the
>>> result ("msi->phys") used, i.e., what bus will that address appear
>>> on?  If it appears on the PCI side, this should probably use
>>> something like pcibios_resource_to_bus().
>>
>> This address is written to two places.  First, into host's internal
>> register to let it know that when an incoming memory write comes
>> with this address, raise an MSI interrupt instead of forwarding it
>> to memory subsystem.  Second, into 'Message Address' field of
>> 'Message Address Register for MSI' register in end point's
>> configuration space (this is done by MSI framework) for end point to
>> know which address to be used to generate MSI interrupt.
>
> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> And this commit [3] sounds like it describes a similar hardware
> situation with Tegra where the host bridge intercepts the MSI target
> address, so writes to it never reach system memory.  That means that
> Tegra doesn't need to allocate system memory at all.
>
> Is your system similar?  Can you just statically allocate a little bus
> address space, use that for the MSI target address, and skip the
> __get_free_pages()?
>
>>> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
>>> similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
>>> then write the result of virt_to_phys() using a 32-bit register
>>> write.
>>
>> Well, if those systems deal with 64-bit addresses and when an end
>> point is connected which supports only 32-bit MSI addresses, this
>> problem will surface when __get_free_pages() returns an address that
>> translates to a >32-bit address after virt_to_phys() call on it.
>
> I'd like to hear from the R-Car and Xilinx folks about (1) whether
> there's a potential issue with truncating a 64-bit address, and
> (2) whether that hardware works like Tegra, where the MSI write never
> reaches memory so we don't actually need to allocate a page.
>

It is so, at least for the drivers/pci/host/pcie-xilinx.c driver: MSI data
is never read from the allocated msi_pages physical memory; it is always
intercepted and fed from the HW FIFO register XILINX_PCIE_REG_RPIFR2.
Here too, __get_free_pages was used to allocate msi_pages just to get a
real memory address to program into the MSI address field of the PCIe
endpoint's config register.

MSI handling code from pcie-xilinx.c

	if (status & (XILINX_PCIE_INTR_INTX | XILINX_PCIE_INTR_MSI)) {
		.
		.

		/* Decode the IRQ number */
		if (val & XILINX_PCIE_RPIFR1_MSI_INTR) {
			val = pcie_read(port, XILINX_PCIE_REG_RPIFR2) &
				XILINX_PCIE_RPIFR2_MSG_DATA;
		} else {
			val = (val & XILINX_PCIE_RPIFR1_INTR_MASK) >>
				XILINX_PCIE_RPIFR1_INTR_SHIFT;
			val = irq_find_mapping(port->leg_domain, val);
		}
		
		.
		.
		
		/* Handle the interrupt */
		if (IS_ENABLED(CONFIG_PCI_MSI) ||
		    !(val & XILINX_PCIE_RPIFR1_MSI_INTR))
			generic_handle_irq(val);
	}

> If all we need is to allocate a little bus address space for the MSI
> target, I'd like to figure out a better way to do that than
> __get_free_pages().  The current code seems a little buggy, and
> it's getting propagated through several drivers.
>
>>>>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
>
> [1] https://lkml.kernel.org/r/20170824134451.GA31858-1RhO1Y9PlrlHTL0Zs8A6p5iNqAH0jzoTYJqu5kTmcBRl57MIdRCFDg@public.gmane.org
> [2] https://lkml.kernel.org/r/86efs3wesi.fsf-5wv7dgnIgG8@public.gmane.org
> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-09 18:41               ` Vidya Sagar
@ 2017-11-10  9:37                   ` Thierry Reding
  -1 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2017-11-10  9:37 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: Bjorn Helgaas, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, kthota-DDmLM1+adcrQT0dZR+AlfA,
	swarren-DDmLM1+adcrQT0dZR+AlfA,
	mmaddireddy-DDmLM1+adcrQT0dZR+AlfA, Michal Simek,
	Sören Brinkmann, Simon Horman, Lorenzo Pieralisi,
	Arnd Bergmann

[-- Attachment #1: Type: text/plain, Size: 3114 bytes --]

On Fri, Nov 10, 2017 at 12:11:05AM +0530, Vidya Sagar wrote:
> 
> 
> On Thursday 09 November 2017 11:44 PM, Bjorn Helgaas wrote:
> > [+cc Lorenzo]
> > 
> > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > > > > limits MSI target address to only 32-bit region to enable
> > > > > some of the PCIe end points where only 32-bit MSIs
> > > > > are supported work properly.
> > > > > One example being Marvel SATA controller
> > > > > 
> > > > > Signed-off-by: Vidya Sagar <vidyas-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > > > ---
> > > > >   drivers/pci/host/pci-tegra.c | 2 +-
> > > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > > > > index 1987fec1f126..03d3dcdd06c2 100644
> > > > > --- a/drivers/pci/host/pci-tegra.c
> > > > > +++ b/drivers/pci/host/pci-tegra.c
> > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > > > >   	}
> > > > >   	/* setup AFI/FPCI range */
> > > > > -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > > > > +	msi->pages = __get_free_pages(GFP_DMA, 0);
> > > > >   	msi->phys = virt_to_phys((void *)msi->pages);
> > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > > > definition.
> > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > > is the correct one to use, but, even with that I got >32-bit addresses.
> > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > > I didn't dig into it to find out why is this the case.
> > This sounds worth looking into (but maybe we don't need the
> > __get_free_pages() at all; see below).  Maybe there's some underlying
> > bug.  My laptop shows this, which looks like it might be related:
> > 
> >    Zone ranges:
> >      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> >      Normal   [mem 0x0000000100000000-0x00000004217fffff]
> >      Device   empty
> > 
> > What does your machine show?
> I see following in my linux box
>  Zone ranges:
>    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>    Normal   [mem 0x0000000100000000-0x000000106effffff]
>    Device   empty
> 
> and following in my T210 Tegra platform
> Zone ranges:
>   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
>   Normal   [mem 0x0000000100000000-0x000000017fffffff]

This seems to be happening because 64-bit ARM doesn't have the
ZONE_DMA32 Kconfig option, which seems to cause the DMA32 zone
to default to the normal zone (see include/linux/gfp.h).

That's very confusing in conjunction with the kerneldoc comment
for GFP_DMA32 because it isn't actually guaranteed to give you
32-bit addresses for !ZONE_DMA32.

Cc'ing Arnd who knows about these things.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-10  0:47                 ` subrahmanya_lingappa
@ 2017-11-10  9:44                     ` Thierry Reding
  -1 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2017-11-10  9:44 UTC (permalink / raw)
  To: subrahmanya_lingappa
  Cc: Bjorn Helgaas, Vidya Sagar, bhelgaas-hpIqsD4AKlfQT0dZR+AlfA,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA, kthota-DDmLM1+adcrQT0dZR+AlfA,
	swarren-DDmLM1+adcrQT0dZR+AlfA,
	mmaddireddy-DDmLM1+adcrQT0dZR+AlfA, Michal Simek,
	Sören Brinkmann, Simon Horman, Lorenzo Pieralisi

[-- Attachment #1: Type: text/plain, Size: 5873 bytes --]

On Fri, Nov 10, 2017 at 06:17:16AM +0530, subrahmanya_lingappa wrote:
> Bjorn,
> 
> On 11/9/2017 11:44 PM, Bjorn Helgaas wrote:
> > [+cc Lorenzo]
> > 
> > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > > > > limits MSI target address to only 32-bit region to enable
> > > > > some of the PCIe end points where only 32-bit MSIs
> > > > > are supported work properly.
> > > > > One example being Marvel SATA controller
> > > > > 
> > > > > Signed-off-by: Vidya Sagar <vidyas-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> > > > > ---
> > > > >  drivers/pci/host/pci-tegra.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > > > > index 1987fec1f126..03d3dcdd06c2 100644
> > > > > --- a/drivers/pci/host/pci-tegra.c
> > > > > +++ b/drivers/pci/host/pci-tegra.c
> > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > > > >  	}
> > > > >  	/* setup AFI/FPCI range */
> > > > > -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > > > > +	msi->pages = __get_free_pages(GFP_DMA, 0);
> > > > >  	msi->phys = virt_to_phys((void *)msi->pages);
> > > 
> > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > > > definition.
> > > 
> > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > > is the correct one to use, but, even with that I got >32-bit addresses.
> > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > > I didn't dig into it to find out why is this the case.
> > 
> > This sounds worth looking into (but maybe we don't need the
> > __get_free_pages() at all; see below).  Maybe there's some underlying
> > bug.  My laptop shows this, which looks like it might be related:
> > 
> >   Zone ranges:
> >     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> >     Normal   [mem 0x0000000100000000-0x00000004217fffff]
> >     Device   empty
> > 
> > What does your machine show?
> > 
> > > > Should we be using virt_to_phys() here?  Where exactly is the
> > > > result ("msi->phys") used, i.e., what bus will that address appear
> > > > on?  If it appears on the PCI side, this should probably use
> > > > something like pcibios_resource_to_bus().
> > > 
> > > This address is written to two places.  First, into host's internal
> > > register to let it know that when an incoming memory write comes
> > > with this address, raise an MSI interrupt instead of forwarding it
> > > to memory subsystem.  Second, into 'Message Address' field of
> > > 'Message Address Register for MSI' register in end point's
> > > configuration space (this is done by MSI framework) for end point to
> > > know which address to be used to generate MSI interrupt.
> > 
> > Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> > And this commit [3] sounds like it describes a similar hardware
> > situation with Tegra where the host bridge intercepts the MSI target
> > address, so writes to it never reach system memory.  That means that
> > Tegra doesn't need to allocate system memory at all.
> > 
> > Is your system similar?  Can you just statically allocate a little bus
> > address space, use that for the MSI target address, and skip the
> > __get_free_pages()?
> > 
> > > > Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> > > > similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> > > > then write the result of virt_to_phys() using a 32-bit register
> > > > write.
> > > 
> > > Well, if those systems deal with 64-bit addresses and when an end
> > > point is connected which supports only 32-bit MSI addresses, this
> > > problem will surface when __get_free_pages() returns an address that
> > > translates to a >32-bit address after virt_to_phys() call on it.
> > 
> > I'd like to hear from the R-Car and Xilinx folks about (1) whether
> > there's a potential issue with truncating a 64-bit address, and
> > (2) whether that hardware works like Tegra, where the MSI write never
> > reaches memory so we don't actually need to allocate a page.
> > 
> 
> it is so, at least drivers/pci/host/pcie-xilinx.c driver, MSI data is never
> read from the msi_pages physical
> memory allocated, data is always intercepted, caught and fed from one of the
> HW FIFO register XILINX_PCIE_REG_RPIFR2. Here too __get_free_pages was used
> to allocate msi_pages just to get a real memory address to program in the
> MSI address in PCIe endpoints config register.
> 
> MSI handling code from pcie-xilinx.c
> 
> 	if (status & (XILINX_PCIE_INTR_INTX | XILINX_PCIE_INTR_MSI)) {
> 		.
> 		.
> 
> 		/* Decode the IRQ number */
> 		if (val & XILINX_PCIE_RPIFR1_MSI_INTR) {
> 			val = pcie_read(port, XILINX_PCIE_REG_RPIFR2) &
> 				XILINX_PCIE_RPIFR2_MSG_DATA;
> 		} else {
> 			val = (val & XILINX_PCIE_RPIFR1_INTR_MASK) >>
> 				XILINX_PCIE_RPIFR1_INTR_SHIFT;
> 			val = irq_find_mapping(port->leg_domain, val);
> 		}
> 		
> 		.
> 		.
> 		
> 		/* Handle the interrupt */
> 		if (IS_ENABLED(CONFIG_PCI_MSI) ||
> 		    !(val & XILINX_PCIE_RPIFR1_MSI_INTR))
> 			generic_handle_irq(val);
> 	}

A side-note: this looks like a bug to me. In case of an MSI, the val
variable will have been overwritten by the MSI decode branch above, so
it seems to me like the conditional down here isn't guaranteed to work
and depends on what MSI was actually received?

Don't you need to store the interrupt status in a variable separate from
the value that you're using to find which interrupt was received?

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
@ 2017-11-10  9:44                     ` Thierry Reding
  0 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2017-11-10  9:44 UTC (permalink / raw)
  To: subrahmanya_lingappa
  Cc: Bjorn Helgaas, Vidya Sagar, bhelgaas, linux-tegra, linux-pci,
	kthota, swarren, mmaddireddy, Michal Simek, Sören Brinkmann,
	Simon Horman, Lorenzo Pieralisi

[-- Attachment #1: Type: text/plain, Size: 5844 bytes --]

On Fri, Nov 10, 2017 at 06:17:16AM +0530, subrahmanya_lingappa wrote:
> Bjorn,
> 
> On 11/9/2017 11:44 PM, Bjorn Helgaas wrote:
> > [+cc Lorenzo]
> > 
> > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > > > > limits MSI target address to only 32-bit region to enable
> > > > > some of the PCIe end points where only 32-bit MSIs
> > > > > are supported work properly.
> > > > > One example being Marvel SATA controller
> > > > > 
> > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > > ---
> > > > >  drivers/pci/host/pci-tegra.c | 2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > > > > index 1987fec1f126..03d3dcdd06c2 100644
> > > > > --- a/drivers/pci/host/pci-tegra.c
> > > > > +++ b/drivers/pci/host/pci-tegra.c
> > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > > > >  	}
> > > > >  	/* setup AFI/FPCI range */
> > > > > -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > > > > +	msi->pages = __get_free_pages(GFP_DMA, 0);
> > > > >  	msi->phys = virt_to_phys((void *)msi->pages);
> > > 
> > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > > > definition.
> > > 
> > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > > is the correct one to use, but, even with that I got >32-bit addresses.
> > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > > I didn't dig into it to find out why is this the case.
> > 
> > This sounds worth looking into (but maybe we don't need the
> > __get_free_pages() at all; see below).  Maybe there's some underlying
> > bug.  My laptop shows this, which looks like it might be related:
> > 
> >   Zone ranges:
> >     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> >     Normal   [mem 0x0000000100000000-0x00000004217fffff]
> >     Device   empty
> > 
> > What does your machine show?
> > 
> > > > Should we be using virt_to_phys() here?  Where exactly is the
> > > > result ("msi->phys") used, i.e., what bus will that address appear
> > > > on?  If it appears on the PCI side, this should probably use
> > > > something like pcibios_resource_to_bus().
> > > 
> > > This address is written to two places.  First, into host's internal
> > > register to let it know that when an incoming memory write comes
> > > with this address, raise an MSI interrupt instead of forwarding it
> > > to memory subsystem.  Second, into 'Message Address' field of
> > > 'Message Address Register for MSI' register in end point's
> > > configuration space (this is done by MSI framework) for end point to
> > > know which address to be used to generate MSI interrupt.
> > 
> > Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> > And this commit [3] sounds like it describes a similar hardware
> > situation with Tegra where the host bridge intercepts the MSI target
> > address, so writes to it never reach system memory.  That means that
> > Tegra doesn't need to allocate system memory at all.
> > 
> > Is your system similar?  Can you just statically allocate a little bus
> > address space, use that for the MSI target address, and skip the
> > __get_free_pages()?
> > 
> > > > Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> > > > similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> > > > then write the result of virt_to_phys() using a 32-bit register
> > > > write.
> > > 
> > > Well, if those systems deal with 64-bit addresses and when an end
> > > point is connected which supports only 32-bit MSI addresses, this
> > > problem will surface when __get_free_pages() returns an address that
> > > translates to a >32-bit address after virt_to_phys() call on it.
> > 
> > I'd like to hear from the R-Car and Xilinx folks about (1) whether
> > there's a potential issue with truncating a 64-bit address, and
> > (2) whether that hardware works like Tegra, where the MSI write never
> > reaches memory so we don't actually need to allocate a page.
> > 
> 
> That is the case for the drivers/pci/host/pcie-xilinx.c driver at least:
> the MSI data is never read from the allocated msi_pages physical memory.
> It is always intercepted and fed from the hardware FIFO register
> XILINX_PCIE_REG_RPIFR2. Here too, __get_free_pages() was used to allocate
> msi_pages just to get a real memory address to program into the MSI
> address field of the PCIe endpoint's config register.
> 
> MSI handling code from pcie-xilinx.c
> 
> 	if (status & (XILINX_PCIE_INTR_INTX | XILINX_PCIE_INTR_MSI)) {
> 		.
> 		.
> 
> 		/* Decode the IRQ number */
> 		if (val & XILINX_PCIE_RPIFR1_MSI_INTR) {
> 			val = pcie_read(port, XILINX_PCIE_REG_RPIFR2) &
> 				XILINX_PCIE_RPIFR2_MSG_DATA;
> 		} else {
> 			val = (val & XILINX_PCIE_RPIFR1_INTR_MASK) >>
> 				XILINX_PCIE_RPIFR1_INTR_SHIFT;
> 			val = irq_find_mapping(port->leg_domain, val);
> 		}
> 		
> 		.
> 		.
> 		
> 		/* Handle the interrupt */
> 		if (IS_ENABLED(CONFIG_PCI_MSI) ||
> 		    !(val & XILINX_PCIE_RPIFR1_MSI_INTR))
> 			generic_handle_irq(val);
> 	}

A side-note: this looks like a bug to me. In the case of an MSI, the val
variable will have been overwritten by the MSI decode branch above, so
the conditional down here isn't guaranteed to work; whether it does
depends on what MSI data was actually received.

Don't you need to store the interrupt status in a variable separate from
the value that you're using to find which interrupt was received?
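For illustration, the separation suggested here might look like the
following (a sketch only, untested; it reuses the register and helper
names from the snippet above and assumes the RPIFR1 read that was elided
there, so it is not a drop-in patch for the actual driver):

```c
	if (status & (XILINX_PCIE_INTR_INTX | XILINX_PCIE_INTR_MSI)) {
		/* Keep the raw RPIFR1 contents separate from the decoded IRQ */
		u32 ifr1 = pcie_read(port, XILINX_PCIE_REG_RPIFR1);
		unsigned int irq;

		if (ifr1 & XILINX_PCIE_RPIFR1_MSI_INTR) {
			/* MSI: the message data carries the IRQ number */
			irq = pcie_read(port, XILINX_PCIE_REG_RPIFR2) &
			      XILINX_PCIE_RPIFR2_MSG_DATA;
		} else {
			/* INTx: decode the line from RPIFR1 itself */
			irq = (ifr1 & XILINX_PCIE_RPIFR1_INTR_MASK) >>
			      XILINX_PCIE_RPIFR1_INTR_SHIFT;
			irq = irq_find_mapping(port->leg_domain, irq);
		}

		/* Test the saved register value, not the decoded IRQ number */
		if (IS_ENABLED(CONFIG_PCI_MSI) ||
		    !(ifr1 & XILINX_PCIE_RPIFR1_MSI_INTR))
			generic_handle_irq(irq);
	}
```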

Thierry

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-09 18:41               ` Vidya Sagar
@ 2017-11-10  9:47                 ` David Laight
  -1 siblings, 0 replies; 30+ messages in thread
From: David Laight @ 2017-11-10  9:47 UTC (permalink / raw)
  To: 'Vidya Sagar', Bjorn Helgaas
  Cc: treding, bhelgaas, linux-tegra, linux-pci, kthota, swarren,
	mmaddireddy, Michal Simek, Sören Brinkmann, Simon Horman,
	Lorenzo Pieralisi

From: Vidya Sagar
> Sent: 09 November 2017 18:41
...
> > And this commit [3] sounds like it describes a similar hardware
> > situation with Tegra where the host bridge intercepts the MSI target
> > address, so writes to it never reach system memory.  That means that
> > Tegra doesn't need to allocate system memory at all.
> >
> > Is your system similar?  Can you just statically allocate a little bus
> > address space, use that for the MSI target address, and skip the
> > __get_free_pages()?
>
> It is the same system where MSI memory writes don't really make it to
> system memory. But, the addresses mentioned in that patch don't work.
> we are working with hardware folks on understanding the issue better.
...

You do need to make sure that the address you pick will never
be used for a 'dma' transfer.
If the address needs to be inside the (pcie) physical addresses
for system memory then allocating a physical page may be the
safest way to ensure this doesn't happen.

	David



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-09 18:14             ` Bjorn Helgaas
@ 2017-11-10 11:22                 ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 30+ messages in thread
From: Lorenzo Pieralisi @ 2017-11-10 11:22 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Vidya Sagar, treding, bhelgaas, linux-tegra, linux-pci, kthota,
	swarren, mmaddireddy, Michal Simek, Sören Brinkmann,
	Simon Horman, robin.murphy

[+Robin]

On Thu, Nov 09, 2017 at 12:14:35PM -0600, Bjorn Helgaas wrote:
> [+cc Lorenzo]
> 
> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > >On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > >>limits MSI target address to only 32-bit region to enable
> > >>some of the PCIe end points where only 32-bit MSIs
> > >>are supported work properly.
> > >>One example being Marvel SATA controller
> > >>
> > >>Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > >>---
> > >>  drivers/pci/host/pci-tegra.c | 2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >>diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > >>index 1987fec1f126..03d3dcdd06c2 100644
> > >>--- a/drivers/pci/host/pci-tegra.c
> > >>+++ b/drivers/pci/host/pci-tegra.c
> > >>@@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > >>  	}
> > >>  	/* setup AFI/FPCI range */
> > >>-	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > >>+	msi->pages = __get_free_pages(GFP_DMA, 0);
> > >>  	msi->phys = virt_to_phys((void *)msi->pages);
> >
> > >Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > >definition.
> >
> > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > is the correct one to use, but, even with that I got >32-bit addresses.
> > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > I didn't dig into it to find out why is this the case.
> 
> This sounds worth looking into (but maybe we don't need the
> __get_free_pages() at all; see below).  Maybe there's some underlying
> bug.  My laptop shows this, which looks like it might be related:
> 
>   Zone ranges:
>     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>     Normal   [mem 0x0000000100000000-0x00000004217fffff]
>     Device   empty
> 
> What does your machine show?
> 
> > >Should we be using virt_to_phys() here?  Where exactly is the
> > >result ("msi->phys") used, i.e., what bus will that address appear
> > >on?  If it appears on the PCI side, this should probably use
> > >something like pcibios_resource_to_bus().

I had a quick chat with Robin (CC'ed, who has dealt/is dealing with most
of the work this thread relates to) and I think that, as things stand,
an offset in the PCI->host address translation is not really contemplated
for MSI physical addresses (ie a 1:1 host<->pci mapping is assumed).

> > This address is written to two places.  First, into host's internal
> > register to let it know that when an incoming memory write comes
> > with this address, raise an MSI interrupt instead of forwarding it
> > to memory subsystem.  Second, into 'Message Address' field of
> > 'Message Address Register for MSI' register in end point's
> > configuration space (this is done by MSI framework) for end point to
> > know which address to be used to generate MSI interrupt.
> 
> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> And this commit [3] sounds like it describes a similar hardware
> situation with Tegra where the host bridge intercepts the MSI target
> address, so writes to it never reach system memory.  That means that
> Tegra doesn't need to allocate system memory at all.
> 
> Is your system similar?  Can you just statically allocate a little bus
> address space, use that for the MSI target address, and skip the
> __get_free_pages()?

IIUC, all these host bridges need is a physical address that is routed
upstream to the host bridge by the PCIe tree (ie one that is not part of
the host bridge windows). As long as the host bridge intercepts it (and
endpoints do _not_ reuse it for something else) that should be fine. As
has been pointed out in this thread, allocating a page is one solution
- there may be others (which are likely to be platform specific).
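As a sketch of the two options being weighed here (the helper name and
the TEGRA_MSI_TARGET constant are assumptions for illustration, not real
driver symbols):

```c
/* Hypothetical helper contrasting the two approaches discussed:
 * a statically reserved, bridge-intercepted address versus a real
 * allocated page. */
static phys_addr_t pick_msi_target(bool use_static_target)
{
	if (use_static_target) {
		/* Platform-specific address the host bridge is known to
		 * intercept; must never be handed out as a DMA address. */
		return TEGRA_MSI_TARGET;
	}

	/* Allocate a real page: wastes one page of memory, but
	 * guarantees the address can never collide with another
	 * DMA buffer given to an endpoint later. */
	return virt_to_phys((void *)__get_free_pages(GFP_KERNEL, 0));
}
```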

> > >Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> > >similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> > >then write the result of virt_to_phys() using a 32-bit register
> > >write.
> >
> > Well, if those systems deal with 64-bit addresses and when an end
> > point is connected which supports only 32-bit MSI addresses, this
> > problem will surface when __get_free_pages() returns an address that
> > translates to a >32-bit address after virt_to_phys() call on it.
> 
> I'd like to hear from the R-Car and Xilinx folks about (1) whether
> there's a potential issue with truncating a 64-bit address, and
> (2) whether that hardware works like Tegra, where the MSI write never
> reaches memory so we don't actually need to allocate a page.
> 
> If all we need is to allocate a little bus address space for the MSI
> target, I'd like to figure out a better way to do that than
> __get_free_pages().  The current code seems a little buggy, and
> it's getting propagated through several drivers.

I will look into this with Robin's help.

Thanks,
Lorenzo

> > >>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> 
> [1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
> [2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-10  9:37                   ` Thierry Reding
@ 2017-11-10 11:57                     ` Arnd Bergmann
  -1 siblings, 0 replies; 30+ messages in thread
From: Arnd Bergmann @ 2017-11-10 11:57 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Vidya Sagar, Bjorn Helgaas, Bjorn Helgaas,
	open list:TEGRA ARCHITECTURE SUPPORT, linux-pci, kthota, swarren,
	mmaddireddy, Michal Simek, Sören Brinkmann, Simon Horman,
	Lorenzo Pieralisi

On Fri, Nov 10, 2017 at 10:37 AM, Thierry Reding <treding@nvidia.com> wrote:
> On Fri, Nov 10, 2017 at 12:11:05AM +0530, Vidya Sagar wrote:
>> On Thursday 09 November 2017 11:44 PM, Bjorn Helgaas wrote:
>> > [+cc Lorenzo]
>> > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>> > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>> > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>> > > > > limits MSI target address to only 32-bit region to enable
>> > > > > some of the PCIe end points where only 32-bit MSIs
>> > > > > are supported work properly.
>> > > > > One example being Marvel SATA controller
>> > > > >
>> > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>> > > > > ---
>> > > > >   drivers/pci/host/pci-tegra.c | 2 +-
>> > > > >   1 file changed, 1 insertion(+), 1 deletion(-)
>> > > > >
>> > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>> > > > > index 1987fec1f126..03d3dcdd06c2 100644
>> > > > > --- a/drivers/pci/host/pci-tegra.c
>> > > > > +++ b/drivers/pci/host/pci-tegra.c
>> > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>> > > > >       }
>> > > > >       /* setup AFI/FPCI range */
>> > > > > -     msi->pages = __get_free_pages(GFP_KERNEL, 0);
>> > > > > +     msi->pages = __get_free_pages(GFP_DMA, 0);
>> > > > >       msi->phys = virt_to_phys((void *)msi->pages);
>> > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
>> > > > definition.
>> > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>> > > is the correct one to use, but, even with that I got >32-bit addresses.
>> > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>> > > I didn't dig into it to find out why is this the case.
>> > This sounds worth looking into (but maybe we don't need the
>> > __get_free_pages() at all; see below).  Maybe there's some underlying
>> > bug.  My laptop shows this, which looks like it might be related:
>> >
>> >    Zone ranges:
>> >      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>> >      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>> >      Normal   [mem 0x0000000100000000-0x00000004217fffff]
>> >      Device   empty
>> >
>> > What does your machine show?
>> I see following in my linux box
>>  Zone ranges:
>>    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>>    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>>    Normal   [mem 0x0000000100000000-0x000000106effffff]
>>    Device   empty
>>
>> and following in my T210 Tegra platform
>> Zone ranges:
>>   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
>>   Normal   [mem 0x0000000100000000-0x000000017fffffff]
>
> This seems to be happening because 64-bit ARM doesn't have the
> ZONE_DMA32 Kconfig option, which seems to cause the DMA32 zone
> to default to the normal zone (see include/linux/gfp.h).
>
> That's very confusing in conjunction with the kerneldoc comment
> for GFP_DMA32 because it isn't actually guaranteed to give you
> 32-bit addresses for !ZONE_DMA32.
>
> Cc'ing Arnd who knows about these things.

I don't have a good answer unfortunately, but I agree there is clearly
something wrong with the behavior; AFAICT, ZONE_DMA32 should
really do what the name says.

When you look at the commit that introduced ZONE_DMA32, you
see that it used to be different, see commit a2f1b4249007
("[PATCH] x86_64: Add 4GB DMA32 zone"). ZONE_DMA32 used
to fall back to ZONE_DMA, but this got changed shortly after,
still in 2006.

      Arnd

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-10 11:22                 ` Lorenzo Pieralisi
@ 2017-11-10 12:04                   ` Robin Murphy
  -1 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2017-11-10 12:04 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Bjorn Helgaas
  Cc: Vidya Sagar, treding, bhelgaas, linux-tegra, linux-pci, kthota,
	swarren, mmaddireddy, Michal Simek, Sören Brinkmann, Simon Horman

On 10/11/17 11:22, Lorenzo Pieralisi wrote:
> [+Robin]
> 
> On Thu, Nov 09, 2017 at 12:14:35PM -0600, Bjorn Helgaas wrote:
>> [+cc Lorenzo]
>>
>> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>>> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>>>> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>>>>> limits MSI target address to only 32-bit region to enable
>>>>> some of the PCIe end points where only 32-bit MSIs
>>>>> are supported work properly.
>>>>> One example being Marvel SATA controller
>>>>>
>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>> ---
>>>>>   drivers/pci/host/pci-tegra.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>>>>> index 1987fec1f126..03d3dcdd06c2 100644
>>>>> --- a/drivers/pci/host/pci-tegra.c
>>>>> +++ b/drivers/pci/host/pci-tegra.c
>>>>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>>>>   	}
>>>>>   	/* setup AFI/FPCI range */
>>>>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>>>>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>>>>   	msi->phys = virt_to_phys((void *)msi->pages);
>>>
>>>> Should this be GFP_DMA32?  See the comment above the GFP_DMA
>>>> definition.
>>>
>>> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>>> is the correct one to use, but, even with that I got >32-bit addresses.
>>> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>>> I didn't dig into it to find out why is this the case.
>>
>> This sounds worth looking into (but maybe we don't need the
>> __get_free_pages() at all; see below).  Maybe there's some underlying
>> bug.  My laptop shows this, which looks like it might be related:
>>
>>    Zone ranges:
>>      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>>      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>>      Normal   [mem 0x0000000100000000-0x00000004217fffff]
>>      Device   empty
>>
>> What does your machine show?

The great thing about ZONE_DMA is that it has completely different 
meanings per platform. ZONE_DMA32 is even worse as it's more or less 
just an x86 thing, and isn't implemented (thus does nothing) on most 
other architectures, including ARM/arm64 where Tegra is relevant.

Fun.

>>>> Should we be using virt_to_phys() here?  Where exactly is the
>>>> result ("msi->phys") used, i.e., what bus will that address appear
>>>> on?  If it appears on the PCI side, this should probably use
>>>> something like pcibios_resource_to_bus().
> 
> I had a quick chat with Robin (CC'ed, who has dealt/is dealing with most
> of the work this thread relates to) and I think that as things stand,
> for MSI physical addresses, an offset between PCI->host address shift is
> not really contemplated (ie 1:1 host<->pci is assumed).

I think the most correct way to generate an address would actually be 
via the DMA mapping API (i.e. dma_map_page() with the host bridge's PCI 
device), which would take any relevant offsets into account. It's a bit 
funky, but there's some method in the madness.
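A rough sketch of that approach (field names like `pcie->dev` are
assumptions for illustration; this is not the actual Tegra code):

```c
/* Allocate a page and obtain the address the *endpoint* should use
 * via the DMA API, so any host<->PCI offset is applied, instead of
 * trusting virt_to_phys() to match what the PCI side sees. */
struct page *page = alloc_page(GFP_KERNEL);
dma_addr_t msi_bus_addr;

if (!page)
	return -ENOMEM;

/* Map through the host bridge's struct device so that any
 * dma-ranges style offsets are taken into account. */
msi_bus_addr = dma_map_page(pcie->dev, page, 0, PAGE_SIZE,
			    DMA_FROM_DEVICE);
if (dma_mapping_error(pcie->dev, msi_bus_addr)) {
	__free_page(page);
	return -ENOMEM;
}

/* msi_bus_addr (not virt_to_phys()) is what gets programmed into
 * the endpoint's MSI Message Address register. */
```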

>>> This address is written to two places.  First, into host's internal
>>> register to let it know that when an incoming memory write comes
>>> with this address, raise an MSI interrupt instead of forwarding it
>>> to memory subsystem.  Second, into 'Message Address' field of
>>> 'Message Address Register for MSI' register in end point's
>>> configuration space (this is done by MSI framework) for end point to
>>> know which address to be used to generate MSI interrupt.
>>
>> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
>> And this commit [3] sounds like it describes a similar hardware
>> situation with Tegra where the host bridge intercepts the MSI target
>> address, so writes to it never reach system memory.  That means that
>> Tegra doesn't need to allocate system memory at all.
>>
>> Is your system similar?  Can you just statically allocate a little bus
>> address space, use that for the MSI target address, and skip the
>> __get_free_pages()?
> 
> IIUC, all these host bridges need is a physical address that is routed
> upstream to the host bridge by the PCIe tree (ie it is not part of the
> host bridge windows), as long as the host bridge intercepts it (and
> endpoints do _not_ reuse it for something else) that should be fine, as
> it has been pointed out in this thread, allocating a page is a solution
> - there may be others (which are likely to be platform specific).
> 
>>>> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
>>>> similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
>>>> then write the result of virt_to_phys() using a 32-bit register
>>>> write.
>>>
>>> Well, if those systems deal with 64-bit addresses and when an end
>>> point is connected which supports only 32-bit MSI addresses, this
>>> problem will surface when __get_free_pages() returns an address that
>>> translates to a >32-bit address after virt_to_phys() call on it.
>>
>> I'd like to hear from the R-Car and Xilinx folks about (1) whether
>> there's a potential issue with truncating a 64-bit address, and
>> (2) whether that hardware works like Tegra, where the MSI write never
>> reaches memory so we don't actually need to allocate a page.
>>
>> If all we need is to allocate a little bus address space for the MSI
>> target, I'd like to figure out a better way to do that than
>> __get_free_pages().  The current code seems a little buggy, and
>> it's getting propagated through several drivers.

The really neat version is to take a known non-memory physical address 
like the host controller's own MMIO region, which has no legitimate 
reason to ever be used as a DMA address. pcie-mediatek almost gets this 
right, but by using virt_to_phys() on an ioremapped address they end up 
with nonsense rather than the correct address (although realistically 
you would have to be extremely unlucky for said nonsense to collide with 
a real DMA address given to a PCI endpoint later). Following on from 
above, dma_map_resource() would be the foolproof way to get that right.

Robin.

> 
> I will look into this with Robin's help.
> 
> Thanks,
> Lorenzo
> 
>>>>>   	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
>>
>> [1] https://lkml.kernel.org/r/20170824134451.GA31858-1RhO1Y9PlrlHTL0Zs8A6p5iNqAH0jzoTYJqu5kTmcBRl57MIdRCFDg@public.gmane.org
>> [2] https://lkml.kernel.org/r/86efs3wesi.fsf-5wv7dgnIgG8@public.gmane.org
>> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
@ 2017-11-10 12:04                   ` Robin Murphy
  0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2017-11-10 12:04 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Bjorn Helgaas
  Cc: Vidya Sagar, treding, bhelgaas, linux-tegra, linux-pci, kthota,
	swarren, mmaddireddy, Michal Simek, Sören Brinkmann,
	Simon Horman

On 10/11/17 11:22, Lorenzo Pieralisi wrote:
> [+Robin]
> 
> On Thu, Nov 09, 2017 at 12:14:35PM -0600, Bjorn Helgaas wrote:
>> [+cc Lorenzo]
>>
>> On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
>>> On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
>>>> On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
>>>>> limits MSI target address to only 32-bit region to enable
>>>>> some of the PCIe end points where only 32-bit MSIs
>>>>> are supported work properly.
>>>>> One example being Marvel SATA controller
>>>>>
>>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>>> ---
>>>>>   drivers/pci/host/pci-tegra.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
>>>>> index 1987fec1f126..03d3dcdd06c2 100644
>>>>> --- a/drivers/pci/host/pci-tegra.c
>>>>> +++ b/drivers/pci/host/pci-tegra.c
>>>>> @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
>>>>>   	}
>>>>>   	/* setup AFI/FPCI range */
>>>>> -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
>>>>> +	msi->pages = __get_free_pages(GFP_DMA, 0);
>>>>>   	msi->phys = virt_to_phys((void *)msi->pages);
>>>
>>>> Should this be GFP_DMA32?  See the comment above the GFP_DMA
>>>> definition.
>>>
>>> looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
>>> is the correct one to use, but, even with that I got >32-bit addresses.
>>> GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
>>> I didn't dig into it to find out why is this the case.
>>
>> This sounds worth looking into (but maybe we don't need the
>> __get_free_pages() at all; see below).  Maybe there's some underlying
>> bug.  My laptop shows this, which looks like it might be related:
>>
>>    Zone ranges:
>>      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>>      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>>      Normal   [mem 0x0000000100000000-0x00000004217fffff]
>>      Device   empty
>>
>> What does your machine show?

The great thing about ZONE_DMA is that it has completely different 
meanings per platform. ZONE_DMA32 is even worse as it's more or less 
just an x86 thing, and isn't implemented (thus does nothing) on most 
other architectures, including ARM/arm64 where Tegra is relevant.

Fun.

>>>> Should we be using virt_to_phys() here?  Where exactly is the
>>>> result ("msi->phys") used, i.e., what bus will that address appear
>>>> on?  If it appears on the PCI side, this should probably use
>>>> something like pcibios_resource_to_bus().
> 
> I had a quick chat with Robin (CC'ed, who has dealt/is dealing with most
> of the work this thread relates to) and I think that as things stand,
> for MSI physical addresses, an offset between PCI->host address shift is
> not really contemplated (ie 1:1 host<->pci is assumed).

I think the most correct way to generate an address would actually be 
via the DMA mapping API (i.e. dma_map_page() with the host bridge's PCI 
device), which would take any relevant offsets into account. It's a bit 
funky, but there's some method in the madness.
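
(A rough, non-compilable sketch of that idea, using the DMA API names as they existed around v4.14; the field names follow pci-tegra only loosely and error handling is simplified:)

```c
/* Hypothetical sketch: obtain a PCI-visible MSI target address via the
 * DMA mapping API instead of virt_to_phys().  dma_map_page() returns a
 * bus address, so any host<->PCI address offset is applied for us. */
static int tegra_pcie_alloc_msi_target(struct tegra_pcie *pcie)
{
	struct tegra_msi *msi = &pcie->msi;
	struct page *page;
	dma_addr_t dma;

	page = alloc_page(GFP_KERNEL);
	if (!page)
		return -ENOMEM;

	/* Map through the host bridge's device so the returned address
	 * is meaningful as seen from the PCI side. */
	dma = dma_map_page(pcie->dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(pcie->dev, dma)) {
		__free_page(page);
		return -ENOMEM;
	}

	msi->pages = (unsigned long)page_address(page);
	msi->phys = dma;

	return 0;
}
```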

>>> This address is written to two places.  First, into host's internal
>>> register to let it know that when an incoming memory write comes
>>> with this address, raise an MSI interrupt instead of forwarding it
>>> to memory subsystem.  Second, into 'Message Address' field of
>>> 'Message Address Register for MSI' register in end point's
>>> configuration space (this is done by MSI framework) for end point to
>>> know which address to be used to generate MSI interrupt.
>>
>> Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
>> And this commit [3] sounds like it describes a similar hardware
>> situation with Tegra where the host bridge intercepts the MSI target
>> address, so writes to it never reach system memory.  That means that
>> Tegra doesn't need to allocate system memory at all.
>>
>> Is your system similar?  Can you just statically allocate a little bus
>> address space, use that for the MSI target address, and skip the
>> __get_free_pages()?
> 
> IIUC, all these host bridges need is a physical address that is routed
> upstream to the host bridge by the PCIe tree (ie it is not part of the
> host bridge windows), as long as the host bridge intercepts it (and
> endpoints do _not_ reuse it for something else) that should be fine, as
> it has been pointed out in this thread, allocating a page is a solution
> - there may be others (which are likely to be platform specific).
> 
>>>> Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
>>>> similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
>>>> then write the result of virt_to_phys() using a 32-bit register
>>>> write.
>>>
>>> Well, if those systems deal with 64-bit addresses and when an end
>>> point is connected which supports only 32-bit MSI addresses, this
>>> problem will surface when __get_free_pages() returns an address that
>>> translates to a >32-bit address after virt_to_phys() call on it.
>>
>> I'd like to hear from the R-Car and Xilinx folks about (1) whether
>> there's a potential issue with truncating a 64-bit address, and
>> (2) whether that hardware works like Tegra, where the MSI write never
>> reaches memory so we don't actually need to allocate a page.
>>
>> If all we need is to allocate a little bus address space for the MSI
>> target, I'd like to figure out a better way to do that than
>> __get_free_pages().  The current code seems a little buggy, and
>> it's getting propagated through several drivers.

The really neat version is to take a known non-memory physical address 
like the host controller's own MMIO region, which has no legitimate 
reason to ever be used as a DMA address. pcie-mediatek almost gets this 
right, but by using virt_to_phys() on an ioremapped address they end up 
with nonsense rather than the correct address (although realistically 
you would have to be extremely unlucky for said nonsense to collide with 
a real DMA address given to a PCI endpoint later). Following on from 
above, dma_map_resource() would be the foolproof way to get that right.
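
(A sketch of that variant, assuming a driver whose controller registers are described by a struct resource; names are hypothetical and this is not meant to compile against any particular driver:)

```c
/* Hypothetical sketch: use the controller's own MMIO region as the MSI
 * doorbell.  No memory is allocated, and the address can never collide
 * with a real DMA buffer because it is not RAM. */
static int pcie_setup_msi_doorbell(struct device *dev,
				   struct resource *regs,
				   dma_addr_t *doorbell)
{
	dma_addr_t addr;

	/* Map the *physical* register address; calling virt_to_phys()
	 * on an ioremap()ed pointer yields nonsense, as noted above. */
	addr = dma_map_resource(dev, regs->start, SZ_4K,
				DMA_FROM_DEVICE, 0);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;

	*doorbell = addr;
	return 0;
}
```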

Robin.

> 
> I will look into this with Robin's help.
> 
> Thanks,
> Lorenzo
> 
>>>>>   	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
>>
>> [1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
>> [2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
>> [3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
@ 2017-11-10 13:07                       ` Thierry Reding
  0 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2017-11-10 13:07 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Lorenzo Pieralisi, Bjorn Helgaas, Vidya Sagar, bhelgaas,
	linux-tegra, linux-pci, kthota, swarren, mmaddireddy,
	Michal Simek, Sören Brinkmann, Simon Horman


On Fri, Nov 10, 2017 at 12:04:21PM +0000, Robin Murphy wrote:
> On 10/11/17 11:22, Lorenzo Pieralisi wrote:
> > [+Robin]
> > 
> > On Thu, Nov 09, 2017 at 12:14:35PM -0600, Bjorn Helgaas wrote:
> > > [+cc Lorenzo]
> > > 
> > > On Thu, Nov 09, 2017 at 12:48:14PM +0530, Vidya Sagar wrote:
> > > > On Thursday 09 November 2017 02:55 AM, Bjorn Helgaas wrote:
> > > > > On Mon, Nov 06, 2017 at 11:33:07PM +0530, Vidya Sagar wrote:
> > > > > > limits MSI target address to only 32-bit region to enable
> > > > > > some of the PCIe end points where only 32-bit MSIs
> > > > > > are supported work properly.
> > > > > > One example being Marvel SATA controller
> > > > > > 
> > > > > > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > > > > > ---
> > > > > >   drivers/pci/host/pci-tegra.c | 2 +-
> > > > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
> > > > > > index 1987fec1f126..03d3dcdd06c2 100644
> > > > > > --- a/drivers/pci/host/pci-tegra.c
> > > > > > +++ b/drivers/pci/host/pci-tegra.c
> > > > > > @@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> > > > > >   	}
> > > > > >   	/* setup AFI/FPCI range */
> > > > > > -	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > > > > > +	msi->pages = __get_free_pages(GFP_DMA, 0);
> > > > > >   	msi->phys = virt_to_phys((void *)msi->pages);
> > > > 
> > > > > Should this be GFP_DMA32?  See the comment above the GFP_DMA
> > > > > definition.
> > > > 
> > > > looking at the comments for both GFP_DMA32 and GFP_DMA, I thought GFP_DMA32
> > > > is the correct one to use, but, even with that I got >32-bit addresses.
> > > > GFP_DMA always gives addresses in <4GB boundary (i.e. 32-bit).
> > > > I didn't dig into it to find out why is this the case.
> > > 
> > > This sounds worth looking into (but maybe we don't need the
> > > __get_free_pages() at all; see below).  Maybe there's some underlying
> > > bug.  My laptop shows this, which looks like it might be related:
> > > 
> > >    Zone ranges:
> > >      DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> > >      DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> > >      Normal   [mem 0x0000000100000000-0x00000004217fffff]
> > >      Device   empty
> > > 
> > > What does your machine show?
> 
> The great thing about ZONE_DMA is that it has completely different meanings
> per platform. ZONE_DMA32 is even worse as it's more or less just an x86
> thing, and isn't implemented (thus does nothing) on most other
> architectures, including ARM/arm64 where Tegra is relevant.
> 
> Fun.
> 
> > > > > Should we be using virt_to_phys() here?  Where exactly is the
> > > > > result ("msi->phys") used, i.e., what bus will that address appear
> > > > > on?  If it appears on the PCI side, this should probably use
> > > > > something like pcibios_resource_to_bus().
> > 
> > I had a quick chat with Robin (CC'ed, who has dealt/is dealing with most
> > of the work this thread relates to) and I think that as things stand,
> > for MSI physical addresses, an offset between PCI->host address shift is
> > not really contemplated (ie 1:1 host<->pci is assumed).
> 
> I think the most correct way to generate an address would actually be via
> the DMA mapping API (i.e. dma_map_page() with the host bridge's PCI device),
> which would take any relevant offsets into account. It's a bit funky, but
> there's some method in the madness.
> 
> > > > This address is written to two places.  First, into host's internal
> > > > register to let it know that when an incoming memory write comes
> > > > with this address, raise an MSI interrupt instead of forwarding it
> > > > to memory subsystem.  Second, into 'Message Address' field of
> > > > 'Message Address Register for MSI' register in end point's
> > > > configuration space (this is done by MSI framework) for end point to
> > > > know which address to be used to generate MSI interrupt.
> > > 
> > > Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> > > And this commit [3] sounds like it describes a similar hardware
> > > situation with Tegra where the host bridge intercepts the MSI target
> > > address, so writes to it never reach system memory.  That means that
> > > Tegra doesn't need to allocate system memory at all.
> > > 
> > > Is your system similar?  Can you just statically allocate a little bus
> > > address space, use that for the MSI target address, and skip the
> > > __get_free_pages()?
> > 
> > IIUC, all these host bridges need is a physical address that is routed
> > upstream to the host bridge by the PCIe tree (ie it is not part of the
> > host bridge windows), as long as the host bridge intercepts it (and
> > endpoints do _not_ reuse it for something else) that should be fine, as
> > it has been pointed out in this thread, allocating a page is a solution
> > - there may be others (which are likely to be platform specific).
> > 
> > > > > Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> > > > > similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> > > > > then write the result of virt_to_phys() using a 32-bit register
> > > > > write.
> > > > 
> > > > Well, if those systems deal with 64-bit addresses and when an end
> > > > point is connected which supports only 32-bit MSI addresses, this
> > > > problem will surface when __get_free_pages() returns an address that
> > > > translates to a >32-bit address after virt_to_phys() call on it.
> > > 
> > > I'd like to hear from the R-Car and Xilinx folks about (1) whether
> > > there's a potential issue with truncating a 64-bit address, and
> > > (2) whether that hardware works like Tegra, where the MSI write never
> > > reaches memory so we don't actually need to allocate a page.
> > > 
> > > If all we need is to allocate a little bus address space for the MSI
> > > target, I'd like to figure out a better way to do that than
> > > __get_free_pages().  The current code seems a little buggy, and
> > > it's getting propagated through several drivers.
> 
> The really neat version is to take a known non-memory physical address like
> the host controller's own MMIO region, which has no legitimate reason to
> ever be used as a DMA address. pcie-mediatek almost gets this right, but by
> using virt_to_phys() on an ioremapped address they end up with nonsense
> rather than the correct address (although realistically you would have to be
> extremely unlucky for said nonsense to collide with a real DMA address given
> to a PCI endpoint later). Following on from above, dma_map_resource() would
> be the foolproof way to get that right.

Yes, that was our intention as well. Our initial plan was to use an
address from the PCI aperture within Tegra that wasn't being used for
any other purpose. However, we ran into some odd corner cases where this
wasn't working as expected. As a temporary solution we wanted to move to
GFP_DMA32 (or GFP_DMA) in order to support 32-bit only MSI endpoints.

Eventually we'll want to get rid of the allocation altogether, we just
need to find a set of values that work reliably. In the meantime, it
looks as though GFP_DMA would be the right solution as long as we have
to stick with __get_free_pages().

Alternatively, since we've already verified that the MSI writes are
never committed to memory, we could choose some random address pointing
to system memory as well, but I'm reluctant to do that because it could
end up being confusing for users (and developers) to see some random
address showing up somewhere. A physical address such as the beginning
of system memory should always work and might be unique enough to
indicate that it is special.

Thierry


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-10 12:04                   ` Robin Murphy
  (?)
  (?)
@ 2017-11-13 11:06                   ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 30+ messages in thread
From: Lorenzo Pieralisi @ 2017-11-13 11:06 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Bjorn Helgaas, Vidya Sagar, treding, bhelgaas, linux-tegra,
	linux-pci, kthota, swarren, mmaddireddy, Michal Simek,
	Sören Brinkmann, Simon Horman

On Fri, Nov 10, 2017 at 12:04:21PM +0000, Robin Murphy wrote:

[...]

> >>>>Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> >>>>similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> >>>>then write the result of virt_to_phys() using a 32-bit register
> >>>>write.
> >>>
> >>>Well, if those systems deal with 64-bit addresses and when an end
> >>>point is connected which supports only 32-bit MSI addresses, this
> >>>problem will surface when __get_free_pages() returns an address that
> >>>translates to a >32-bit address after virt_to_phys() call on it.
> >>
> >>I'd like to hear from the R-Car and Xilinx folks about (1) whether
> >>there's a potential issue with truncating a 64-bit address, and
> >>(2) whether that hardware works like Tegra, where the MSI write never
> >>reaches memory so we don't actually need to allocate a page.
> >>
> >>If all we need is to allocate a little bus address space for the MSI
> >>target, I'd like to figure out a better way to do that than
> >>__get_free_pages().  The current code seems a little buggy, and
> >>it's getting propagated through several drivers.
> 
> The really neat version is to take a known non-memory physical
> address like the host controller's own MMIO region, which has no
> legitimate reason to ever be used as a DMA address.

True - that could be safe enough.

What about IOVAs though ? I suspect we should reserve the MSI address
range - it looks like there is something missing here.

> pcie-mediatek almost gets this right, but by using virt_to_phys() on
> an ioremapped address they end up with nonsense rather than the
> correct address

That definitely needs fixing given that it works by chance.

> (although realistically you would have to be extremely unlucky for
> said nonsense to collide with a real DMA address given to a PCI
> endpoint later). Following on from above, dma_map_resource() would
> be the foolproof way to get that right.

That's how it should be done then, lest it trickle into other drivers
requiring a similar set-up.

Thanks,
Lorenzo

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
@ 2017-11-20 17:07                         ` Lorenzo Pieralisi
  0 siblings, 0 replies; 30+ messages in thread
From: Lorenzo Pieralisi @ 2017-11-20 17:07 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Robin Murphy, Bjorn Helgaas, Vidya Sagar, bhelgaas, linux-tegra,
	linux-pci, kthota, swarren, mmaddireddy, Michal Simek,
	Sören Brinkmann, Simon Horman, ryder.lee

[+ryder]

On Fri, Nov 10, 2017 at 02:07:05PM +0100, Thierry Reding wrote:

[...]

> > The really neat version is to take a known non-memory physical address like
> > the host controller's own MMIO region, which has no legitimate reason to
> > ever be used as a DMA address. pcie-mediatek almost gets this right, but by
> > using virt_to_phys() on an ioremapped address they end up with nonsense
> > rather than the correct address (although realistically you would have to be
> > extremely unlucky for said nonsense to collide with a real DMA address given
> > to a PCI endpoint later). Following on from above, dma_map_resource() would
> > be the foolproof way to get that right.
> 
> Yes, that was our intention as well. Our initial plan was to use an
> address from the PCI aperture within Tegra that wasn't being used for
> any other purpose.

Hi Thierry,

to wrap up this thread, why an address from the PCI aperture and
not a host bridge register?

I CC'ed Ryder so that he can explain what the PCIE_MSI_VECTOR and
PCIE_IMSI_ADDR offsets in:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/host/pcie-mediatek.c?h=v4.14

register space represent (IIUC the driver uses:

virt_to_phys(port->base + PCIE_MSI_VECTOR);

as the MSI doorbell address, which is wrong anyway as Robin explained;
I just want to understand how that register was chosen - it is never
written or read in the driver, so it is hard to figure that out) and
whether the approach can be replicated instead of relying on GFP_DMA
for page allocation.

Thanks,
Lorenzo

> However, we ran into some odd corner cases where this wasn't working
> as expected. As a temporary solution we wanted to move to GFP_DMA32
> (or GFP_DMA) in order to support 32-bit only MSI endpoints.
> 
> Eventually we'll want to get rid of the allocation altogether, we just
> need to find a set of values that work reliably. In the meantime, it
> looks as though GFP_DMA would be the right solution as long as we have
> to stick with __get_free_pages().
> 
> Alternatively, since we've already verified that the MSI writes are
> never committed to memory, we could choose some random address pointing
> to system memory as well, but I'm reluctant to do that because it could
> end up being confusing for users (and developers) to see some random
> address showing up somewhere. A physical address such as the beginning
> of system memory should always work and might be unique enough to
> indicate that it is special.
> 
> Thierry

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] PCI: tegra: limit MSI target address to 32-bit
  2017-11-09 18:41               ` Vidya Sagar
                                 ` (2 preceding siblings ...)
  (?)
@ 2018-03-16 17:23               ` Lorenzo Pieralisi
  -1 siblings, 0 replies; 30+ messages in thread
From: Lorenzo Pieralisi @ 2018-03-16 17:23 UTC (permalink / raw)
  To: Vidya Sagar
  Cc: Bjorn Helgaas, treding, bhelgaas, linux-tegra, linux-pci, kthota,
	swarren, mmaddireddy, Michal Simek, Sören Brinkmann,
	Simon Horman

On Fri, Nov 10, 2017 at 12:11:05AM +0530, Vidya Sagar wrote:

[...]

> >>>>--- a/drivers/pci/host/pci-tegra.c
> >>>>+++ b/drivers/pci/host/pci-tegra.c
> >>>>@@ -1531,7 +1531,7 @@ static int tegra_pcie_enable_msi(struct tegra_pcie *pcie)
> >>>>  	}
> >>>>  	/* setup AFI/FPCI range */
> >>>>-	msi->pages = __get_free_pages(GFP_KERNEL, 0);
> >>>>+	msi->pages = __get_free_pages(GFP_DMA, 0);
> >>>>  	msi->phys = virt_to_phys((void *)msi->pages);
> >>>Should this be GFP_DMA32?  See the comment above the GFP_DMA
> >>>definition.
> >>Looking at the comments for both GFP_DMA32 and GFP_DMA, I thought
> >>GFP_DMA32 was the correct one to use, but even with that I got >32-bit
> >>addresses.  GFP_DMA always gives addresses within the 4GB boundary
> >>(i.e. 32-bit).  I didn't dig into it to find out why this is the case.
> >This sounds worth looking into (but maybe we don't need the
> >__get_free_pages() at all; see below).  Maybe there's some underlying
> >bug.  My laptop shows this, which looks like it might be related:
> >
> >   Zone ranges:
> >     DMA      [mem 0x0000000000001000-0x0000000000ffffff]
> >     DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
> >     Normal   [mem 0x0000000100000000-0x00000004217fffff]
> >     Device   empty
> >
> >What does your machine show?
> I see the following on my Linux box
>  Zone ranges:
>    DMA      [mem 0x0000000000001000-0x0000000000ffffff]
>    DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
>    Normal   [mem 0x0000000100000000-0x000000106effffff]
>    Device   empty
> 
> and the following on my T210 Tegra platform
> Zone ranges:
>   DMA      [mem 0x0000000080000000-0x00000000ffffffff]
>   Normal   [mem 0x0000000100000000-0x000000017fffffff]
> 
> >
> >>>Should we be using virt_to_phys() here?  Where exactly is the
> >>>result ("msi->phys") used, i.e., what bus will that address appear
> >>>on?  If it appears on the PCI side, this should probably use
> >>>something like pcibios_resource_to_bus().
> >>This address is written to two places.  First, into the host's
> >>internal register, to tell it that when an incoming memory write
> >>arrives with this address it should raise an MSI interrupt instead
> >>of forwarding the write to the memory subsystem.  Second, into the
> >>'Message Address' field of the endpoint's 'Message Address Register
> >>for MSI' in configuration space (this is done by the MSI framework),
> >>so the endpoint knows which address to use to generate an MSI
> >>interrupt.
> >Hmmm, ISTR some past discussion about this.  Here's a little: [1, 2].
> >And this commit [3] sounds like it describes a similar hardware
> >situation with Tegra where the host bridge intercepts the MSI target
> >address, so writes to it never reach system memory.  That means that
> >Tegra doesn't need to allocate system memory at all.
> >
> >Is your system similar?  Can you just statically allocate a little bus
> >address space, use that for the MSI target address, and skip the
> >__get_free_pages()?
> It is the same system, where MSI memory writes don't really make it to
> system memory. But the addresses mentioned in that patch don't work;
> we are working with the hardware folks on understanding the issue
> better. Meanwhile, using GFP_DMA gives consistent results, and we
> thought it could be used as a stop-gap solution until we figure out a
> better bus address to use as the MSI target address.

Vidya,

I will mark this as superseded by Thierry's patch:

https://patchwork.ozlabs.org/patch/848569/

even though we have not reached a real conclusion on that
patch yet - we should continue this discussion in that
thread.

Thank you,
Lorenzo

> >>>Do rcar_pcie_enable_msi() and xilinx_pcie_enable_msi() have a
> >>>similar problem?  They both use GFP_KERNEL, then virt_to_phys(),
> >>>then write the result of virt_to_phys() using a 32-bit register
> >>>write.
> >>Well, if those systems deal with 64-bit addresses and an endpoint
> >>that supports only 32-bit MSI addresses is connected, this problem
> >>will surface when __get_free_pages() returns an address that
> >>translates to a >32-bit address after a virt_to_phys() call on it.
> >I'd like to hear from the R-Car and Xilinx folks about (1) whether
> >there's a potential issue with truncating a 64-bit address, and
> >(2) whether that hardware works like Tegra, where the MSI write never
> >reaches memory so we don't actually need to allocate a page.
> >
> >If all we need is to allocate a little bus address space for the MSI
> >target, I'd like to figure out a better way to do that than
> >__get_free_pages().  The current code seems a little buggy, and
> >it's getting propagated through several drivers.
> >
> >>>>  	afi_writel(pcie, msi->phys >> soc->msi_base_shift, AFI_MSI_FPCI_BAR_ST);
> >[1] https://lkml.kernel.org/r/20170824134451.GA31858@bhelgaas-glaptop.roam.corp.google.com
> >[2] https://lkml.kernel.org/r/86efs3wesi.fsf@arm.com
> >[3] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d7bd554f27c9
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-03-16 17:23 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-06 18:03 [PATCH] PCI: tegra: limit MSI target address to 32-bit Vidya Sagar
2017-11-06 18:03 ` Vidya Sagar
     [not found] ` <1509991387-15951-1-git-send-email-vidyas-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-11-08 21:25   ` Bjorn Helgaas
2017-11-08 21:25     ` Bjorn Helgaas
     [not found]     ` <20171108212558.GC21597-1RhO1Y9PlrlHTL0Zs8A6p5iNqAH0jzoTYJqu5kTmcBRl57MIdRCFDg@public.gmane.org>
2017-11-09  7:18       ` Vidya Sagar
2017-11-09  7:18         ` Vidya Sagar
     [not found]         ` <e993e1f8-b6b3-de07-759c-a56012e06a2a-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-11-09 18:14           ` Bjorn Helgaas
2017-11-09 18:14             ` Bjorn Helgaas
2017-11-09 18:41             ` Vidya Sagar
2017-11-09 18:41               ` Vidya Sagar
     [not found]               ` <7727b9bc-a44b-4cbe-1839-7e4dd7c2c186-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-11-10  9:37                 ` Thierry Reding
2017-11-10  9:37                   ` Thierry Reding
2017-11-10 11:57                   ` Arnd Bergmann
2017-11-10 11:57                     ` Arnd Bergmann
2017-11-10  9:47               ` David Laight
2017-11-10  9:47                 ` David Laight
2018-03-16 17:23               ` Lorenzo Pieralisi
     [not found]             ` <20171109181435.GB7629-1RhO1Y9PlrlHTL0Zs8A6p5iNqAH0jzoTYJqu5kTmcBRl57MIdRCFDg@public.gmane.org>
2017-11-10  0:47               ` subrahmanya_lingappa
2017-11-10  0:47                 ` subrahmanya_lingappa
     [not found]                 ` <1686e861-79ac-75eb-b905-769253c6f79e-DTHOJn6Rh8lhmhkoCovsdw@public.gmane.org>
2017-11-10  9:44                   ` Thierry Reding
2017-11-10  9:44                     ` Thierry Reding
2017-11-10 11:22               ` Lorenzo Pieralisi
2017-11-10 11:22                 ` Lorenzo Pieralisi
2017-11-10 12:04                 ` Robin Murphy
2017-11-10 12:04                   ` Robin Murphy
     [not found]                   ` <3887d62c-98bb-04d5-5b40-96d5b7b43e63-5wv7dgnIgG8@public.gmane.org>
2017-11-10 13:07                     ` Thierry Reding
2017-11-10 13:07                       ` Thierry Reding
2017-11-20 17:07                       ` Lorenzo Pieralisi
2017-11-20 17:07                         ` Lorenzo Pieralisi
2017-11-13 11:06                   ` Lorenzo Pieralisi
