From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: "Alex Williamson" <alex.williamson@redhat.com>,
	"Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
	"Sam Bobroff" <sbobroff@linux.ibm.com>,
	"Alistair Popple" <alistair@popple.id.au>,
	linuxppc-dev@lists.ozlabs.org, kvm-ppc@vger.kernel.org,
	"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
	"Oliver O'Halloran" <oohall@gmail.com>,
	"Andrew Donnellan" <andrew.donnellan@au1.ibm.com>,
	"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
	"Reza Arbab" <arbab@linux.ibm.com>
Subject: Re: [PATCH kernel v3 09/22] powerpc/pseries/iommu: Force default DMA window removal
Date: Mon, 19 Nov 2018 18:28:50 +1100
Message-ID: <eb939cdf-aef0-5556-9f83-a330f7d3fd95@ozlabs.ru>
In-Reply-To: <20181116045405.GB23632@umbus>



On 16/11/2018 15:54, David Gibson wrote:
> On Tue, Nov 13, 2018 at 07:28:10PM +1100, Alexey Kardashevskiy wrote:
>> It is quite common for a device to support more than 32bit but less than
>> 64bit for DMA; for example, GPUs often support 42..50bits. However,
>> the pseries platform only allows a huge DMA window (the one which allows
>> the use of more than 2GB of DMA space) for 64bit-capable devices, mostly
>> because:
>>
>> 1. we may have 32bit and >32bit devices on the same IOMMU domain and
>> we cannot place the new big window where the 32bit one is located;
>>
>> 2. the existing hardware only supports the second window at a very high
>> offset of 1<<59 == 0x0800.0000.0000.0000.
>>
>> So in order to allow 33..59bit DMA, we have to remove the default DMA
>> window and place a huge one there instead.
>>
>> The PAPR spec says that the platform may decide not to use the default
>> window and remove it using DDW RTAS calls. There are a few possible ways
>> for the platform to decide:
>>
>> 1. look at the device IDs and decide in advance that such and such
>> devices are capable of more than 32bit DMA (powernv's sketchy bypass
>> does something like this - it drops the default window if all devices
>> on the PE are from the same vendor) - this is not great as it involves
>> guessing because, unlike sketchy bypass, the GPU case involves 2 vendor
>> ids and does not scale;
>>
>> 2. advertise 1 available DMA window in the hypervisor via
>> ibm,query-pe-dma-window so the pseries platform could take it as a clue
>> that if more bits for DMA are needed, it has to remove the default
>> window - this is not great as it is an implicit clue rather than a
>> direct instruction;
>>
>> 3. removing the default DMA window altogether is not really an option
>> as PAPR mandates its presence at the guest boot time;
>>
>> 4. make the hypervisor explicitly tell the guest that the default window
>> had better be removed so the guest does not have to think hard and can
>> simply do what is requested; this is what this patch does.
> 
> This approach only makes sense if the hypervisor has better
> information as to what to do than the guest does.  It's not clear to
> me why that would be the case.  Aren't the DMA capabilities of the
> device something the driver should know, in which case it can decide
> based on that?

The device knows it can do 42bits, so its driver requests a 42bit DMA
mask and then the platform has to deal with it; the device has no
control over DMA windows.
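
To illustrate with a standalone sketch (userspace C, not kernel code):
dma_bit_mask() below mirrors the value of the kernel's DMA_BIT_MASK(),
and the two wants_ddw_*() helpers are hypothetical names for the mask
checks in dma_set_mask_pSeriesLP() before and after this patch. The
disable_ddw flag is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as the kernel's DMA_BIT_MASK(bits): the low 'bits' bits set. */
static uint64_t dma_bit_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~0ULL : (1ULL << bits) - 1;
}

/* Check before the patch: only a full 64bit mask tries a new window. */
static bool wants_ddw_old(uint64_t dma_mask)
{
	return dma_mask == dma_bit_mask(64);
}

/* Check after the patch: any mask wider than 32 bits tries a new window. */
static bool wants_ddw_new(uint64_t dma_mask)
{
	return dma_mask > dma_bit_mask(32);
}
```

A 42bit mask fails wants_ddw_old() and passes wants_ddw_new(), while a
32bit mask fails both and stays on the default window.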

The platform then tries to make everything work, which sadly includes
32bit-DMA devices, so the default DMA window stays in place. For 42bit
devices there is then no other way than to go via the smaller window,
because the only other window we can create is beyond the reach of the
GPU.
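
The reach problem is simple arithmetic. A rough sketch, assuming a
window is usable only when its last bus address fits under the device's
DMA mask; window_reachable() and HUGE_WINDOW_OFFSET are names made up
for this example (the offset value is the 1<<59 from the commit log),
and the window sizes in the note below are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where existing hardware places the second (huge) window. */
#define HUGE_WINDOW_OFFSET	(1ULL << 59)

/*
 * Return true if every bus address in [start, start + size) can be
 * generated by a device limited to dma_mask.
 */
static bool window_reachable(uint64_t dma_mask, uint64_t start, uint64_t size)
{
	uint64_t last;

	assert(size > 0);
	last = start + (size - 1);
	if (last < start)	/* the range wrapped past 2^64 */
		return false;
	return last <= dma_mask;
}
```

With a 42bit mask, window_reachable(mask, HUGE_WINDOW_OFFSET, 1ULL << 30)
is false, while a window of the same size starting at 0 is reachable.
That is why the huge window has to take the place of the default one.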

We have the so-called "sketchy bypass" hack for some other GPUs (which
Christoph is trying to get rid of) at
https://github.com/aik/linux/blob/nv2/arch/powerpc/platforms/powernv/pci-ioda.c#L1885

It is powernv-only and seemed to be a solution there; I am trying to
reimplement it here.


> 
>>
>> This makes use of the latter approach and exploits a new
>> "qemu,dma-force-remove-default" flag in a vPHB.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  arch/powerpc/platforms/pseries/iommu.c | 28 +++++++++++++++++++++++---
>>  1 file changed, 25 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>> index 9ece42f..78473ac 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -54,6 +54,7 @@
>>  #include "pseries.h"
>>  
>>  #define DDW_INVALID_OFFSET	((uint64_t)-1)
>> +#define DDW_INVALID_LIOBN	((uint32_t)-1)
>>  
>>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>>  {
>> @@ -977,7 +978,8 @@ static LIST_HEAD(failed_ddw_pdn_list);
>>   *
>>   * returns the dma offset for use by dma_set_mask
>>   */
>> -static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>> +static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn,
>> +		u32 default_liobn)
>>  {
>>  	int len, ret;
>>  	struct ddw_query_response query;
>> @@ -1022,6 +1024,16 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>  	if (ret)
>>  		goto out_failed;
>>  
>> +	/*
>> +	 * The device tree has a request to force remove the default window,
>> +	 * do this.
>> +	 */
>> +	if (default_liobn != DDW_INVALID_LIOBN && (!ddw_avail[2] ||
>> +			rtas_call(ddw_avail[2], 1, 1, NULL, default_liobn))) {
>> +		dev_dbg(&dev->dev, "Could not remove window\n");
>> +		goto out_failed;
>> +	}
>> +
>>         /*
>>  	 * Query if there is a second window of size to map the
>>  	 * whole partition.  Query returns number of windows, largest
>> @@ -1212,7 +1224,7 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
>>  	pdev = to_pci_dev(dev);
>>  
>>  	/* only attempt to use a new window if 64-bit DMA is requested */
>> -	if (!disable_ddw && dma_mask == DMA_BIT_MASK(64)) {
>> +	if (!disable_ddw && dma_mask > DMA_BIT_MASK(32)) {
>>  		dn = pci_device_to_OF_node(pdev);
>>  		dev_dbg(dev, "node is %pOF\n", dn);
>>  
>> @@ -1229,7 +1241,17 @@ static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
>>  				break;
>>  		}
>>  		if (pdn && PCI_DN(pdn)) {
>> -			dma_offset = enable_ddw(pdev, pdn);
>> +			u32 liobn = DDW_INVALID_LIOBN;
>> +			int ret = of_device_is_compatible(pdn, "IBM,npu-vphb");
>> +
>> +			if (ret) {
>> +				dma_window = of_get_property(pdn,
>> +						"ibm,dma-window", NULL);
>> +				if (dma_window)
>> +					liobn = be32_to_cpu(dma_window[0]);
>> +			}
>> +
>> +			dma_offset = enable_ddw(pdev, pdn, liobn);
>>  			if (dma_offset != DDW_INVALID_OFFSET) {
>>  				dev_info(dev, "Using 64-bit direct DMA at offset %llx\n", dma_offset);
>>  				set_dma_offset(dev, dma_offset);
> 

-- 
Alexey
