linux-kernel.vger.kernel.org archive mirror
From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: linuxppc-dev@lists.ozlabs.org
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>,
	Alex Williamson <alex.williamson@redhat.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	David Gibson <david@gibson.dropbear.id.au>,
	Gavin Shan <gwshan@linux.vnet.ibm.com>,
	Paul Mackerras <paulus@samba.org>,
	Wei Yang <weiyang@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org
Subject: [PATCH kernel v12 00/34] powerpc/iommu/vfio: Enable Dynamic DMA windows
Date: Fri,  5 Jun 2015 16:34:52 +1000
Message-ID: <1433486126-23551-1-git-send-email-aik@ozlabs.ru>


This enables the sPAPR-defined feature called Dynamic DMA Windows (DDW).

Each Partitionable Endpoint (IOMMU group) has an address range on a PCI bus
where devices are allowed to do DMA. These ranges are called DMA windows.
By default, there is a single DMA window, 1 or 2GB in size, mapped at
address zero on the PCI bus.

High-speed devices may suffer from the limited size of the window.
Recent host kernels use a TCE bypass window on POWER8 CPUs, which implements
a direct mapping of a PCI bus address range (at an offset of 1<<59) to host memory.

For guests, PAPR defines a DDW RTAS API which allows pseries guests
to query the hypervisor about DDW support and capabilities (only the page
size mask for now). A pseries guest may use this RTAS API to request DMA
windows in addition to the default one.
Existing pseries Linux guests request an additional window as big as
the guest RAM and map the entire window, which effectively creates a
direct mapping of guest memory to the PCI bus.

The multiple DMA windows feature is supported by POWER7/POWER8 CPUs; however,
this patchset only adds support for POWER8, as TCE tables are implemented
quite differently in POWER7 and POWER7 is not the highest priority.

This patchset reworks the PPC64 IOMMU code and adds the structures necessary
to support big windows.

Once a Linux guest discovers the presence of DDW, it does the following:
1. query the hypervisor about the number of available windows and page size masks;
2. create a window with the biggest possible page size (today 4K/64K/16M);
3. map the entire guest RAM via H_PUT_TCE* hypercalls;
4. switch dma_ops to direct_dma_ops on the selected PE.

Once this is done, H_PUT_TCE is no longer called for 64-bit devices and
the guest does not waste time on DMA map/unmap operations.

Note that 32-bit devices won't use DDW and will keep using the default
DMA window, so KVM optimizations will be required (to be posted later).

This is pushed to git@github.com:aik/linux.git
 + 93b347697...5ba9cbd vfio-for-github -> vfio-for-github (forced update)

The pushed branch contains all patches from this patchset as well as the
KVM acceleration patches, to give an idea of the current state of
in-kernel acceleration support.

Changes:
v12:
* fixed a few issues in multilevel TCE tables
* fixed locked_vm counting in "userspace-to-physical addresses translation cache"
* fixed some commit logs
* rebased on 4.1-rc6

v11:
* reworked locking in pinned pages cache

v10:
* fixed and tested on an SRIOV system
* fixed multiple comments from David
* added a bunch of IOMMU device attachment reworks

v9:
* rebased on top of SRIOV (which is in upstream now)
* fixed multiple comments from David
* reworked ownership patches
* removed "vfio: powerpc/spapr: Do cleanup when releasing the group" (used to be #2)
as the updated #1 should do this
* moved "powerpc/powernv: Implement accessor to TCE entry" to a separate patch
* added a patch which moves TCE Kill register address to PE from IOMMU table

v8:
* fixed a bug in the error fallback in "powerpc/mmu: Add userspace-to-physical
addresses translation cache"
* fixed subject in "vfio: powerpc/spapr: Check that IOMMU page is fully
contained by system page"
* moved v2 documentation to the correct patch
* added checks for failed vzalloc() in "powerpc/iommu: Add userspace view
of TCE table"

v7:
* moved memory preregistration to the current process's MMU context
* added code preventing unregistration if some pages are still mapped;
for this, a userspace view of the table is stored in iommu_table
* added locked_vm counting for DDW tables (including userspace view of those)

v6:
* fixed a bunch of errors in "vfio: powerpc/spapr: Support Dynamic DMA windows"
* moved static IOMMU properties from iommu_table_group to iommu_table_group_ops

v5:
* added SPAPR_TCE_IOMMU_v2 to tell the userspace that there is a memory
pre-registration feature
* added backward compatibility
* renamed a few things (mostly powerpc_iommu -> iommu_table_group)

v4:
* moved patches around to have VFIO and PPC patches separated as much as
possible
* now works with the existing upstream QEMU

v3:
* redesigned the whole thing
* multiple IOMMU groups per PHB -> one PHB is needed for VFIO in the guest ->
no problems with locked_vm counting; also we save memory on actual tables
* guest RAM preregistration is required for DDW
* PEs (IOMMU groups) are passed to VFIO with no DMA windows at all so
we do not bother with iommu_table::it_map anymore
* added multilevel TCE table support for really huge guests

v2:
* added missing __pa() in "powerpc/powernv: Release replaced TCE"
* reposted to make some noise




Alexey Kardashevskiy (34):
  powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU group
  powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_group
  powerpc/powernv/ioda: Clean up IOMMU group registration
  powerpc/iommu: Put IOMMU group explicitly
  powerpc/iommu: Always release iommu_table in iommu_free_table()
  vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU
    driver
  vfio: powerpc/spapr: Check that IOMMU page is fully contained by
    system page
  vfio: powerpc/spapr: Use it_page_size
  vfio: powerpc/spapr: Move locked_vm accounting to helpers
  vfio: powerpc/spapr: Disable DMA mappings on disabled container
  vfio: powerpc/spapr: Moving pinning/unpinning to helpers
  vfio: powerpc/spapr: Rework groups attaching
  powerpc/powernv: Do not set "read" flag if direction==DMA_NONE
  powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table
  powerpc/powernv/ioda/ioda2: Rework TCE invalidation in
    tce_build()/tce_free()
  powerpc/spapr: vfio: Replace iommu_table with iommu_table_group
  powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group
  vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership
    control
  powerpc/iommu: Fix IOMMU ownership control functions
  powerpc/powernv/ioda2: Move TCE kill register address to PE
  powerpc/powernv/ioda2: Add TCE invalidation for all attached groups
  powerpc/powernv: Implement accessor to TCE entry
  powerpc/iommu/powernv: Release replaced TCE
  powerpc/powernv/ioda2: Rework iommu_table creation
  powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages
  powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window
  powerpc/powernv: Implement multilevel TCE tables
  vfio: powerpc/spapr: powerpc/powernv/ioda: Define and implement DMA
    windows API
  powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE
    release
  powerpc/iommu/ioda2: Add get_table_size() to calculate the size of
    future table
  vfio: powerpc/spapr: powerpc/powernv/ioda2: Use DMA windows API in
    ownership control
  powerpc/mmu: Add userspace-to-physical addresses translation cache
  vfio: powerpc/spapr: Register memory and define IOMMU v2
  vfio: powerpc/spapr: Support Dynamic DMA windows

 Documentation/vfio.txt                      |   50 +-
 arch/powerpc/include/asm/iommu.h            |  119 ++-
 arch/powerpc/include/asm/machdep.h          |   25 -
 arch/powerpc/include/asm/mmu-hash64.h       |    3 +
 arch/powerpc/include/asm/mmu_context.h      |   18 +
 arch/powerpc/include/asm/pci-bridge.h       |    2 +-
 arch/powerpc/kernel/eeh.c                   |    4 +-
 arch/powerpc/kernel/iommu.c                 |  247 +++---
 arch/powerpc/kernel/setup_64.c              |    3 +
 arch/powerpc/kernel/vio.c                   |    5 +
 arch/powerpc/mm/Makefile                    |    1 +
 arch/powerpc/mm/mmu_context_hash64.c        |    6 +
 arch/powerpc/mm/mmu_context_iommu.c         |  316 ++++++++
 arch/powerpc/platforms/cell/iommu.c         |    8 +-
 arch/powerpc/platforms/pasemi/iommu.c       |    7 +-
 arch/powerpc/platforms/powernv/pci-ioda.c   |  775 ++++++++++++++-----
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |   35 +-
 arch/powerpc/platforms/powernv/pci.c        |  168 ++--
 arch/powerpc/platforms/powernv/pci.h        |   24 +-
 arch/powerpc/platforms/pseries/iommu.c      |  177 +++--
 arch/powerpc/sysdev/dart_iommu.c            |   12 +-
 drivers/vfio/vfio_iommu_spapr_tce.c         | 1109 ++++++++++++++++++++++++---
 include/uapi/linux/vfio.h                   |   88 ++-
 23 files changed, 2585 insertions(+), 617 deletions(-)
 create mode 100644 arch/powerpc/mm/mmu_context_iommu.c

-- 
2.4.0.rc3.8.gfb3e7d5



Thread overview: 46+ messages
2015-06-05  6:34 Alexey Kardashevskiy [this message]
2015-06-05  6:34 ` [PATCH kernel v12 01/34] powerpc/eeh/ioda2: Use device::iommu_group to check IOMMU group Alexey Kardashevskiy
2015-06-05  6:34 ` [PATCH kernel v12 02/34] powerpc/iommu/powernv: Get rid of set_iommu_table_base_and_group Alexey Kardashevskiy
2015-06-05  6:34 ` [PATCH kernel v12 03/34] powerpc/powernv/ioda: Clean up IOMMU group registration Alexey Kardashevskiy
2015-06-05  6:34 ` [PATCH kernel v12 04/34] powerpc/iommu: Put IOMMU group explicitly Alexey Kardashevskiy
2015-06-05  6:34 ` [PATCH kernel v12 05/34] powerpc/iommu: Always release iommu_table in iommu_free_table() Alexey Kardashevskiy
2015-06-05  6:34 ` [PATCH kernel v12 06/34] vfio: powerpc/spapr: Move page pinning from arch code to VFIO IOMMU driver Alexey Kardashevskiy
2015-06-05  6:34 ` [PATCH kernel v12 07/34] vfio: powerpc/spapr: Check that IOMMU page is fully contained by system page Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 08/34] vfio: powerpc/spapr: Use it_page_size Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 09/34] vfio: powerpc/spapr: Move locked_vm accounting to helpers Alexey Kardashevskiy
2015-06-09  4:22   ` David Gibson
2015-06-05  6:35 ` [PATCH kernel v12 10/34] vfio: powerpc/spapr: Disable DMA mappings on disabled container Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 11/34] vfio: powerpc/spapr: Moving pinning/unpinning to helpers Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 12/34] vfio: powerpc/spapr: Rework groups attaching Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 13/34] powerpc/powernv: Do not set "read" flag if direction==DMA_NONE Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 14/34] powerpc/iommu: Move tce_xxx callbacks from ppc_md to iommu_table Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 15/34] powerpc/powernv/ioda/ioda2: Rework TCE invalidation in tce_build()/tce_free() Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 16/34] powerpc/spapr: vfio: Replace iommu_table with iommu_table_group Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 17/34] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group Alexey Kardashevskiy
2015-06-09  2:36   ` David Gibson
2015-06-09  5:59   ` [PATCH kernel] powerpc/pseries: Fix compile error when CONFIG_IOMMU_API if off Alexey Kardashevskiy
2015-06-09 12:23   ` [PATCH kernel v12 17/34] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group Michael Ellerman
2015-06-10  3:08     ` [PATCH kernel] powerpc/powernv: Fix crash when CONFIG_IOMMU_API is off Alexey Kardashevskiy
2015-06-10  7:33   ` [kernel, v12, 17/34] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group Michael Ellerman
2015-06-11  4:28     ` [PATCH kernel v12.2] powerpc/powernv: Fix crash when CONFIG_IOMMU_API is off Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 18/34] vfio: powerpc/spapr/iommu/powernv/ioda2: Rework IOMMU ownership control Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 19/34] powerpc/iommu: Fix IOMMU ownership control functions Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 20/34] powerpc/powernv/ioda2: Move TCE kill register address to PE Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 21/34] powerpc/powernv/ioda2: Add TCE invalidation for all attached groups Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 22/34] powerpc/powernv: Implement accessor to TCE entry Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 23/34] powerpc/iommu/powernv: Release replaced TCE Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 24/34] powerpc/powernv/ioda2: Rework iommu_table creation Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 25/34] powerpc/powernv/ioda2: Introduce helpers to allocate TCE pages Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 26/34] powerpc/powernv/ioda2: Introduce pnv_pci_ioda2_set_window Alexey Kardashevskiy
2015-06-09  4:23   ` David Gibson
2015-06-05  6:35 ` [PATCH kernel v12 27/34] powerpc/powernv: Implement multilevel TCE tables Alexey Kardashevskiy
2015-06-09  3:57   ` David Gibson
2015-06-05  6:35 ` [PATCH kernel v12 28/34] vfio: powerpc/spapr: powerpc/powernv/ioda: Define and implement DMA windows API Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 29/34] powerpc/powernv/ioda2: Use new helpers to do proper cleanup on PE release Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 30/34] powerpc/iommu/ioda2: Add get_table_size() to calculate the size of future table Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 31/34] vfio: powerpc/spapr: powerpc/powernv/ioda2: Use DMA windows API in ownership control Alexey Kardashevskiy
2015-06-05  6:35 ` [PATCH kernel v12 32/34] powerpc/mmu: Add userspace-to-physical addresses translation cache Alexey Kardashevskiy
2015-06-09  4:05   ` David Gibson
2015-06-05  6:35 ` [PATCH kernel v12 33/34] vfio: powerpc/spapr: Register memory and define IOMMU v2 Alexey Kardashevskiy
2015-06-09  4:21   ` David Gibson
2015-06-05  6:35 ` [PATCH kernel v12 34/34] vfio: powerpc/spapr: Support Dynamic DMA windows Alexey Kardashevskiy
