* [PATCH 0/8] VFIO Device states interface in GVT
@ 2019-02-19  7:42 Yan Zhao
  2019-02-19  7:43 ` [PATCH 1/8] drm/i915/gvt: Apply g2h adjust for GTT mmio access Yan Zhao
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:42 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson; +Cc: kvm, linux-kernel, Yan Zhao

This patchset provides GVT vGPU with device state control and
interfaces to get/set device data.


Design of device state control and interfaces to get/set device data
====================================================================

CODE STRUCTURES
---------------
/* Device State region type and sub-type */
#define VFIO_REGION_TYPE_DEVICE_STATE           (1 << 1)
#define VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL       (1)
#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG      (2)
#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_MEMORY      (3)
#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP (4)

#define VFIO_DEVICE_STATE_INTERFACE_VERSION 1
#define VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY 1
#define VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY 2

#define VFIO_DEVICE_STATE_RUNNING 0 
#define VFIO_DEVICE_STATE_STOP 1
#define VFIO_DEVICE_STATE_LOGGING 2

#define VFIO_DEVICE_DATA_ACTION_GET_BUFFER 1
#define VFIO_DEVICE_DATA_ACTION_SET_BUFFER 2

struct vfio_device_state_ctl {
	__u32 version;		/* ro, version of device control interface */
	__u32 device_state;	/* wo, VFIO device state */
	__u32 caps;		/* ro */
	struct {
		__u32 action;	/* wo, GET_BUFFER or SET_BUFFER */
		__u64 size;	/* rw, total size of device config */
	} device_config;
	struct {
		__u32 action;	/* wo, GET_BUFFER or SET_BUFFER */
		__u64 size;	/* rw, total size of device memory */
		__u64 pos;	/* chunk offset in total buffer of device memory */
	} device_memory;
	struct {
		__u64 start_addr;	/* wo */
		__u64 page_nr;		/* wo */
	} system_memory;
};

DEVICE DATA
-----------
A VFIO device's data can be divided into 3 categories: device config,
device memory and system memory dirty pages.

Device Config: data such as MMIO state, page tables, ...
               Every device is supposed to possess device config data.
               Usually the size of device config data is small (no bigger
               than 10M), and it needs to be loaded in a certain strict
               order.
               Therefore no dirty data logging is enabled for device
               config, and it must be got/set as a whole.

Device Memory: a device's internal memory, standalone and outside system
               memory. It is usually very big.
               Not all devices have device memory. IGD, for example, uses
               only system memory and has no device memory.

System Memory Dirty Pages: a device can produce dirty pages in system
               memory.


DEVICE STATE REGIONS
---------------------
A VFIO device driver needs to register two mandatory regions, and
optionally another two regions, if it plans to support device state
management.

So there are up to four regions in total.
One is the control region (region CTL), which is accessed via read/write
system calls from user space;
the other three are data regions, which are mmaped into user space and
accessed in the same way as ordinary memory.
(If a data region fails to be mmaped into user space, accessing it via
read/write system calls is also valid.)

1. region CTL:
          Mandatory.
          This is the control region.
          Its layout is defined in struct vfio_device_state_ctl.
          Reading from this region retrieves the version, capabilities
          and data sizes of the device state interfaces.
          Writing to this region sets the device state and data sizes,
          and chooses which interface to use, i.e., among
          "get device config buffer", "set device config buffer",
          "get device memory buffer", "set device memory buffer",
          "get system memory dirty bitmap".
2. region DEVICE_CONFIG:
          Mandatory.
          This is a data region that holds device config data.
          It can be mmaped into user space.
3. region DEVICE_MEMORY:
          Optional.
          This is a data region that holds device memory data.
          It can be mmaped into user space.
4. region DIRTY_BITMAP:
          Optional.
          This is a data region that holds the bitmap of dirty pages in
          system memory that a VFIO device produces.
          It can be mmaped into user space.
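
Below is a minimal userspace sketch (not part of this patchset) of how
these regions could be located. It assumes the device-specific regions
follow the fixed vfio-pci region indexes (VFIO_PCI_NUM_REGIONS), and it
walks the standard region info capability chain
(VFIO_REGION_INFO_CAP_TYPE) to match the type/subtype values defined
above; error handling is trimmed.

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* find the device state region with the given subtype; the caller frees
 * the returned info (its .offset/.size locate the region within the fd)
 */
static struct vfio_region_info *
find_device_state_region(int device_fd, __u32 subtype)
{
	struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
	__u32 i;

	if (ioctl(device_fd, VFIO_DEVICE_GET_INFO, &dev_info))
		return NULL;

	for (i = VFIO_PCI_NUM_REGIONS; i < dev_info.num_regions; i++) {
		struct vfio_region_info *info;
		struct vfio_info_cap_header *hdr;

		info = calloc(1, sizeof(*info));
		info->argsz = sizeof(*info);
		info->index = i;

		/* the first call reports the argsz needed for the caps */
		if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info) ||
		    !(info->flags & VFIO_REGION_INFO_FLAG_CAPS)) {
			free(info);
			continue;
		}
		info = realloc(info, info->argsz);
		if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info)) {
			free(info);
			continue;
		}

		/* walk the cap chain; hdr->next == 0 terminates it */
		for (hdr = (void *)info + info->cap_offset;
		     hdr != (void *)info;
		     hdr = (void *)info + hdr->next) {
			struct vfio_region_info_cap_type *cap = (void *)hdr;

			if (hdr->id == VFIO_REGION_INFO_CAP_TYPE &&
			    cap->type == VFIO_REGION_TYPE_DEVICE_STATE &&
			    cap->subtype == subtype)
				return info;
		}
		free(info);
	}
	return NULL;
}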


DEVICE STATES
-------------
Four states are defined for a VFIO device:
        RUNNING, RUNNING & LOGGING, STOP & LOGGING, STOP.
They can be set by writing to the device_state field of region CTL
(struct vfio_device_state_ctl).

LOGGING is a special state in that it CANNOT exist independently.
It must be set alongside state RUNNING or STOP, i.e.,
        RUNNING & LOGGING, STOP & LOGGING

It is used for dirty data logging, both for device memory and for system
memory.

LOGGING only affects device/system memory.
Device config should always be accessible and return a whole config
snapshot regardless of the LOGGING state.

Typical state transition flows for VFIO devices are:
    (a) RUNNING --> RUNNING & LOGGING --> STOP & LOGGING --> STOP
    (b) RUNNING --> STOP --> RUNNING

RUNNING: In this state, a VFIO device is active and ready to receive
         commands from its device driver.
         The interfaces "get device config buffer", "get device config
         size", "get device memory buffer" and "get device memory size"
         return a whole data snapshot.
         "get system memory dirty bitmap" returns an empty bitmap.
         This is the default state that a VFIO device enters initially.

STOP:    In this state, a VFIO device is deactivated and no longer
         interacts with its device driver.
         "get device config buffer", "get device config size",
         "get device memory buffer" and "get device memory size"
         return a whole data snapshot.
         "get system memory dirty bitmap" returns an empty bitmap.

RUNNING & LOGGING: In this state, a VFIO device is active.
         "get device config buffer" and "get device config size" return
         a whole snapshot of device config.
         "get device memory buffer", "get device memory size" and "get
         system memory dirty bitmap" return dirty data since the last
         call to those interfaces.

STOP & LOGGING: In this state, the VFIO device is deactivated.
         "get device config buffer" and "get device config size" return
         a whole snapshot of device config.
         "get device memory buffer", "get device memory size" and "get
         system memory dirty bitmap" return dirty data since the last
         call to those interfaces.

Note:
The reason why RUNNING is the default state is that a device's active
state must not depend on the device state interface.
It is possible that the vfio_device_state_ctl region fails to get
registered; in that case, a device needs to be in an active state by
default.


DEVICE DATA CAPS
------------------
The Device Config capability is on by default; there is no need to set
this cap.

For devices that have device memory, it is required to expose the
DEVICE_MEMORY capability.
#define VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY 1

For devices producing dirty pages in system memory, it is required to
expose the SYSTEM_MEMORY cap in order to get the dirty bitmap of a
certain range of system memory.
#define VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY 2

See section "DEVICE STATE INTERFACES" for the "get caps" interface used
to get device data caps from userspace VFIO.


DEVICE STATE INTERFACES
------------------------
1. get version
   (1) user space calls read system call on "version" field of region CTL.
   (2) VFIO driver writes version number of device state interfaces to the
       "version" field of region CTL.

2. get caps
   (1) user space calls read system call on "caps" field of region CTL.
   (2) if a VFIO device has huge device memory, VFIO driver reports
       VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY in "caps" field of region CTL.
       if a VFIO device produces dirty pages in system memory, VFIO driver
       reports VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY in "caps" field of
       region CTL.
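
   For instance, a minimal userspace fragment for interfaces 1 and 2
   (ctl_offset is the offset of region CTL reported by its region info;
   the caps field is assumed to be a bitmask of the CAP values; pread()
   and offsetof() come from <unistd.h> and <stddef.h>; this is an
   illustrative sketch, not code from the patchset):

	__u32 version, caps;

	/* 1. get version */
	pread(fd, &version, sizeof(version), ctl_offset +
	      offsetof(struct vfio_device_state_ctl, version));

	/* 2. get caps */
	pread(fd, &caps, sizeof(caps), ctl_offset +
	      offsetof(struct vfio_device_state_ctl, caps));

	if (caps & VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY)
		/* the driver can report a system memory dirty bitmap */;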

3. set device state
   (1) user space calls write system call on "device_state" field of region
       CTL.
   (2) device state transitions as:

       RUNNING -- start dirty data logging --> RUNNING & LOGGING 
       RUNNING -- deactivate --> STOP
       RUNNING -- deactivate & start dirty data logging --> STOP & LOGGING
       RUNNING & LOGGING -- stop dirty data logging --> RUNNING
       RUNNING & LOGGING -- deactivate --> STOP & LOGGING
       RUNNING & LOGGING -- deactivate & stop dirty data logging --> STOP
       STOP -- activate --> RUNNING
       STOP -- start dirty data logging --> STOP & LOGGING
       STOP -- activate & start dirty data logging --> RUNNING & LOGGING
       STOP & LOGGING -- stop dirty data logging --> STOP
       STOP & LOGGING -- activate --> RUNNING & LOGGING
       STOP & LOGGING -- activate & stop dirty data logging --> RUNNING
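
   A sketch of interface 3 (again illustrative; it assumes combined
   states are expressed by OR'ing the state values, which works out
   since VFIO_DEVICE_STATE_RUNNING is 0):

	static int set_device_state(int fd, __u64 ctl_offset, __u32 state)
	{
		off_t off = ctl_offset +
			offsetof(struct vfio_device_state_ctl, device_state);

		return pwrite(fd, &state, sizeof(state), off) ==
		       sizeof(state) ? 0 : -1;
	}

	/* e.g. start dirty data logging while the device keeps running */
	set_device_state(fd, ctl_offset,
		VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_LOGGING);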

4. get device config size
   (1) user space calls read system call on "device_config.size" field of
       region CTL for the total size of device config snapshot.
   (2) VFIO driver writes device config data's total size in
       "device_config.size" field of region CTL.

5. set device config size
   (1) user space calls write system call.
       total size of device config snapshot --> "device_config.size" field
       of region CTL.
   (2) VFIO driver reads device config data's total size from
       "device_config.size" field of region CTL.

6. get device config buffer
   (1) user space calls write system call.
       "GET_BUFFER" --> "device_config.action" field of region CTL.
   (2) VFIO driver
       a. gets whole snapshot for device config 
       b. writes whole device config snapshot to region
       DEVICE_CONFIG.
   (3) user space reads the whole of device config snapshot from region
       DEVICE_CONFIG.
 
7. set device config buffer
   (1) user space writes whole of device config data to region
       DEVICE_CONFIG.
   (2) user space calls write system call.
       "SET_BUFFER" --> "device_config.action" field of region CTL.
   (3) VFIO driver loads whole of device config from region DEVICE_CONFIG.
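
   Interfaces 4-7 combined into a hedged save/load sketch (config_offset
   is the offset of region DEVICE_CONFIG; pread/pwrite are used on the
   data region for brevity, though mmap is the preferred path; names and
   error handling are illustrative):

	static void *save_device_config(int fd, __u64 ctl_offset,
					__u64 config_offset, __u64 *size)
	{
		__u32 action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
		void *buf;

		/* 4. get device config size */
		pread(fd, size, sizeof(*size), ctl_offset +
		      offsetof(struct vfio_device_state_ctl,
			       device_config.size));

		/* 6. the driver fills region DEVICE_CONFIG ... */
		pwrite(fd, &action, sizeof(action), ctl_offset +
		       offsetof(struct vfio_device_state_ctl,
				device_config.action));

		/* ... then user space reads the snapshot out of it */
		buf = malloc(*size);
		pread(fd, buf, *size, config_offset);
		return buf;
	}

	static void load_device_config(int fd, __u64 ctl_offset,
				       __u64 config_offset,
				       void *buf, __u64 size)
	{
		__u32 action = VFIO_DEVICE_DATA_ACTION_SET_BUFFER;

		/* 5. set device config size */
		pwrite(fd, &size, sizeof(size), ctl_offset +
		       offsetof(struct vfio_device_state_ctl,
				device_config.size));

		/* 7. write the snapshot, then ask the driver to load it */
		pwrite(fd, buf, size, config_offset);
		pwrite(fd, &action, sizeof(action), ctl_offset +
		       offsetof(struct vfio_device_state_ctl,
				device_config.action));
	}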

8. get device memory size
   (1) user space calls read system call on "device_memory.size" field of
       region CTL for device memory size.
   (2) VFIO driver
       a. gets device memory snapshot (in state RUNNING or STOP), or
          gets device memory dirty data (in state RUNNING & LOGGING or
          state STOP & LOGGING)
       b. writes size in "device_memory.size" field of region CTL
 
9. set device memory size
   (1) user space calls write system call on "device_memory.size" field of
       region CTL to set total size of device memory snapshot.
   (2) VFIO driver reads device memory's size from "device_memory.size"
       field of region CTL.


10. get device memory buffer
   (1) user space calls write system call.
       pos --> "device_memory.pos" field of region CTL,
       "GET_BUFFER" --> "device_memory.action" field of region CTL.
       (pos must be 0 or multiples of length of region DEVICE_MEMORY).
   (2) VFIO driver writes N'th chunk of device memory snapshot/dirty data
       to region DEVICE_MEMORY.
       (N equals to pos/(region length of DEVICE_MEMORY))
   (3) user space reads the N'th chunk of device memory snapshot/dirty data
       from region DEVICE_MEMORY.
 
11. set device memory buffer
   (1) user space writes N'th chunk of device memory snapshot/dirty data to
       region DEVICE_MEMORY.
       (N equals to pos/(region length of DEVICE_MEMORY))
   (2) user space writes pos to "device_memory.pos" field and writes
       "SET_BUFFER" to "device_memory.action" field of region CTL.
   (3) VFIO driver loads N'th chunk of device memory snapshot/dirty data
       from region DEVICE_MEMORY.
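
   A sketch of the get path (interfaces 8 and 10): transfer a device
   memory snapshot (or dirty data) chunk by chunk, where region_len is
   the length of region DEVICE_MEMORY and total comes from "get device
   memory size" (illustrative only):

	static void save_device_memory(int fd, __u64 ctl_offset,
				       __u64 mem_offset, __u64 region_len,
				       char *dst, __u64 total)
	{
		__u32 action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
		__u64 pos;

		/* pos must be 0 or a multiple of region_len */
		for (pos = 0; pos < total; pos += region_len) {
			__u64 chunk = total - pos < region_len ?
				      total - pos : region_len;

			pwrite(fd, &pos, sizeof(pos), ctl_offset +
			       offsetof(struct vfio_device_state_ctl,
					device_memory.pos));
			/* driver fills the N'th chunk into the region */
			pwrite(fd, &action, sizeof(action), ctl_offset +
			       offsetof(struct vfio_device_state_ctl,
					device_memory.action));
			pread(fd, dst + pos, chunk, mem_offset);
		}
	}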

12. get system memory dirty bitmap
   (1) user space calls write system call to specify the range of system
       memory in which to query dirty pages.
       system memory's start address --> "system_memory.start_addr" field
       of region CTL,
       system memory's page count --> "system_memory.page_nr" field of
       region CTL.
   (2) if the device state is not RUNNING & LOGGING or STOP & LOGGING,
       the VFIO driver returns an empty bitmap; otherwise,
       the VFIO driver checks page_nr and returns an error if it is
       larger than what region DIRTY_BITMAP can hold; if not,
       the VFIO driver returns a bitmap marking the dirty pages that the
       device has produced in this range of system memory since the last
       query.
   (3) user space reads back the dirty bitmap from region DIRTY_BITMAP.
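
   A sketch of interface 12 (the bitmap layout is assumed to be one bit
   per page, and the page_nr write is assumed to trigger the query;
   neither is spelled out by the patchset):

	static int sync_dirty_bitmap(int fd, __u64 ctl_offset,
				     __u64 bitmap_offset, __u64 start_addr,
				     __u64 page_nr, void *bitmap)
	{
		pwrite(fd, &start_addr, sizeof(start_addr), ctl_offset +
		       offsetof(struct vfio_device_state_ctl,
				system_memory.start_addr));
		pwrite(fd, &page_nr, sizeof(page_nr), ctl_offset +
		       offsetof(struct vfio_device_state_ctl,
				system_memory.page_nr));

		return pread(fd, bitmap, (page_nr + 7) / 8,
			     bitmap_offset) < 0 ? -1 : 0;
	}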


EXAMPLE USAGE
-------------
Take live migration of a VFIO device as an example of how to use these
device state interfaces.

Live migration save path:

(QEMU LIVE MIGRATION STATE --> DEVICE STATE INTERFACE --> DEVICE STATE)

MIGRATION_STATUS_NONE --> VFIO_DEVICE_STATE_RUNNING
 |
MIGRATION_STATUS_SAVE_SETUP
 |
 .save_setup callback -->
 get device memory size (whole snapshot size)
 get device memory buffer (whole snapshot data)
 set device state --> VFIO_DEVICE_STATE_RUNNING & VFIO_DEVICE_STATE_LOGGING
 |
MIGRATION_STATUS_ACTIVE
 |
 .save_live_pending callback --> get device memory size (dirty data)
 .save_live_iteration callback --> get device memory buffer (dirty data)
 .log_sync callback --> get system memory dirty bitmap
 |
(vcpu stops) --> set device state -->
 VFIO_DEVICE_STATE_STOP & VFIO_DEVICE_STATE_LOGGING
 |
.save_live_complete_precopy callback -->
 get device memory size (dirty data)
 get device memory buffer (dirty data)
 get device config size (whole snapshot size)
 get device config buffer (whole snapshot data)
 |
.save_cleanup callback -->  set device state --> VFIO_DEVICE_STATE_STOP
MIGRATION_STATUS_COMPLETED

MIGRATION_STATUS_CANCELLED or
MIGRATION_STATUS_FAILED
 |
(vcpu starts) --> set device state --> VFIO_DEVICE_STATE_RUNNING


Live migration load path:

(QEMU LIVE MIGRATION STATE --> DEVICE STATE INTERFACE --> DEVICE STATE)

MIGRATION_STATUS_NONE --> VFIO_DEVICE_STATE_RUNNING
 |
(vcpu stops) --> set device state --> VFIO_DEVICE_STATE_STOP
 |
MIGRATION_STATUS_ACTIVE
 |
.load state callback -->
 set device memory size, set device memory buffer, set device config size,
 set device config buffer
 |
(vcpu starts) --> set device state --> VFIO_DEVICE_STATE_RUNNING
 |
MIGRATION_STATUS_COMPLETED
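
Expressed with the hypothetical helpers sketched in "DEVICE STATE
INTERFACES" (illustrative only; the callback names come from the save
path above, everything else is assumption):

	/* .save_setup: whole device memory snapshot, then start logging */
	save_device_memory(fd, ctl_off, mem_off, region_len, buf, total);
	set_device_state(fd, ctl_off,
		VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_LOGGING);

	/* .save_live_iteration / .log_sync: only dirty data */
	save_device_memory(fd, ctl_off, mem_off, region_len, buf, dirty_sz);
	sync_dirty_bitmap(fd, ctl_off, bitmap_off, start, page_nr, bitmap);

	/* vcpu stopped: final dirty data plus the whole device config */
	set_device_state(fd, ctl_off,
		VFIO_DEVICE_STATE_STOP | VFIO_DEVICE_STATE_LOGGING);
	cfg = save_device_config(fd, ctl_off, cfg_off, &cfg_size);

	/* .save_cleanup */
	set_device_state(fd, ctl_off, VFIO_DEVICE_STATE_STOP);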


Patch Organization
==================

The first 6 patches let the vGPU view its base ggtt address as starting
from 0. Before the vGPU submits workloads to HW, its workloads are
trapped and their commands scanned and patched to start from the base
address of the ggtt partition assigned to the vGPU.

The last two patches implement the VFIO device state interfaces.
Patch 7 implements saving device config data from the vGPU and restoring
device config data into the vGPU through GVT's internal interface
intel_gvt_save_restore.

Patch 8 exposes the device state interfaces to userspace VFIO through
VFIO regions of type VFIO_REGION_TYPE_DEVICE_STATE. Through those
regions, user space VFIO can get/set the device's state and data.


Yan Zhao (2):
  drm/i915/gvt: vGPU device config data save/restore interface
  drm/i915/gvt: VFIO device states interfaces

Yulei Zhang (6):
  drm/i915/gvt: Apply g2h adjust for GTT mmio access
  drm/i915/gvt: Apply g2h adjustment during fence mmio access
  drm/i915/gvt: Patch the gma in gpu commands during command parser
  drm/i915/gvt: Retrieve the guest gm base address from PVINFO
  drm/i915/gvt: Align the guest gm aperture start offset for live
    migration
  drm/i915/gvt: Apply g2h adjustment to buffer start gma for dmabuf

 drivers/gpu/drm/i915/gvt/Makefile      |   2 +-
 drivers/gpu/drm/i915/gvt/aperture_gm.c |   6 +-
 drivers/gpu/drm/i915/gvt/cfg_space.c   |   3 +-
 drivers/gpu/drm/i915/gvt/cmd_parser.c  |  31 +-
 drivers/gpu/drm/i915/gvt/dmabuf.c      |   3 +
 drivers/gpu/drm/i915/gvt/execlist.c    |   2 +-
 drivers/gpu/drm/i915/gvt/gtt.c         |  25 +-
 drivers/gpu/drm/i915/gvt/gtt.h         |   3 +
 drivers/gpu/drm/i915/gvt/gvt.c         |   1 +
 drivers/gpu/drm/i915/gvt/gvt.h         |  48 +-
 drivers/gpu/drm/i915/gvt/kvmgt.c       | 414 +++++++++++-
 drivers/gpu/drm/i915/gvt/migrate.c     | 863 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/gvt/migrate.h     |  97 +++
 drivers/gpu/drm/i915/gvt/mmio.c        |  13 +
 drivers/gpu/drm/i915/gvt/mmio.h        |   1 +
 drivers/gpu/drm/i915/gvt/vgpu.c        |  11 +-
 include/uapi/linux/vfio.h              |  38 ++
 17 files changed, 1511 insertions(+), 50 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/migrate.c
 create mode 100644 drivers/gpu/drm/i915/gvt/migrate.h

-- 
2.17.1



* [PATCH 1/8] drm/i915/gvt: Apply g2h adjust for GTT mmio access
  2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
@ 2019-02-19  7:43 ` Yan Zhao
  2019-02-19  7:45 ` [PATCH 2/8] drm/i915/gvt: Apply g2h adjustment during fence " Yan Zhao
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:43 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson
  Cc: kvm, linux-kernel, Yulei Zhang, Zhenyu Wang

From: Yulei Zhang <yulei.zhang@intel.com>

Apply guest-to-host gma conversion when the guest tries to access the
GTT mmio registers. After live migration is enabled, the host gma will
change due to resource re-allocation, but the guest gma should remain
unchanged; thus g2h conversion is required for it.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/gtt.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index c7103dd2d8d5..8a5d26d1d402 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -65,8 +65,7 @@ bool intel_gvt_ggtt_validate_range(struct intel_vgpu *vgpu, u64 addr, u32 size)
 /* translate a guest gmadr to host gmadr */
 int intel_gvt_ggtt_gmadr_g2h(struct intel_vgpu *vgpu, u64 g_addr, u64 *h_addr)
 {
-	if (WARN(!vgpu_gmadr_is_valid(vgpu, g_addr),
-		 "invalid guest gmadr %llx\n", g_addr))
+	if (!vgpu_gmadr_is_valid(vgpu, g_addr))
 		return -EACCES;
 
 	if (vgpu_gmadr_is_aperture(vgpu, g_addr))
@@ -2162,7 +2161,8 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 	struct intel_vgpu_mm *ggtt_mm = vgpu->gtt.ggtt_mm;
 	struct intel_gvt_gtt_pte_ops *ops = gvt->gtt.pte_ops;
 	unsigned long g_gtt_index = off >> info->gtt_entry_size_shift;
-	unsigned long gma, gfn;
+	unsigned long gfn;
+	unsigned long h_gtt_index;
 	struct intel_gvt_gtt_entry e, m;
 	dma_addr_t dma_addr;
 	int ret;
@@ -2172,10 +2172,8 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 	if (bytes != 4 && bytes != 8)
 		return -EINVAL;
 
-	gma = g_gtt_index << I915_GTT_PAGE_SHIFT;
-
 	/* the VM may configure the whole GM space when ballooning is used */
-	if (!vgpu_gmadr_is_valid(vgpu, gma))
+	if (intel_gvt_ggtt_index_g2h(vgpu, g_gtt_index, &h_gtt_index))
 		return 0;
 
 	e.type = GTT_TYPE_GGTT_PTE;
@@ -2259,11 +2257,12 @@ static int emulate_ggtt_mmio_write(struct intel_vgpu *vgpu, unsigned int off,
 out:
 	ggtt_set_guest_entry(ggtt_mm, &e, g_gtt_index);
 
-	ggtt_get_host_entry(ggtt_mm, &e, g_gtt_index);
+	ggtt_get_host_entry(ggtt_mm, &e, h_gtt_index);
 	ggtt_invalidate_pte(vgpu, &e);
 
-	ggtt_set_host_entry(ggtt_mm, &m, g_gtt_index);
+	ggtt_set_host_entry(ggtt_mm, &m, h_gtt_index);
 	ggtt_invalidate(gvt->dev_priv);
+
 	return 0;
 }
 
-- 
2.17.1



* [PATCH 2/8] drm/i915/gvt: Apply g2h adjustment during fence mmio access
  2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
  2019-02-19  7:43 ` [PATCH 1/8] drm/i915/gvt: Apply g2h adjust for GTT mmio access Yan Zhao
@ 2019-02-19  7:45 ` Yan Zhao
  2019-02-19  7:45 ` [PATCH 3/8] drm/i915/gvt: Patch the gma in gpu commands during command parser Yan Zhao
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:45 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson; +Cc: kvm, linux-kernel, Yulei Zhang

From: Yulei Zhang <yulei.zhang@intel.com>

Apply the guest-to-host gma conversion when the guest configures the
fence mmio registers, since the host gma changes after migration.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
---
 drivers/gpu/drm/i915/gvt/aperture_gm.c |  6 ++++--
 drivers/gpu/drm/i915/gvt/gvt.h         | 14 ++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c
index 359d37d5c958..123c475f2f6e 100644
--- a/drivers/gpu/drm/i915/gvt/aperture_gm.c
+++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c
@@ -146,8 +146,10 @@ void intel_vgpu_write_fence(struct intel_vgpu *vgpu,
 	I915_WRITE(fence_reg_lo, 0);
 	POSTING_READ(fence_reg_lo);
 
-	I915_WRITE(fence_reg_hi, upper_32_bits(value));
-	I915_WRITE(fence_reg_lo, lower_32_bits(value));
+	I915_WRITE(fence_reg_hi,
+		intel_gvt_reg_g2h(vgpu, upper_32_bits(value), 0xFFFFF000));
+	I915_WRITE(fence_reg_lo,
+		intel_gvt_reg_g2h(vgpu, lower_32_bits(value), 0xFFFFF000));
 	POSTING_READ(fence_reg_lo);
 }
 
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index b4ab1dad0143..8621d0f5fd26 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -530,6 +530,20 @@ int intel_gvt_ggtt_index_g2h(struct intel_vgpu *vgpu, unsigned long g_index,
 int intel_gvt_ggtt_h2g_index(struct intel_vgpu *vgpu, unsigned long h_index,
 			     unsigned long *g_index);
 
+/* apply guest to host gma conversion in GM registers setting */
+static inline u64 intel_gvt_reg_g2h(struct intel_vgpu *vgpu,
+		u32 addr, u32 mask)
+{
+	u64 gma;
+
+	if (addr) {
+		intel_gvt_ggtt_gmadr_g2h(vgpu,
+				addr & mask, &gma);
+		addr = gma | (addr & (~mask));
+	}
+	return addr;
+}
+
 void intel_vgpu_init_cfg_space(struct intel_vgpu *vgpu,
 		bool primary);
 void intel_vgpu_reset_cfg_space(struct intel_vgpu *vgpu);
-- 
2.17.1



* [PATCH 3/8] drm/i915/gvt: Patch the gma in gpu commands during command parser
  2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
  2019-02-19  7:43 ` [PATCH 1/8] drm/i915/gvt: Apply g2h adjust for GTT mmio access Yan Zhao
  2019-02-19  7:45 ` [PATCH 2/8] drm/i915/gvt: Apply g2h adjustment during fence " Yan Zhao
@ 2019-02-19  7:45 ` Yan Zhao
  2019-02-19  7:46 ` [PATCH 4/8] drm/i915/gvt: Retrieve the guest gm base address from PVINFO Yan Zhao
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:45 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson; +Cc: kvm, linux-kernel, Yulei Zhang

From: Yulei Zhang <yulei.zhang@intel.com>

Adjust the graphics memory addresses in gpu commands according to the
shift offset between the guest's aperture/hidden gm addresses and the
host's, and patch the commands before they are submitted for execution.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
---
 drivers/gpu/drm/i915/gvt/cmd_parser.c | 31 ++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/cmd_parser.c b/drivers/gpu/drm/i915/gvt/cmd_parser.c
index 77ae634eb11c..90836756b235 100644
--- a/drivers/gpu/drm/i915/gvt/cmd_parser.c
+++ b/drivers/gpu/drm/i915/gvt/cmd_parser.c
@@ -984,7 +984,8 @@ static int cmd_handler_lrr(struct parser_exec_state *s)
 }
 
 static inline int cmd_address_audit(struct parser_exec_state *s,
-		unsigned long guest_gma, int op_size, bool index_mode);
+				    unsigned long guest_gma, int op_size,
+				    bool index_mode, int offset);
 
 static int cmd_handler_lrm(struct parser_exec_state *s)
 {
@@ -1006,7 +1007,8 @@ static int cmd_handler_lrm(struct parser_exec_state *s)
 			gma = cmd_gma(s, i + 1);
 			if (gmadr_bytes == 8)
 				gma |= (cmd_gma_hi(s, i + 2)) << 32;
-			ret |= cmd_address_audit(s, gma, sizeof(u32), false);
+			ret |= cmd_address_audit(s, gma, sizeof(u32),
+						 false, i + 1);
 			if (ret)
 				break;
 		}
@@ -1030,7 +1032,8 @@ static int cmd_handler_srm(struct parser_exec_state *s)
 			gma = cmd_gma(s, i + 1);
 			if (gmadr_bytes == 8)
 				gma |= (cmd_gma_hi(s, i + 2)) << 32;
-			ret |= cmd_address_audit(s, gma, sizeof(u32), false);
+			ret |= cmd_address_audit(s, gma, sizeof(u32),
+						 false, i + 1);
 			if (ret)
 				break;
 		}
@@ -1102,7 +1105,7 @@ static int cmd_handler_pipe_control(struct parser_exec_state *s)
 				if (cmd_val(s, 1) & (1 << 21))
 					index_mode = true;
 				ret |= cmd_address_audit(s, gma, sizeof(u64),
-						index_mode);
+						index_mode, 2);
 			}
 		}
 	}
@@ -1432,10 +1435,13 @@ static unsigned long get_gma_bb_from_cmd(struct parser_exec_state *s, int index)
 }
 
 static inline int cmd_address_audit(struct parser_exec_state *s,
-		unsigned long guest_gma, int op_size, bool index_mode)
+				    unsigned long guest_gma, int op_size,
+				    bool index_mode, int offset)
 {
 	struct intel_vgpu *vgpu = s->vgpu;
 	u32 max_surface_size = vgpu->gvt->device_info.max_surface_size;
+	int gmadr_bytes = vgpu->gvt->device_info.gmadr_bytes_in_cmd;
+	u64 host_gma;
 	int i;
 	int ret;
 
@@ -1453,6 +1459,14 @@ static inline int cmd_address_audit(struct parser_exec_state *s,
 	} else if (!intel_gvt_ggtt_validate_range(vgpu, guest_gma, op_size)) {
 		ret = -EFAULT;
 		goto err;
+	} else
+		intel_gvt_ggtt_gmadr_g2h(vgpu, guest_gma, &host_gma);
+
+	if (offset > 0) {
+		patch_value(s, cmd_ptr(s, offset), host_gma & GENMASK(31, 2));
+		if (gmadr_bytes == 8)
+			patch_value(s, cmd_ptr(s, offset + 1),
+				(host_gma >> 32) & GENMASK(15, 0));
 	}
 
 	return 0;
@@ -1497,7 +1511,7 @@ static int cmd_handler_mi_store_data_imm(struct parser_exec_state *s)
 		gma = (gma_high << 32) | gma_low;
 		core_id = (cmd_val(s, 1) & (1 << 0)) ? 1 : 0;
 	}
-	ret = cmd_address_audit(s, gma + op_size * core_id, op_size, false);
+	ret = cmd_address_audit(s, gma + op_size * core_id, op_size, false, 1);
 	return ret;
 }
 
@@ -1541,7 +1555,7 @@ static int cmd_handler_mi_op_2f(struct parser_exec_state *s)
 		gma_high = cmd_val(s, 2) & GENMASK(15, 0);
 		gma = (gma_high << 32) | gma;
 	}
-	ret = cmd_address_audit(s, gma, op_size, false);
+	ret = cmd_address_audit(s, gma, op_size, false, 1);
 	return ret;
 }
 
@@ -1581,7 +1595,8 @@ static int cmd_handler_mi_flush_dw(struct parser_exec_state *s)
 		/* Store Data Index */
 		if (cmd_val(s, 0) & (1 << 21))
 			index_mode = true;
-		ret = cmd_address_audit(s, gma, sizeof(u64), index_mode);
+		ret = cmd_address_audit(s, (gma | (1 << 2)),
+					sizeof(u64), index_mode, 1);
 	}
 	/* Check notify bit */
 	if ((cmd_val(s, 0) & (1 << 8)))
-- 
2.17.1



* [PATCH 4/8] drm/i915/gvt: Retrieve the guest gm base address from PVINFO
  2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
                   ` (2 preceding siblings ...)
  2019-02-19  7:45 ` [PATCH 3/8] drm/i915/gvt: Patch the gma in gpu commands during command parser Yan Zhao
@ 2019-02-19  7:46 ` Yan Zhao
  2019-02-19  7:46 ` [PATCH 5/8] drm/i915/gvt: Align the guest gm aperture start offset for live migration Yan Zhao
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:46 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson
  Cc: kvm, linux-kernel, Yulei Zhang, Zhenyu Wang

From: Yulei Zhang <yulei.zhang@intel.com>

Since after migration the host gm base address will change due to
resource re-allocation, retrieve the guest gm base address from PVINFO
in order to make sure the guest gm address doesn't change with it.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/cfg_space.c |  3 ++-
 drivers/gpu/drm/i915/gvt/gtt.c       |  8 ++++----
 drivers/gpu/drm/i915/gvt/gvt.h       | 22 ++++++++++++++++++----
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/cfg_space.c b/drivers/gpu/drm/i915/gvt/cfg_space.c
index 19cf1bbe059d..272f8dba7b21 100644
--- a/drivers/gpu/drm/i915/gvt/cfg_space.c
+++ b/drivers/gpu/drm/i915/gvt/cfg_space.c
@@ -33,6 +33,7 @@
 
 #include "i915_drv.h"
 #include "gvt.h"
+#include "i915_pvinfo.h"
 
 enum {
 	INTEL_GVT_PCI_BAR_GTTMMIO = 0,
@@ -133,7 +134,7 @@ static int map_aperture(struct intel_vgpu *vgpu, bool map)
 	else
 		val = *(u32 *)(vgpu_cfg_space(vgpu) + PCI_BASE_ADDRESS_2);
 
-	first_gfn = (val + vgpu_aperture_offset(vgpu)) >> PAGE_SHIFT;
+	first_gfn = (val + vgpu_guest_aperture_offset(vgpu)) >> PAGE_SHIFT;
 
 	ret = intel_gvt_hypervisor_map_gfn_to_mfn(vgpu, first_gfn,
 						  aperture_pa >> PAGE_SHIFT,
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 8a5d26d1d402..753ad975c958 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -70,10 +70,10 @@ int intel_gvt_ggtt_gmadr_g2h(struct intel_vgpu *vgpu, u64 g_addr, u64 *h_addr)
 
 	if (vgpu_gmadr_is_aperture(vgpu, g_addr))
 		*h_addr = vgpu_aperture_gmadr_base(vgpu)
-			  + (g_addr - vgpu_aperture_offset(vgpu));
+			  + (g_addr - vgpu_guest_aperture_gmadr_base(vgpu));
 	else
 		*h_addr = vgpu_hidden_gmadr_base(vgpu)
-			  + (g_addr - vgpu_hidden_offset(vgpu));
+			  + (g_addr - vgpu_guest_hidden_gmadr_base(vgpu));
 	return 0;
 }
 
@@ -85,10 +85,10 @@ int intel_gvt_ggtt_gmadr_h2g(struct intel_vgpu *vgpu, u64 h_addr, u64 *g_addr)
 		return -EACCES;
 
 	if (gvt_gmadr_is_aperture(vgpu->gvt, h_addr))
-		*g_addr = vgpu_aperture_gmadr_base(vgpu)
+		*g_addr = vgpu_guest_aperture_gmadr_base(vgpu)
 			+ (h_addr - gvt_aperture_gmadr_base(vgpu->gvt));
 	else
-		*g_addr = vgpu_hidden_gmadr_base(vgpu)
+		*g_addr = vgpu_guest_hidden_gmadr_base(vgpu)
 			+ (h_addr - gvt_hidden_gmadr_base(vgpu->gvt));
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 8621d0f5fd26..1f5ef59a36ac 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -424,6 +424,20 @@ int intel_gvt_load_firmware(struct intel_gvt *gvt);
 #define vgpu_fence_base(vgpu) (vgpu->fence.base)
 #define vgpu_fence_sz(vgpu) (vgpu->fence.size)
 
+/* Aperture/GM space definitions for vGPU Guest view point */
+#define vgpu_guest_aperture_offset(vgpu) \
+	vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.mappable_gmadr.base))
+#define vgpu_guest_hidden_offset(vgpu)	\
+	vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.nonmappable_gmadr.base))
+
+#define vgpu_guest_aperture_gmadr_base(vgpu) (vgpu_guest_aperture_offset(vgpu))
+#define vgpu_guest_aperture_gmadr_end(vgpu) \
+	(vgpu_guest_aperture_gmadr_base(vgpu) + vgpu_aperture_sz(vgpu) - 1)
+
+#define vgpu_guest_hidden_gmadr_base(vgpu) (vgpu_guest_hidden_offset(vgpu))
+#define vgpu_guest_hidden_gmadr_end(vgpu) \
+	(vgpu_guest_hidden_gmadr_base(vgpu) + vgpu_hidden_sz(vgpu) - 1)
+
 struct intel_vgpu_creation_params {
 	__u64 handle;
 	__u64 low_gm_sz;  /* in MB */
@@ -499,12 +513,12 @@ void intel_gvt_deactivate_vgpu(struct intel_vgpu *vgpu);
 
 /* validating GM functions */
 #define vgpu_gmadr_is_aperture(vgpu, gmadr) \
-	((gmadr >= vgpu_aperture_gmadr_base(vgpu)) && \
-	 (gmadr <= vgpu_aperture_gmadr_end(vgpu)))
+	((gmadr >= vgpu_guest_aperture_gmadr_base(vgpu)) && \
+	 (gmadr <= vgpu_guest_aperture_gmadr_end(vgpu)))
 
 #define vgpu_gmadr_is_hidden(vgpu, gmadr) \
-	((gmadr >= vgpu_hidden_gmadr_base(vgpu)) && \
-	 (gmadr <= vgpu_hidden_gmadr_end(vgpu)))
+	((gmadr >= vgpu_guest_hidden_gmadr_base(vgpu)) && \
+	 (gmadr <= vgpu_guest_hidden_gmadr_end(vgpu)))
 
 #define vgpu_gmadr_is_valid(vgpu, gmadr) \
 	 ((vgpu_gmadr_is_aperture(vgpu, gmadr) || \
-- 
2.17.1



* [PATCH 5/8] drm/i915/gvt: Align the guest gm aperture start offset for live migration
  2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
                   ` (3 preceding siblings ...)
  2019-02-19  7:46 ` [PATCH 4/8] drm/i915/gvt: Retrieve the guest gm base address from PVINFO Yan Zhao
@ 2019-02-19  7:46 ` Yan Zhao
  2019-02-19  7:46 ` [PATCH 6/8] drm/i915/gvt: Apply g2h adjustment to buffer start gma for dmabuf Yan Zhao
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:46 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson
  Cc: kvm, linux-kernel, Yulei Zhang, Zhenyu Wang

From: Yulei Zhang <yulei.zhang@intel.com>

As the guest gm aperture region start offset is initialized when the
vGPU is created, align the aperture start offset to 0 for the guest to
make sure that the start offset remains the same after migration.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c |  3 +--
 drivers/gpu/drm/i915/gvt/vgpu.c  | 10 ++++++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index c1072143da1d..223c67e87680 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1198,8 +1198,7 @@ static long intel_vgpu_ioctl(struct mdev_device *mdev, unsigned int cmd,
 			sparse->header.version = 1;
 			sparse->nr_areas = nr_areas;
 			cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP;
-			sparse->areas[0].offset =
-					PAGE_ALIGN(vgpu_aperture_offset(vgpu));
+			sparse->areas[0].offset = 0;
 			sparse->areas[0].size = vgpu_aperture_sz(vgpu);
 			break;
 
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index c628be05fbfe..fcccda35a456 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -48,8 +48,7 @@ void populate_pvinfo_page(struct intel_vgpu *vgpu)
 	vgpu_vreg_t(vgpu, vgtif_reg(vgt_caps)) |= VGT_CAPS_HWSP_EMULATION;
 	vgpu_vreg_t(vgpu, vgtif_reg(vgt_caps)) |= VGT_CAPS_HUGE_GTT;
 
-	vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.mappable_gmadr.base)) =
-		vgpu_aperture_gmadr_base(vgpu);
+	vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.mappable_gmadr.base)) = 0;
 	vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.mappable_gmadr.size)) =
 		vgpu_aperture_sz(vgpu);
 	vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.nonmappable_gmadr.base)) =
@@ -524,6 +523,9 @@ void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
 {
 	struct intel_gvt *gvt = vgpu->gvt;
 	struct intel_gvt_workload_scheduler *scheduler = &gvt->scheduler;
+	u64 maddr = vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.mappable_gmadr.base));
+	u64 unmaddr = vgpu_vreg_t(vgpu,
+				vgtif_reg(avail_rs.nonmappable_gmadr.base));
 	unsigned int resetting_eng = dmlr ? ALL_ENGINES : engine_mask;
 
 	gvt_dbg_core("------------------------------------------\n");
@@ -556,6 +558,10 @@ void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
 
 		intel_vgpu_reset_mmio(vgpu, dmlr);
 		populate_pvinfo_page(vgpu);
+		vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.mappable_gmadr.base)) =
+			maddr;
+		vgpu_vreg_t(vgpu, vgtif_reg(avail_rs.nonmappable_gmadr.base)) =
+			unmaddr;
 		intel_vgpu_reset_display(vgpu);
 
 		if (dmlr) {
-- 
2.17.1



* [PATCH 6/8] drm/i915/gvt: Apply g2h adjustment to buffer start gma for dmabuf
  2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
                   ` (4 preceding siblings ...)
  2019-02-19  7:46 ` [PATCH 5/8] drm/i915/gvt: Align the guest gm aperture start offset for live migration Yan Zhao
@ 2019-02-19  7:46 ` Yan Zhao
  2019-02-19  7:46 ` [PATCH 7/8] drm/i915/gvt: vGPU device config data save/restore interface Yan Zhao
  2019-02-19  7:46 ` [PATCH 8/8] drm/i915/gvt: VFIO device states interfaces Yan Zhao
  7 siblings, 0 replies; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:46 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson; +Cc: kvm, linux-kernel, Yulei Zhang

From: Yulei Zhang <yulei.zhang@intel.com>

Adjust the buffer start gma in dmabuf for display in the host domain.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
---
 drivers/gpu/drm/i915/gvt/dmabuf.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gvt/dmabuf.c b/drivers/gpu/drm/i915/gvt/dmabuf.c
index 51ed99a37803..e96c655c40bc 100644
--- a/drivers/gpu/drm/i915/gvt/dmabuf.c
+++ b/drivers/gpu/drm/i915/gvt/dmabuf.c
@@ -293,6 +293,9 @@ static int vgpu_get_plane_info(struct drm_device *dev,
 		return -EFAULT;
 	}
 
+	/* Apply g2h adjust to buffer start gma for display */
+	intel_gvt_ggtt_gmadr_g2h(vgpu, info->start, &info->start);
+
 	return 0;
 }
 
-- 
2.17.1



* [PATCH 7/8] drm/i915/gvt: vGPU device config data save/restore interface
  2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
                   ` (5 preceding siblings ...)
  2019-02-19  7:46 ` [PATCH 6/8] drm/i915/gvt: Apply g2h adjustment to buffer start gma for dmabuf Yan Zhao
@ 2019-02-19  7:46 ` Yan Zhao
  2019-02-20  9:39   ` Zhenyu Wang
  2019-02-19  7:46 ` [PATCH 8/8] drm/i915/gvt: VFIO device states interfaces Yan Zhao
  7 siblings, 1 reply; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:46 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson
  Cc: kvm, linux-kernel, Yan Zhao, Yulei Zhang, Xiao Zheng, Zhenyu Wang

This patch implements the gvt interface intel_gvt_save_restore to
save/restore a vGPU's device config data for live migration.

vGPU device config data includes vreg, vggtt, vcfg space, workloads,
ppgtt and execlist state.
It does not include dirty pages in system memory produced by the vGPU.

Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
Signed-off-by: Xiao Zheng <xiao.zheng@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 drivers/gpu/drm/i915/gvt/Makefile   |   2 +-
 drivers/gpu/drm/i915/gvt/execlist.c |   2 +-
 drivers/gpu/drm/i915/gvt/gtt.c      |   2 +-
 drivers/gpu/drm/i915/gvt/gtt.h      |   3 +
 drivers/gpu/drm/i915/gvt/gvt.c      |   1 +
 drivers/gpu/drm/i915/gvt/gvt.h      |   9 +
 drivers/gpu/drm/i915/gvt/migrate.c  | 863 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/gvt/migrate.h  |  97 ++++
 drivers/gpu/drm/i915/gvt/mmio.c     |  13 +
 drivers/gpu/drm/i915/gvt/mmio.h     |   1 +
 drivers/gpu/drm/i915/gvt/vgpu.c     |   1 +
 11 files changed, 991 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/migrate.c
 create mode 100644 drivers/gpu/drm/i915/gvt/migrate.h

diff --git a/drivers/gpu/drm/i915/gvt/Makefile b/drivers/gpu/drm/i915/gvt/Makefile
index b016dc753db9..f863fbed1792 100644
--- a/drivers/gpu/drm/i915/gvt/Makefile
+++ b/drivers/gpu/drm/i915/gvt/Makefile
@@ -3,7 +3,7 @@ GVT_DIR := gvt
 GVT_SOURCE := gvt.o aperture_gm.o handlers.o vgpu.o trace_points.o firmware.o \
 	interrupt.o gtt.o cfg_space.o opregion.o mmio.o display.o edid.o \
 	execlist.o scheduler.o sched_policy.o mmio_context.o cmd_parser.o debugfs.o \
-	fb_decoder.o dmabuf.o page_track.o
+	fb_decoder.o dmabuf.o page_track.o migrate.o
 
 ccflags-y				+= -I$(src) -I$(src)/$(GVT_DIR)
 i915-y					+= $(addprefix $(GVT_DIR)/, $(GVT_SOURCE))
diff --git a/drivers/gpu/drm/i915/gvt/execlist.c b/drivers/gpu/drm/i915/gvt/execlist.c
index 70494e394d2c..992e2260eec9 100644
--- a/drivers/gpu/drm/i915/gvt/execlist.c
+++ b/drivers/gpu/drm/i915/gvt/execlist.c
@@ -437,7 +437,7 @@ static int complete_execlist_workload(struct intel_vgpu_workload *workload)
 	return ret;
 }
 
-static int submit_context(struct intel_vgpu *vgpu, int ring_id,
+int submit_context(struct intel_vgpu *vgpu, int ring_id,
 		struct execlist_ctx_descriptor_format *desc,
 		bool emulate_schedule_in)
 {
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index 753ad975c958..18e3f08b0553 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -589,7 +589,7 @@ static inline void ppgtt_set_shadow_root_entry(struct intel_vgpu_mm *mm,
 	_ppgtt_set_root_entry(mm, entry, index, false);
 }
 
-static void ggtt_get_guest_entry(struct intel_vgpu_mm *mm,
+void ggtt_get_guest_entry(struct intel_vgpu_mm *mm,
 		struct intel_gvt_gtt_entry *entry, unsigned long index)
 {
 	struct intel_gvt_gtt_pte_ops *pte_ops = mm->vgpu->gvt->gtt.pte_ops;
diff --git a/drivers/gpu/drm/i915/gvt/gtt.h b/drivers/gpu/drm/i915/gvt/gtt.h
index d8cb04cc946d..73709768b666 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.h
+++ b/drivers/gpu/drm/i915/gvt/gtt.h
@@ -270,6 +270,9 @@ struct intel_vgpu_mm *intel_vgpu_get_ppgtt_mm(struct intel_vgpu *vgpu,
 
 int intel_vgpu_put_ppgtt_mm(struct intel_vgpu *vgpu, u64 pdps[]);
 
+void ggtt_get_guest_entry(struct intel_vgpu_mm *mm,
+		struct intel_gvt_gtt_entry *entry, unsigned long index);
+
 int intel_vgpu_emulate_ggtt_mmio_read(struct intel_vgpu *vgpu,
 	unsigned int off, void *p_data, unsigned int bytes);
 
diff --git a/drivers/gpu/drm/i915/gvt/gvt.c b/drivers/gpu/drm/i915/gvt/gvt.c
index 733a2a0d0c30..3dd9e4ebd39b 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.c
+++ b/drivers/gpu/drm/i915/gvt/gvt.c
@@ -185,6 +185,7 @@ static const struct intel_gvt_ops intel_gvt_ops = {
 	.vgpu_query_plane = intel_vgpu_query_plane,
 	.vgpu_get_dmabuf = intel_vgpu_get_dmabuf,
 	.write_protect_handler = intel_vgpu_page_track_handler,
+	.vgpu_save_restore = intel_gvt_save_restore,
 };
 
 /**
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 1f5ef59a36ac..cfde510e9d77 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -46,6 +46,7 @@
 #include "sched_policy.h"
 #include "mmio_context.h"
 #include "cmd_parser.h"
+#include "migrate.h"
 #include "fb_decoder.h"
 #include "dmabuf.h"
 #include "page_track.h"
@@ -510,6 +511,8 @@ void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
 void intel_gvt_reset_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_deactivate_vgpu(struct intel_vgpu *vgpu);
+int intel_gvt_save_restore(struct intel_vgpu *vgpu, char *buf,
+		size_t count, void *base, uint64_t off, bool restore);
 
 /* validating GM functions */
 #define vgpu_gmadr_is_aperture(vgpu, gmadr) \
@@ -609,6 +612,9 @@ struct intel_gvt_ops {
 	int (*vgpu_get_dmabuf)(struct intel_vgpu *vgpu, unsigned int);
 	int (*write_protect_handler)(struct intel_vgpu *, u64, void *,
 				     unsigned int);
+	int (*vgpu_save_restore)(struct intel_vgpu *vgpu, char *buf,
+					size_t count, void *base,
+					uint64_t off, bool restore);
 };
 
 
@@ -722,6 +728,9 @@ int intel_gvt_debugfs_add_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_debugfs_remove_vgpu(struct intel_vgpu *vgpu);
 int intel_gvt_debugfs_init(struct intel_gvt *gvt);
 void intel_gvt_debugfs_clean(struct intel_gvt *gvt);
+int submit_context(struct intel_vgpu *vgpu, int ring_id,
+		struct execlist_ctx_descriptor_format *desc,
+		bool emulate_schedule_in);
 
 
 #include "trace.h"
diff --git a/drivers/gpu/drm/i915/gvt/migrate.c b/drivers/gpu/drm/i915/gvt/migrate.c
new file mode 100644
index 000000000000..dca6eae6f5c9
--- /dev/null
+++ b/drivers/gpu/drm/i915/gvt/migrate.c
@@ -0,0 +1,863 @@
+/*
+ * Copyright(c) 2011-2016 Intel Corporation. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *    Yulei Zhang <yulei.zhang@intel.com>
+ *    Xiao Zheng <xiao.zheng@intel.com>
+ */
+
+#include "i915_drv.h"
+#include "gvt.h"
+#include "i915_pvinfo.h"
+
+#define INV (-1)
+#define RULES_NUM(x) (sizeof(x)/sizeof(gvt_migration_obj_t))
+#define FOR_EACH_OBJ(obj, rules) \
+	for (obj = rules; obj->region.type != GVT_MIGRATION_NONE; obj++)
+#define MIG_VREG_RESTORE(vgpu, off)					\
+	{								\
+		u32 data = vgpu_vreg(vgpu, (off));			\
+		u64 pa = intel_vgpu_mmio_offset_to_gpa(vgpu, off);	\
+		intel_vgpu_emulate_mmio_write(vgpu, pa, &data, 4);	\
+	}
+
+/* s - struct
+ * t - type of obj
+ * m - size of obj
+ * ops - operation override callback func
+ */
+#define MIGRATION_UNIT(_s, _t, _m, _ops) {		\
+.img		= NULL,					\
+.region.type	= _t,					\
+.region.size	= _m,				\
+.ops		= &(_ops),				\
+.name		= "["#_s":"#_t"]\0"			\
+}
+
+#define MIGRATION_END {		\
+	NULL, NULL, 0,		\
+	{GVT_MIGRATION_NONE, 0},\
+	NULL,	\
+	NULL	\
+}
+
+static DEFINE_MUTEX(gvt_migration);
+static int image_header_load(const struct gvt_migration_obj_t *obj, u32 size);
+static int image_header_save(const struct gvt_migration_obj_t *obj);
+static int vreg_load(const struct gvt_migration_obj_t *obj, u32 size);
+static int vreg_save(const struct gvt_migration_obj_t *obj);
+static int sreg_load(const struct gvt_migration_obj_t *obj, u32 size);
+static int sreg_save(const struct gvt_migration_obj_t *obj);
+static int vcfg_space_load(const struct gvt_migration_obj_t *obj, u32 size);
+static int vcfg_space_save(const struct gvt_migration_obj_t *obj);
+static int vggtt_load(const struct gvt_migration_obj_t *obj, u32 size);
+static int vggtt_save(const struct gvt_migration_obj_t *obj);
+static int workload_load(const struct gvt_migration_obj_t *obj, u32 size);
+static int workload_save(const struct gvt_migration_obj_t *obj);
+static int ppgtt_load(const struct gvt_migration_obj_t *obj, u32 size);
+static int ppgtt_save(const struct gvt_migration_obj_t *obj);
+static int execlist_load(const struct gvt_migration_obj_t *obj, u32 size);
+static int execlist_save(const struct gvt_migration_obj_t *obj);
+
+/***********************************************
+ * Internal Static Functions
+ ***********************************************/
+struct gvt_migration_operation_t vReg_ops = {
+	.pre_copy = NULL,
+	.pre_save = vreg_save,
+	.pre_load = vreg_load,
+	.post_load = NULL,
+};
+
+struct gvt_migration_operation_t sReg_ops = {
+	.pre_copy = NULL,
+	.pre_save = sreg_save,
+	.pre_load = sreg_load,
+	.post_load = NULL,
+};
+
+struct gvt_migration_operation_t vcfg_space_ops = {
+	.pre_copy = NULL,
+	.pre_save = vcfg_space_save,
+	.pre_load = vcfg_space_load,
+	.post_load = NULL,
+};
+
+struct gvt_migration_operation_t vgtt_info_ops = {
+	.pre_copy = NULL,
+	.pre_save = vggtt_save,
+	.pre_load = vggtt_load,
+	.post_load = NULL,
+};
+
+struct gvt_migration_operation_t image_header_ops = {
+	.pre_copy = NULL,
+	.pre_save = image_header_save,
+	.pre_load = image_header_load,
+	.post_load = NULL,
+};
+
+struct gvt_migration_operation_t workload_ops = {
+	.pre_copy = NULL,
+	.pre_save = workload_save,
+	.pre_load = workload_load,
+	.post_load = NULL,
+};
+
+struct gvt_migration_operation_t ppgtt_ops = {
+	.pre_copy = NULL,
+	.pre_save = ppgtt_save,
+	.pre_load = ppgtt_load,
+	.post_load = NULL,
+};
+
+struct gvt_migration_operation_t execlist_ops = {
+	.pre_copy = NULL,
+	.pre_save = execlist_save,
+	.pre_load = execlist_load,
+	.post_load = NULL,
+};
+
+/* gvt_device_objs[] is a list of gvt_migration_obj_t objs
+ * Each obj has its operation method to save to qemu image
+ * and restore from qemu image during the migration.
+ *
+ * for each saved object, it will have a region header
+ * struct gvt_region_t {
+ *   region_type;
+ *   region_size;
+ * }
+ *__________________  _________________   __________________
+ *|x64 (Source)    |  |image region    |  |x64 (Target)    |
+ *|________________|  |________________|  |________________|
+ *|    Region A    |  |   Region A     |  |   Region A     |
+ *|    Header      |  |   offset=0     |  | allocate a page|
+ *|    content     |  |                |  | copy data here |
+ *|----------------|  |     ...        |  |----------------|
+ *|    Region B    |  |     ...        |  |   Region B     |
+ *|    Header      |  |----------------|  |                |
+ *|    content     |  |   Region B     |  |                |
+ *|----------------|  |   offset=4096  |  |----------------|
+ *                    |                |
+ *                    |----------------|
+ *
+ * On the target side, it parses the incoming data copied
+ * from the Qemu image, and applies different restore handlers
+ * depending on the region type.
+ */
+static struct gvt_migration_obj_t gvt_device_objs[] = {
+	MIGRATION_UNIT(struct intel_vgpu,
+			GVT_MIGRATION_HEAD,
+			sizeof(struct gvt_image_header_t),
+			image_header_ops),
+	MIGRATION_UNIT(struct intel_vgpu,
+			GVT_MIGRATION_CFG_SPACE,
+			PCI_CFG_SPACE_EXP_SIZE,
+			vcfg_space_ops),
+	MIGRATION_UNIT(struct intel_vgpu,
+			GVT_MIGRATION_SREG,
+			GVT_MMIO_SIZE, sReg_ops),
+	MIGRATION_UNIT(struct intel_vgpu,
+			GVT_MIGRATION_VREG,
+			GVT_MMIO_SIZE, vReg_ops),
+	MIGRATION_UNIT(struct intel_vgpu,
+			GVT_MIGRATION_GTT,
+			0, vgtt_info_ops),
+	MIGRATION_UNIT(struct intel_vgpu,
+			GVT_MIGRATION_PPGTT,
+			0, ppgtt_ops),
+	MIGRATION_UNIT(struct intel_vgpu,
+			GVT_MIGRATION_WORKLOAD,
+			0, workload_ops),
+	MIGRATION_UNIT(struct intel_vgpu,
+			GVT_MIGRATION_EXECLIST,
+			0, execlist_ops),
+	MIGRATION_END,
+};
+
+static inline void
+update_image_region_start_pos(struct gvt_migration_obj_t *obj, int pos)
+{
+	obj->offset = pos;
+}
+
+static inline void
+update_image_region_base(struct gvt_migration_obj_t *obj, void *base)
+{
+	obj->img = base;
+}
+
+static inline void
+update_status_region_base(struct gvt_migration_obj_t *obj, void *base)
+{
+	obj->vgpu = base;
+}
+
+static inline struct gvt_migration_obj_t *
+find_migration_obj(enum gvt_migration_type_t type)
+{
+	struct gvt_migration_obj_t *obj;
+
+	for (obj = gvt_device_objs;
+		obj->region.type != GVT_MIGRATION_NONE; obj++)
+		if (obj->region.type == type)
+			return obj;
+	return NULL;
+}
+
+static int image_header_save(const struct gvt_migration_obj_t *obj)
+{
+	struct gvt_region_t region;
+	struct gvt_image_header_t header;
+
+	region.type = GVT_MIGRATION_HEAD;
+	region.size = sizeof(struct gvt_image_header_t);
+	memcpy(obj->img, &region, sizeof(struct gvt_region_t));
+
+	header.version = GVT_MIGRATION_VERSION;
+	header.data_size = obj->offset;
+	header.crc_check = 0; /* CRC check skipped for now*/
+
+	memcpy(obj->img + sizeof(struct gvt_region_t), &header,
+			sizeof(struct gvt_image_header_t));
+
+	return sizeof(struct gvt_region_t) + sizeof(struct gvt_image_header_t);
+}
+
+static int image_header_load(const struct gvt_migration_obj_t *obj, u32 size)
+{
+	struct gvt_image_header_t header;
+
+	if (unlikely(size != sizeof(struct gvt_image_header_t))) {
+		gvt_err("migration obj size isn't match between target and image! memsize=%d imgsize=%d\n",
+		obj->region.size,
+		size);
+		return INV;
+	}
+
+	memcpy(&header, obj->img + obj->offset,
+		sizeof(struct gvt_image_header_t));
+
+	return header.data_size;
+}
+
+static int vcfg_space_save(const struct gvt_migration_obj_t *obj)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	int n_transfer = INV;
+	void *src = vgpu->cfg_space.virtual_cfg_space;
+	void *des = obj->img + obj->offset;
+
+	memcpy(des, &obj->region, sizeof(struct gvt_region_t));
+
+	des += sizeof(struct gvt_region_t);
+	n_transfer = obj->region.size;
+
+	memcpy(des, src, n_transfer);
+	return sizeof(struct gvt_region_t) + n_transfer;
+}
+
+static int vcfg_space_load(const struct gvt_migration_obj_t *obj, u32 size)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	void *dest = vgpu->cfg_space.virtual_cfg_space;
+	int n_transfer = INV;
+
+	if (unlikely(size != obj->region.size)) {
+		gvt_err("migration obj size isn't match between target and image! memsize=%d imgsize=%d\n",
+		obj->region.size,
+		size);
+	} else {
+		n_transfer = obj->region.size;
+		memcpy(dest, obj->img + obj->offset, n_transfer);
+	}
+
+	return n_transfer;
+}
+
+static int sreg_save(const struct gvt_migration_obj_t *obj)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	int n_transfer = INV;
+	void *src = vgpu->mmio.sreg;
+	void *des = obj->img + obj->offset;
+
+	memcpy(des, &obj->region, sizeof(struct gvt_region_t));
+
+	des += sizeof(struct gvt_region_t);
+	n_transfer = obj->region.size;
+
+	memcpy(des, src, n_transfer);
+	return sizeof(struct gvt_region_t) + n_transfer;
+}
+
+static int sreg_load(const struct gvt_migration_obj_t *obj, u32 size)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	void *dest = vgpu->mmio.sreg;
+	int n_transfer = INV;
+
+	if (unlikely(size != obj->region.size)) {
+		gvt_err("migration obj size isn't match between target and image! memsize=%d imgsize=%d\n",
+		obj->region.size,
+		size);
+	} else {
+		n_transfer = obj->region.size;
+		memcpy(dest, obj->img + obj->offset, n_transfer);
+	}
+
+	return n_transfer;
+}
+
+static int ppgtt_save(const struct gvt_migration_obj_t *obj)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	struct list_head *pos;
+	struct intel_vgpu_mm *mm;
+	struct gvt_ppgtt_entry_t entry;
+	struct gvt_region_t region;
+	int num = 0;
+	u32 sz = sizeof(struct gvt_ppgtt_entry_t);
+	void *des = obj->img + obj->offset;
+
+	list_for_each(pos, &vgpu->gtt.ppgtt_mm_list_head) {
+		mm = container_of(pos, struct intel_vgpu_mm, ppgtt_mm.list);
+		if (mm->type != INTEL_GVT_MM_PPGTT)
+			continue;
+
+		entry.page_table_level = mm->ppgtt_mm.root_entry_type;
+		memcpy(entry.pdp, mm->ppgtt_mm.guest_pdps, 32);
+
+		memcpy(des + sizeof(struct gvt_region_t) + (num * sz),
+			&entry, sz);
+		num++;
+	}
+
+	region.type = GVT_MIGRATION_PPGTT;
+	region.size = num * sz;
+	memcpy(des, &region, sizeof(struct gvt_region_t));
+
+	return sizeof(struct gvt_region_t) + region.size;
+}
+
+static int ppgtt_load(const struct gvt_migration_obj_t *obj, u32 size)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	int n_transfer = INV;
+	struct gvt_ppgtt_entry_t entry;
+	struct intel_vgpu_mm *mm;
+	void *src = obj->img + obj->offset;
+	int i;
+	u32 sz = sizeof(struct gvt_ppgtt_entry_t);
+
+	if (size == 0)
+		return size;
+
+	if (unlikely(size % sz) != 0) {
+		gvt_err("migration obj size isn't match between target and image! memsize=%d imgsize=%d\n",
+		obj->region.size,
+		size);
+		return n_transfer;
+	}
+
+	for (i = 0; i < size / sz; i++) {
+		memcpy(&entry, src + (i * sz), sz);
+		mm = intel_vgpu_create_ppgtt_mm(vgpu, entry.page_table_level,
+						entry.pdp);
+		if (IS_ERR(mm)) {
+			gvt_vgpu_err("fail to create mm object.\n");
+			return n_transfer;
+		}
+	}
+
+	n_transfer = size;
+
+	return n_transfer;
+}
+
+static int vreg_save(const struct gvt_migration_obj_t *obj)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	int n_transfer = INV;
+	void *src = vgpu->mmio.vreg;
+	void *des = obj->img + obj->offset;
+
+	memcpy(des, &obj->region, sizeof(struct gvt_region_t));
+
+	des += sizeof(struct gvt_region_t);
+	n_transfer = obj->region.size;
+
+	memcpy(des, src, n_transfer);
+	return sizeof(struct gvt_region_t) + n_transfer;
+}
+
+static int vreg_load(const struct gvt_migration_obj_t *obj, u32 size)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	void *dest = vgpu->mmio.vreg;
+	int n_transfer = INV;
+	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	enum pipe pipe;
+
+	if (unlikely(size != obj->region.size)) {
+		gvt_err("migration obj size isn't match between target and image! memsize=%d imgsize=%d\n",
+		obj->region.size,
+		size);
+		goto exit;
+	} else {
+		n_transfer = obj->region.size;
+		memcpy(dest, obj->img + obj->offset, n_transfer);
+	}
+
+	//restore vblank emulation
+	for (pipe = PIPE_A; pipe < I915_MAX_PIPES; ++pipe)
+		MIG_VREG_RESTORE(vgpu, i915_mmio_reg_offset(PIPECONF(pipe)));
+
+	//restore ring mode register for execlist init
+	for_each_engine(engine, dev_priv, id)
+		MIG_VREG_RESTORE(vgpu,
+				i915_mmio_reg_offset(RING_MODE_GEN7(engine)));
+
+	for_each_engine(engine, dev_priv, id)
+		MIG_VREG_RESTORE(vgpu,
+			i915_mmio_reg_offset(RING_HWS_PGA(engine->mmio_base)));
+
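+	/* re-sync vreg with the image after the emulated restores above */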
+	memcpy(dest, obj->img + obj->offset, n_transfer);
+exit:
+	return n_transfer;
+}
+
+static int execlist_save(const struct gvt_migration_obj_t *obj)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
+	struct gvt_region_t region;
+	struct intel_engine_cs *engine;
+	u32 sz = sizeof(struct intel_vgpu_elsp_dwords);
+	unsigned int i;
+
+	void *des = obj->img + obj->offset;
+
+	for_each_engine(engine, dev_priv, i) {
+		memcpy(des + sizeof(struct gvt_region_t) + (i * sz),
+			&vgpu->submission.execlist[engine->id].elsp_dwords, sz);
+	}
+
+	region.type = GVT_MIGRATION_EXECLIST;
+	region.size = i * sz;
+	memcpy(des, &region, sizeof(struct gvt_region_t));
+	return sizeof(struct gvt_region_t) + region.size;
+}
+
+static int execlist_load(const struct gvt_migration_obj_t *obj, u32 size)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
+	struct intel_engine_cs *engine;
+	u32 sz = sizeof(struct intel_vgpu_elsp_dwords);
+	void *src = obj->img + obj->offset;
+	int n_transfer = INV;
+	unsigned int i;
+
+	if (size == 0)
+		return size;
+
+	if (unlikely((size % sz) != 0)) {
+		gvt_err("migration obj size doesn't match between target and image! memsize=%d imgsize=%d\n",
+		obj->region.size,
+		size);
+		return n_transfer;
+	}
+
+	for_each_engine(engine, dev_priv, i) {
+		memcpy(&vgpu->submission.execlist[engine->id].elsp_dwords,
+			src + (i * sz), sz);
+	}
+
+	n_transfer = size;
+
+	return n_transfer;
+}
+
+static int workload_save(const struct gvt_migration_obj_t *obj)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
+	struct gvt_region_t region;
+	struct intel_engine_cs *engine;
+	struct intel_vgpu_workload *pos, *n;
+	unsigned int i;
+	struct gvt_pending_workload_t workload;
+	void *des = obj->img + obj->offset;
+	unsigned int num = 0;
+	u32 sz = sizeof(struct gvt_pending_workload_t);
+
+	for_each_engine(engine, dev_priv, i) {
+		list_for_each_entry_safe(pos, n,
+			&vgpu->submission.workload_q_head[engine->id], list) {
+			workload.ring_id = pos->ring_id;
+			workload.ctx_desc = pos->ctx_desc;
+			workload.emulate_schedule_in = pos->emulate_schedule_in;
+			workload.elsp_dwords = pos->elsp_dwords;
+			list_del_init(&pos->list);
+			intel_vgpu_destroy_workload(pos);
+			memcpy(des + sizeof(struct gvt_region_t) + (num * sz),
+				&workload, sz);
+			num++;
+		}
+	}
+
+	region.type = GVT_MIGRATION_WORKLOAD;
+	region.size = num * sz;
+	memcpy(des, &region, sizeof(struct gvt_region_t));
+
+	return sizeof(struct gvt_region_t) + region.size;
+}
+
+static int workload_load(const struct gvt_migration_obj_t *obj, u32 size)
+{
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	int n_transfer = INV;
+	struct gvt_pending_workload_t workload;
+	void *src = obj->img + obj->offset;
+	u32 sz = sizeof(struct gvt_pending_workload_t);
+	int i;
+
+	if (size == 0)
+		return size;
+
+	if (unlikely((size % sz) != 0)) {
+		gvt_err("migration obj size doesn't match between target and image! memsize=%d imgsize=%d\n",
+		obj->region.size,
+		size);
+		return n_transfer;
+	}
+	for (i = 0; i < size / sz; i++) {
+		struct intel_vgpu_execlist *execlist;
+
+		memcpy(&workload, src + (i * sz), sz);
+		execlist = &vgpu->submission.execlist[workload.ring_id];
+		if (workload.emulate_schedule_in) {
+			execlist->elsp_dwords = workload.elsp_dwords;
+			execlist->elsp_dwords.index = 0;
+		}
+		submit_context(vgpu, workload.ring_id,
+			&workload.ctx_desc, workload.emulate_schedule_in);
+	}
+
+	n_transfer = size;
+
+	return n_transfer;
+}
+
+static int
+mig_ggtt_save_restore(struct intel_vgpu_mm *ggtt_mm,
+		void *data, u64 gm_offset,
+		u64 gm_sz,
+		bool save_to_image)
+{
+	struct intel_vgpu *vgpu = ggtt_mm->vgpu;
+	struct intel_gvt_gtt_gma_ops *gma_ops = vgpu->gvt->gtt.gma_ops;
+
+	void *ptable;
+	int sz;
+	int shift = vgpu->gvt->device_info.gtt_entry_size_shift;
+
+	ptable = ggtt_mm->ggtt_mm.virtual_ggtt +
+	    (gma_ops->gma_to_ggtt_pte_index(gm_offset) << shift);
+	sz = (gm_sz >> I915_GTT_PAGE_SHIFT) << shift;
+
+	if (save_to_image)
+		memcpy(data, ptable, sz);
+	else
+		memcpy(ptable, data, sz);
+
+	return sz;
+}
+
+static int vggtt_save(const struct gvt_migration_obj_t *obj)
+{
+	int ret = INV;
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	struct intel_vgpu_mm *ggtt_mm = vgpu->gtt.ggtt_mm;
+	void *des = obj->img + obj->offset;
+	struct gvt_region_t region;
+	int sz;
+
+	u64 aperture_offset = vgpu_guest_aperture_offset(vgpu);
+	u64 aperture_sz = vgpu_aperture_sz(vgpu);
+	u64 hidden_gm_offset = vgpu_guest_hidden_offset(vgpu);
+	u64 hidden_gm_sz = vgpu_hidden_sz(vgpu);
+
+	des += sizeof(struct gvt_region_t);
+
+	/* TODO: a 512MB GTT takes 1024KB of page table; optimize this */
+
+	gvt_dbg_core("Guest aperture=0x%llx (HW: 0x%llx),Guest Hidden=0x%llx (HW:0x%llx)\n",
+		aperture_offset, vgpu_aperture_offset(vgpu),
+		hidden_gm_offset, vgpu_hidden_offset(vgpu));
+
+	/* TODO: to be fixed after removal of address ballooning */
+	ret = 0;
+
+	/* aperture */
+	sz = mig_ggtt_save_restore(ggtt_mm, des,
+		aperture_offset, aperture_sz, true);
+	des += sz;
+	ret += sz;
+
+	/* hidden gm */
+	sz = mig_ggtt_save_restore(ggtt_mm, des,
+		hidden_gm_offset, hidden_gm_sz, true);
+	des += sz;
+	ret += sz;
+
+	/* Save the total size of this session */
+	region.type = GVT_MIGRATION_GTT;
+	region.size = ret;
+	memcpy(obj->img + obj->offset, &region, sizeof(struct gvt_region_t));
+
+	ret += sizeof(struct gvt_region_t);
+
+	return ret;
+}
+
+static int vggtt_load(const struct gvt_migration_obj_t *obj, u32 size)
+{
+	int ret;
+	u32 ggtt_index;
+	void *src;
+	int sz;
+
+	struct intel_vgpu *vgpu = (struct intel_vgpu *) obj->vgpu;
+	struct intel_vgpu_mm *ggtt_mm = vgpu->gtt.ggtt_mm;
+
+	int shift = vgpu->gvt->device_info.gtt_entry_size_shift;
+
+	/* offset to bar1 beginning */
+	u64 dest_aperture_offset = vgpu_guest_aperture_offset(vgpu);
+	u64 aperture_sz = vgpu_aperture_sz(vgpu);
+	u64 dest_hidden_gm_offset = vgpu_guest_hidden_offset(vgpu);
+	u64 hidden_gm_sz = vgpu_hidden_sz(vgpu);
+
+	gvt_dbg_core("Guest aperture=0x%llx (HW: 0x%llx), Guest Hidden=0x%llx (HW:0x%llx)\n",
+		dest_aperture_offset, vgpu_aperture_offset(vgpu),
+		dest_hidden_gm_offset, vgpu_hidden_offset(vgpu));
+
+	if ((size >> shift) !=
+			((aperture_sz + hidden_gm_sz) >> I915_GTT_PAGE_SHIFT)) {
+		gvt_err("ggtt restore failed due to page table size not match\n");
+		return INV;
+	}
+
+	ret = 0;
+	src = obj->img + obj->offset;
+
+	/* aperture */
+	sz = mig_ggtt_save_restore(ggtt_mm,
+		src, dest_aperture_offset, aperture_sz, false);
+	src += sz;
+	ret += sz;
+
+	/* hidden GM */
+	sz = mig_ggtt_save_restore(ggtt_mm, src,
+			dest_hidden_gm_offset, hidden_gm_sz, false);
+	ret += sz;
+
+	/* aperture/hidden GTT emulation from Source to Target */
+	for (ggtt_index = 0;
+	     ggtt_index < (gvt_ggtt_gm_sz(vgpu->gvt) >> I915_GTT_PAGE_SHIFT);
+	     ggtt_index++) {
+
+		if (vgpu_gmadr_is_valid(vgpu,
+					ggtt_index << I915_GTT_PAGE_SHIFT)) {
+			struct intel_gvt_gtt_pte_ops *ops =
+					vgpu->gvt->gtt.pte_ops;
+			struct intel_gvt_gtt_entry e;
+			u64 offset;
+			u64 pa;
+
+			/* TODO: hardcoded to 64-bit entries for now */
+			offset = vgpu->gvt->device_info.gtt_start_offset
+				+ (ggtt_index<<shift);
+
+			pa = intel_vgpu_mmio_offset_to_gpa(vgpu, offset);
+
+			/* read out the virtual GTT entry and
+			 * trigger an emulated write
+			 */
+			ggtt_get_guest_entry(ggtt_mm, &e, ggtt_index);
+			if (ops->test_present(&e)) {
+			/* equivalent to gtt_emulate_write(vgt, offset,
+			 * &e.val64, 1 << shift); the emulated MMIO write
+			 * path is used to align with the vReg load
+			 */
+				intel_vgpu_emulate_mmio_write(vgpu, pa,
+							&e.val64, 1<<shift);
+			}
+		}
+	}
+
+	return ret;
+}
+
+static int vgpu_save(const void *img)
+{
+	struct gvt_migration_obj_t *node;
+	int n_img_actual_saved = 0;
+
+	/* go by obj rules one by one */
+	FOR_EACH_OBJ(node, gvt_device_objs) {
+		int n_img = INV;
+
+		/* obj will copy data to image file img.offset */
+		update_image_region_start_pos(node, n_img_actual_saved);
+		if (node->ops->pre_save == NULL) {
+			n_img = 0;
+		} else {
+			n_img = node->ops->pre_save(node);
+			if (n_img == INV) {
+				gvt_err("Save obj %s failed\n",
+						node->name);
+				n_img_actual_saved = INV;
+				break;
+			}
+		}
+		/* shown GREEN on screen with a color terminal */
+		gvt_dbg_core("Save obj %s success with %d bytes\n",
+			       node->name, n_img);
+		n_img_actual_saved += n_img;
+
+		if (n_img_actual_saved >= MIGRATION_IMG_MAX_SIZE) {
+			gvt_err("Image size overflow! data=%d MAX=%lu\n",
+				n_img_actual_saved,
+				MIGRATION_IMG_MAX_SIZE);
+			/* Mark as invalid */
+			n_img_actual_saved = INV;
+			break;
+		}
+	}
+	/* update the header with real image size */
+	node = find_migration_obj(GVT_MIGRATION_HEAD);
+	update_image_region_start_pos(node, n_img_actual_saved);
+	node->ops->pre_save(node);
+	return n_img_actual_saved;
+}
+
+static int vgpu_restore(void *img)
+{
+	struct gvt_migration_obj_t *node;
+	struct gvt_region_t region;
+	int n_img_actual_recv = 0;
+	u32 n_img_actual_size;
+
+	/* load image header at first to get real size */
+	memcpy(&region, img, sizeof(struct gvt_region_t));
+	if (region.type != GVT_MIGRATION_HEAD) {
+		gvt_err("Invalid image. Doesn't start with image_head\n");
+		return INV;
+	}
+
+	n_img_actual_recv += sizeof(struct gvt_region_t);
+	node = find_migration_obj(region.type);
+	update_image_region_start_pos(node, n_img_actual_recv);
+	n_img_actual_size = node->ops->pre_load(node, region.size);
+	if (n_img_actual_size == INV) {
+		gvt_err("Load img %s failed\n", node->name);
+		return INV;
+	}
+
+	if (n_img_actual_size >= MIGRATION_IMG_MAX_SIZE) {
+		gvt_err("Invalid image. magic_id offset = 0x%x\n",
+				n_img_actual_size);
+		return INV;
+	}
+
+	n_img_actual_recv += sizeof(struct gvt_image_header_t);
+
+	do {
+		int n_img = INV;
+		/* parse each region head to get type and size */
+		memcpy(&region, img + n_img_actual_recv,
+				sizeof(struct gvt_region_t));
+		node = find_migration_obj(region.type);
+		if (node == NULL)
+			break;
+		n_img_actual_recv += sizeof(struct gvt_region_t);
+		update_image_region_start_pos(node, n_img_actual_recv);
+
+		if (node->ops->pre_load == NULL) {
+			n_img = 0;
+		} else {
+			n_img = node->ops->pre_load(node, region.size);
+			if (n_img == INV) {
+				/* Error occurred. colored as RED */
+				gvt_err("Load obj %s failed\n",
+						node->name);
+				n_img_actual_recv = INV;
+				break;
+			}
+		}
+		/* shown GREEN on screen with a color terminal */
+		gvt_dbg_core("Load obj %s success with %d bytes.\n",
+			       node->name, n_img);
+		n_img_actual_recv += n_img;
+	} while (n_img_actual_recv < MIGRATION_IMG_MAX_SIZE);
+
+	return n_img_actual_recv;
+}
+
+int intel_gvt_save_restore(struct intel_vgpu *vgpu, char *buf, size_t count,
+			   void *base, uint64_t off, bool restore)
+{
+	struct gvt_migration_obj_t *node;
+	int ret = 0;
+
+	mutex_lock(&gvt_migration);
+
+	FOR_EACH_OBJ(node, gvt_device_objs) {
+		update_image_region_base(node, base + off);
+		update_image_region_start_pos(node, INV);
+		update_status_region_base(node, vgpu);
+	}
+
+	if (restore) {
+		vgpu->pv_notified = true;
+		if (vgpu_restore(base + off) == INV) {
+			ret = -EFAULT;
+			goto exit;
+		}
+	} else {
+		if (vgpu_save(base + off) == INV) {
+			ret = -EFAULT;
+			goto exit;
+		}
+
+	}
+
+exit:
+	mutex_unlock(&gvt_migration);
+
+	return ret;
+}
diff --git a/drivers/gpu/drm/i915/gvt/migrate.h b/drivers/gpu/drm/i915/gvt/migrate.h
new file mode 100644
index 000000000000..99ecb4eda553
--- /dev/null
+++ b/drivers/gpu/drm/i915/gvt/migrate.h
@@ -0,0 +1,97 @@
+/*
+ * Copyright(c) 2011-2016 Intel Corporation. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors:
+ *    Yulei Zhang <yulei.zhang@intel.com>
+ *    Xiao Zheng <xiao.zheng@intel.com>
+ */
+
+#ifndef __GVT_MIGRATE_H__
+#define __GVT_MIGRATE_H__
+
+#define MIGRATION_DIRTY_BITMAP_SIZE (16*1024UL)
+
+/* Assume 9MB is enough to describe the VM kernel state */
+#define MIGRATION_IMG_MAX_SIZE (9*1024UL*1024UL)
+#define GVT_MMIO_SIZE (2*1024UL*1024UL)
+#define GVT_MIGRATION_VERSION	0
+
+enum gvt_migration_type_t {
+	GVT_MIGRATION_NONE,
+	GVT_MIGRATION_HEAD,
+	GVT_MIGRATION_CFG_SPACE,
+	GVT_MIGRATION_VREG,
+	GVT_MIGRATION_SREG,
+	GVT_MIGRATION_GTT,
+	GVT_MIGRATION_PPGTT,
+	GVT_MIGRATION_WORKLOAD,
+	GVT_MIGRATION_EXECLIST,
+};
+
+struct gvt_ppgtt_entry_t {
+	int page_table_level;
+	u64 pdp[4];
+};
+
+struct gvt_pending_workload_t {
+	int ring_id;
+	bool emulate_schedule_in;
+	struct execlist_ctx_descriptor_format ctx_desc;
+	struct intel_vgpu_elsp_dwords elsp_dwords;
+};
+
+struct gvt_region_t {
+	enum gvt_migration_type_t type;
+	u32 size;		/* obj size of bytes to read/write */
+};
+
+struct gvt_migration_obj_t {
+	void *img;
+	void *vgpu;
+	u32 offset;
+	struct gvt_region_t region;
+	/* operation funcs define how data is saved/restored */
+	struct gvt_migration_operation_t *ops;
+	char *name;
+};
+
+struct gvt_migration_operation_t {
+	/* called during pre-copy stage, VM is still alive */
+	int (*pre_copy)(const struct gvt_migration_obj_t *obj);
+	/* called once the VM is paused;
+	 * returns bytes transferred
+	 */
+	int (*pre_save)(const struct gvt_migration_obj_t *obj);
+	/* called before loading the device state */
+	int (*pre_load)(const struct gvt_migration_obj_t *obj, u32 size);
+	/* called after loading the device state, VM already alive */
+	int (*post_load)(const struct gvt_migration_obj_t *obj, u32 size);
+};
+
+struct gvt_image_header_t {
+	int version;
+	int data_size;
+	u64 crc_check;
+	u64 global_data[64];
+};
+
+#endif
diff --git a/drivers/gpu/drm/i915/gvt/mmio.c b/drivers/gpu/drm/i915/gvt/mmio.c
index 43f65848ecd6..6221d2f274fc 100644
--- a/drivers/gpu/drm/i915/gvt/mmio.c
+++ b/drivers/gpu/drm/i915/gvt/mmio.c
@@ -50,6 +50,19 @@ int intel_vgpu_gpa_to_mmio_offset(struct intel_vgpu *vgpu, u64 gpa)
 	return gpa - gttmmio_gpa;
 }
 
+/**
+ * intel_vgpu_mmio_offset_to_gpa - translate an MMIO offset to a GPA
+ * @vgpu: a vGPU
+ *
+ * Returns:
+ * The guest physical address that backs the given MMIO offset.
+ */
+int intel_vgpu_mmio_offset_to_gpa(struct intel_vgpu *vgpu, u64 offset)
+{
+	return offset + ((*(u64 *)(vgpu_cfg_space(vgpu) + PCI_BASE_ADDRESS_0)) &
+		~GENMASK(3, 0));
+}
+
 #define reg_is_mmio(gvt, reg)  \
 	(reg >= 0 && reg < gvt->device_info.mmio_size)
 
diff --git a/drivers/gpu/drm/i915/gvt/mmio.h b/drivers/gpu/drm/i915/gvt/mmio.h
index 1ffc69eba30e..a2bddb0257cf 100644
--- a/drivers/gpu/drm/i915/gvt/mmio.h
+++ b/drivers/gpu/drm/i915/gvt/mmio.h
@@ -82,6 +82,7 @@ void intel_vgpu_reset_mmio(struct intel_vgpu *vgpu, bool dmlr);
 void intel_vgpu_clean_mmio(struct intel_vgpu *vgpu);
 
 int intel_vgpu_gpa_to_mmio_offset(struct intel_vgpu *vgpu, u64 gpa);
+int intel_vgpu_mmio_offset_to_gpa(struct intel_vgpu *vgpu, u64 offset);
 
 int intel_vgpu_emulate_mmio_read(struct intel_vgpu *vgpu, u64 pa,
 				void *p_data, unsigned int bytes);
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index fcccda35a456..7676dcfdca09 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -213,6 +213,7 @@ void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu)
 {
 	mutex_lock(&vgpu->gvt->lock);
 	vgpu->active = true;
+	intel_vgpu_start_schedule(vgpu);
 	mutex_unlock(&vgpu->gvt->lock);
 }
 
-- 
2.17.1



* [PATCH 8/8] drm/i915/gvt: VFIO device states interfaces
  2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
                   ` (6 preceding siblings ...)
  2019-02-19  7:46 ` [PATCH 7/8] drm/i915/gvt: vGPU device config data save/restore interface Yan Zhao
@ 2019-02-19  7:46 ` Yan Zhao
  7 siblings, 0 replies; 10+ messages in thread
From: Yan Zhao @ 2019-02-19  7:46 UTC (permalink / raw)
  To: intel-gvt-dev, alex.williamson
  Cc: kvm, linux-kernel, Yan Zhao, Kevin Tian, Yulei Zhang

This patch registers 3 VFIO device state regions of type
VFIO_REGION_TYPE_DEVICE_STATE, with subtypes
VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL,
VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG and
VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP.

Userspace VFIO checks for the existence of these regions to get/set the
vGPU's device state; a discovery sketch is shown below.
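
The following is a minimal discovery sketch, illustrative only and not
part of this patch; find_state_region() is a hypothetical helper and
error handling is omitted:

	#include <stdint.h>
	#include <stdlib.h>
	#include <sys/ioctl.h>
	#include <sys/types.h>
	#include <linux/vfio.h>

	/* return the file offset of the matching region, or -1 */
	static off_t find_state_region(int device_fd, uint32_t subtype)
	{
		struct vfio_device_info dev = { .argsz = sizeof(dev) };
		uint32_t i;

		ioctl(device_fd, VFIO_DEVICE_GET_INFO, &dev);

		for (i = VFIO_PCI_NUM_REGIONS; i < dev.num_regions; i++) {
			struct vfio_region_info *info;
			struct vfio_info_cap_header *hdr;

			info = calloc(1, sizeof(*info));
			info->argsz = sizeof(*info);
			info->index = i;
			/* the first call reports the size of the cap chain */
			ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info);
			if (!(info->flags & VFIO_REGION_INFO_FLAG_CAPS)) {
				free(info);
				continue;
			}
			info = realloc(info, info->argsz);
			/* the second call fills in the capability chain */
			ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info);

			for (hdr = (void *)info + info->cap_offset; ;
			     hdr = (void *)info + hdr->next) {
				struct vfio_region_info_cap_type *t =
					(struct vfio_region_info_cap_type *)hdr;

				if (hdr->id == VFIO_REGION_INFO_CAP_TYPE &&
				    t->type == VFIO_REGION_TYPE_DEVICE_STATE &&
				    t->subtype == subtype) {
					off_t off = info->offset;

					free(info);
					return off;
				}
				if (!hdr->next)
					break;
			}
			free(info);
		}

		return -1;
	}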

The region of subtype VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL is the
control region; its layout is defined in struct vfio_device_state_ctl.
Reading this region from userspace returns the device state interface's
version and the device data caps.

As Intel vGPU has no device memory, it does not report cap
VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY.

But Intel vGPU does produce dirty pages in system memory, so cap
VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY is reported.

Through writes to the control region, the vGPU's state can also be set
to one of VFIO_DEVICE_STATE_RUNNING, VFIO_DEVICE_STATE_STOP,
VFIO_DEVICE_STATE_RUNNING & VFIO_DEVICE_STATE_LOGGING, or
VFIO_DEVICE_STATE_STOP & VFIO_DEVICE_STATE_LOGGING.
The VFIO_DEVICE_STATE_LOGGING state requests logging of dirty pages in
system memory. Since the vGPU's dirty page logging is already
implemented by caching dma pages for guest gfns in vggtt and ppgtt,
nothing special (such as starting/stopping logging threads) needs to be
done in the two LOGGING states.

The vGPU's device config data (vreg, vggtt, vcfg space, workloads,
ppgtt and execlist state, which are saved/restored through the gvt
interface intel_gvt_save_restore) is held in the region of subtype
VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG.
This region is mmaped into userspace VFIO.

Therefore, before userspace VFIO reads from the config data region, it
must first write GET_BUFFER to device_config.action in the control
region, so that GVT loads the vGPU's config data into the region.
Likewise, after userspace VFIO writes to the config data region, it
must write SET_BUFFER to device_config.action, so that GVT restores the
config data into the vGPU. A sketch of the save path is shown below.
(If the config data region cannot be mmaped into userspace VFIO, the
read/write handlers are also valid.)
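
A minimal sketch of the save path, again illustrative only (buf is a
hypothetical destination buffer, find_state_region() is the helper
sketched above, and <stddef.h>/<unistd.h> are additionally needed;
error handling is omitted):

	static void save_device_config(int device_fd, void *buf)
	{
		off_t ctl_off = find_state_region(device_fd,
				VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL);
		off_t cfg_off = find_state_region(device_fd,
				VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG);
		uint32_t state = VFIO_DEVICE_STATE_STOP;
		uint32_t action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
		uint64_t size;

		/* stop the vGPU before snapshotting its config data */
		pwrite(device_fd, &state, sizeof(state), ctl_off +
		       offsetof(struct vfio_device_state_ctl, device_state));

		/* ask GVT to fill the config data region */
		pwrite(device_fd, &action, sizeof(action), ctl_off +
		       offsetof(struct vfio_device_state_ctl,
				device_config.action));

		/* read how much data is valid, then fetch it */
		pread(device_fd, &size, sizeof(size), ctl_off +
		      offsetof(struct vfio_device_state_ctl,
			       device_config.size));
		pread(device_fd, buf, size, cfg_off);
	}

The restore path is the mirror image: write the saved buffer into the
config data region, then write SET_BUFFER to device_config.action.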

The vGPU's region for dirty bitmap logging of system memory is of
subtype VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP. It is also
mmaped into userspace VFIO. By writing the start address and page count
of a range of system memory to the control region, the dirty page
bitmap produced by the vGPU for that range is stored in the bitmap
region. Userspace VFIO can then read the dirty bitmap either directly
from the mmaped region or through the region's read/write handlers, as
sketched below.
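
A matching sketch of the dirty bitmap query (mem_start, npages and
bitmap are hypothetical caller-supplied values; error handling is
omitted):

	off_t ctl_off = find_state_region(device_fd,
			VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL);
	off_t bmp_off = find_state_region(device_fd,
			VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP);
	struct {
		uint64_t start_addr;
		uint64_t page_nr;
	} range = { .start_addr = mem_start, .page_nr = npages };

	/* writing the range triggers GVT to fill the bitmap region */
	pwrite(device_fd, &range, sizeof(range), ctl_off +
	       offsetof(struct vfio_device_state_ctl, system_memory));

	/* one bit per page: bit i set means page (mem_start >> 12) + i is dirty */
	pread(device_fd, bitmap, (npages + 7) / 8, bmp_off);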

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
---
 drivers/gpu/drm/i915/gvt/gvt.h   |   3 +
 drivers/gpu/drm/i915/gvt/kvmgt.c | 412 +++++++++++++++++++++++++++++--
 include/uapi/linux/vfio.h        |  38 +++
 3 files changed, 437 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index cfde510e9d77..b0580169f595 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -227,6 +227,9 @@ struct intel_vgpu {
 		struct work_struct release_work;
 		atomic_t released;
 		struct vfio_device *vfio_device;
+		struct vfio_device_state_ctl *state_ctl;
+		void *state_config;
+		void *state_bitmap;
 	} vdev;
 #endif
 
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 223c67e87680..02df2ebaa3f4 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -65,6 +65,8 @@ struct intel_vgpu_regops {
 			size_t count, loff_t *ppos, bool iswrite);
 	void (*release)(struct intel_vgpu *vgpu,
 			struct vfio_region *region);
+	int (*mmap)(struct intel_vgpu *vgpu,
+			struct vm_area_struct *vma);
 };
 
 struct vfio_region {
@@ -414,7 +416,7 @@ static size_t intel_vgpu_reg_rw_opregion(struct intel_vgpu *vgpu, char *buf,
 	count = min(count, (size_t)(vgpu->vdev.region[i].size - pos));
 	memcpy(buf, base + pos, count);
 
-	return count;
+	return 0;
 }
 
 static void intel_vgpu_reg_release_opregion(struct intel_vgpu *vgpu,
@@ -427,6 +429,272 @@ static const struct intel_vgpu_regops intel_vgpu_regops_opregion = {
 	.release = intel_vgpu_reg_release_opregion,
 };
 
+static int set_device_state(struct intel_vgpu *vgpu, u32 state)
+{
+	int rc = 0;
+
+	switch (state) {
+	case VFIO_DEVICE_STATE_STOP:
+		intel_gvt_ops->vgpu_deactivate(vgpu);
+		break;
+	case VFIO_DEVICE_STATE_RUNNING:
+		intel_gvt_ops->vgpu_activate(vgpu);
+		break;
+	case VFIO_DEVICE_STATE_LOGGING | VFIO_DEVICE_STATE_RUNNING:
+	case VFIO_DEVICE_STATE_LOGGING | VFIO_DEVICE_STATE_STOP:
+		break;
+	default:
+		rc = -EFAULT;
+	}
+
+	return rc;
+}
+
+static void intel_vgpu_get_dirty_bitmap(struct intel_vgpu *vgpu,
+		u64 start_addr, u64 npage, void *bitmap)
+{
+	u64 gfn = start_addr >> PAGE_SHIFT;
+	int i;
+
+	memset(bitmap, 0, MIGRATION_DIRTY_BITMAP_SIZE);
+
+	for (i = 0; i < npage; i++) {
+		mutex_lock(&vgpu->vdev.cache_lock);
+		if (__gvt_cache_find_gfn(vgpu, gfn))
+			set_bit(i, bitmap);
+
+		mutex_unlock(&vgpu->vdev.cache_lock);
+		gfn++;
+	}
+}
+
+static size_t intel_vgpu_reg_rw_state_ctl(struct intel_vgpu *vgpu,
+		char *buf, size_t count, loff_t *ppos, bool iswrite)
+{
+	struct vfio_device_state_ctl *state_ctl;
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	unsigned int i;
+	int rc = 0;
+	__u64 len;
+
+	state_ctl = vgpu->vdev.state_ctl;
+	if (!state_ctl) {
+		gvt_vgpu_err("invalid rw of state ctl region\n");
+		rc = -EFAULT;
+		goto exit;
+	}
+
+	i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+	if (pos >= vgpu->vdev.region[i].size) {
+		gvt_vgpu_err("invalid offset for Intel vgpu state ctl region\n");
+		rc = -EINVAL;
+		goto exit;
+	}
+
+#define CTL_OFFSET(x) offsetof(struct vfio_device_state_ctl, x)
+	switch (pos) {
+	case CTL_OFFSET(version):
+		if (!iswrite)
+			rc = copy_to_user(buf,
+				&state_ctl->version,
+				sizeof(state_ctl->version));
+		break;
+	case CTL_OFFSET(device_state):
+		if (!iswrite)
+			rc = copy_to_user(buf,
+				&state_ctl->device_state,
+				sizeof(state_ctl->device_state));
+		else {
+			u32 state;
+
+			if (copy_from_user(&state, buf, sizeof(state))) {
+				rc = -EFAULT;
+				goto exit;
+			}
+			rc = set_device_state(vgpu, state);
+		}
+		break;
+	case CTL_OFFSET(caps):
+		if (!iswrite)
+			rc = copy_to_user(buf,
+				&state_ctl->caps,
+				sizeof(state_ctl->caps));
+		break;
+	case CTL_OFFSET(device_config.action):
+		if (iswrite) {
+			u32 action;
+			bool isset;
+
+			if (copy_from_user(&action, buf, sizeof(action))) {
+				rc = -EFAULT;
+				goto exit;
+			}
+			isset = (action ==
+				VFIO_DEVICE_DATA_ACTION_SET_BUFFER);
+			rc = intel_gvt_ops->vgpu_save_restore(vgpu,
+					NULL,
+					MIGRATION_IMG_MAX_SIZE,
+					vgpu->vdev.state_config,
+					0,
+					isset);
+		} else {
+			/* action read is not valid */
+			rc = -EINVAL;
+		}
+		break;
+	case CTL_OFFSET(device_config.size):
+		len = MIGRATION_IMG_MAX_SIZE;
+		if (!iswrite)
+			rc = copy_to_user(buf, &len, sizeof(len));
+		break;
+	case CTL_OFFSET(system_memory):
+		{
+			struct {
+				__u64 start_addr;
+				__u64 page_nr;
+			} system_memory;
+
+			void *bitmap = vgpu->vdev.state_bitmap;
+
+			if (count != sizeof(system_memory)) {
+				/* must write as a whole */
+				rc = -EINVAL;
+				goto exit;
+			}
+			if (!iswrite) {
+				/* action read is not valid */
+				rc = -EINVAL;
+				goto exit;
+			}
+			if (copy_from_user(&system_memory, buf,
+						sizeof(system_memory))) {
+				rc = -EFAULT;
+				goto exit;
+			}
+			intel_vgpu_get_dirty_bitmap(vgpu,
+				system_memory.start_addr,
+				system_memory.page_nr, bitmap);
+		}
+		break;
+	default:
+		break;
+	}
+exit:
+	return rc;
+}
+
+static void intel_vgpu_reg_release_state_ctl(struct intel_vgpu *vgpu,
+		struct vfio_region *region)
+{
+	vfree(region->data);
+}
+
+static size_t intel_vgpu_reg_rw_state_data_config(struct intel_vgpu *vgpu,
+		char *buf, size_t count, loff_t *ppos, bool iswrite)
+{
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+	void *base = vgpu->vdev.region[i].data;
+	int rc = 0;
+
+	if (pos >= vgpu->vdev.region[i].size) {
+		gvt_vgpu_err("invalid offset to rw Intel vgpu state data region\n");
+		rc = -EINVAL;
+		goto exit;
+	}
+
+	if (iswrite) {
+		if (copy_from_user(base + pos, buf, count))
+			rc = -EFAULT;
+	} else {
+		if (copy_to_user(buf, base + pos, count))
+			rc = -EFAULT;
+	}
+
+exit:
+	return rc;
+}
+
+static
+void intel_vgpu_reg_release_state_data_config(struct intel_vgpu *vgpu,
+		struct vfio_region *region)
+{
+	vfree(region->data);
+}
+
+static
+int intel_vgpu_reg_mmap_state_data_config(struct intel_vgpu *vgpu,
+			struct vm_area_struct *vma)
+{
+	unsigned long pgoff = 0;
+	void *base = vgpu->vdev.state_config;
+
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+
+	if (pgoff != 0)
+		return -EINVAL;
+
+	return remap_vmalloc_range(vma, base, 0);
+}
+
+static size_t intel_vgpu_reg_rw_state_bitmap(struct intel_vgpu *vgpu,
+		char *buf, size_t count, loff_t *ppos, bool iswrite)
+{
+	loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+	unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) -
+			VFIO_PCI_NUM_REGIONS;
+	void *base = vgpu->vdev.region[i].data;
+	int rc = 0;
+
+	if (iswrite || pos != 0)
+		return -EINVAL;
+
+	if (copy_to_user(buf, base, count))
+		rc = -EFAULT;
+
+	return rc;
+}
+
+static
+void intel_vgpu_reg_release_state_bitmap(struct intel_vgpu *vgpu,
+		struct vfio_region *region)
+{
+	vfree(region->data);
+}
+
+static int intel_vgpu_reg_mmap_state_bitmap(struct intel_vgpu *vgpu,
+			struct vm_area_struct *vma)
+{
+	unsigned long pgoff = 0;
+	void *base = vgpu->vdev.state_bitmap;
+
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);
+
+	if (pgoff != 0)
+		return -EINVAL;
+
+	return remap_vmalloc_range(vma, base, 0);
+}
+
+static const struct intel_vgpu_regops intel_vgpu_regops_state_ctl = {
+	.rw	 = intel_vgpu_reg_rw_state_ctl,
+	.release = intel_vgpu_reg_release_state_ctl,
+};
+
+static const struct intel_vgpu_regops intel_vgpu_regops_state_data_config = {
+	.rw	 = intel_vgpu_reg_rw_state_data_config,
+	.release = intel_vgpu_reg_release_state_data_config,
+	.mmap    = intel_vgpu_reg_mmap_state_data_config,
+};
+
+static const struct intel_vgpu_regops intel_vgpu_regops_state_bitmap = {
+	.rw	 = intel_vgpu_reg_rw_state_bitmap,
+	.release = intel_vgpu_reg_release_state_bitmap,
+	.mmap    = intel_vgpu_reg_mmap_state_bitmap,
+};
+
 static int intel_vgpu_register_reg(struct intel_vgpu *vgpu,
 		unsigned int type, unsigned int subtype,
 		const struct intel_vgpu_regops *ops,
@@ -493,6 +761,82 @@ static int kvmgt_set_opregion(void *p_vgpu)
 	return ret;
 }
 
+static int kvmgt_init_device_state(struct intel_vgpu *vgpu)
+{
+	void *bitmap_base, *config_base;
+	int ret;
+	struct vfio_device_state_ctl *state_ctl;
+
+	state_ctl = vzalloc(sizeof(struct vfio_device_state_ctl));
+	if (!state_ctl) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	state_ctl->version = VFIO_DEVICE_STATE_INTERFACE_VERSION;
+	state_ctl->caps = VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY;
+
+	ret = intel_vgpu_register_reg(vgpu,
+			VFIO_REGION_TYPE_DEVICE_STATE,
+			VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL,
+			&intel_vgpu_regops_state_ctl,
+			sizeof(struct vfio_device_state_ctl),
+			VFIO_REGION_INFO_FLAG_READ |
+			VFIO_REGION_INFO_FLAG_WRITE,
+			state_ctl);
+	if (ret) {
+		vfree(state_ctl);
+		goto out;
+	}
+	vgpu->vdev.state_ctl = state_ctl;
+
+	config_base = vmalloc_user(MIGRATION_IMG_MAX_SIZE);
+	if (config_base == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = intel_vgpu_register_reg(vgpu,
+			VFIO_REGION_TYPE_DEVICE_STATE,
+			VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG,
+			&intel_vgpu_regops_state_data_config,
+			MIGRATION_IMG_MAX_SIZE,
+			VFIO_REGION_INFO_FLAG_CAPS |
+			VFIO_REGION_INFO_FLAG_READ |
+			VFIO_REGION_INFO_FLAG_WRITE |
+			VFIO_REGION_INFO_FLAG_MMAP,
+			config_base);
+	if (ret) {
+		vfree(config_base);
+		goto out;
+	}
+	vgpu->vdev.state_config = config_base;
+
+
+	bitmap_base = vmalloc_user(MIGRATION_DIRTY_BITMAP_SIZE);
+	if (bitmap_base == NULL) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	ret = intel_vgpu_register_reg(vgpu,
+			VFIO_REGION_TYPE_DEVICE_STATE,
+			VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP,
+			&intel_vgpu_regops_state_bitmap,
+			MIGRATION_DIRTY_BITMAP_SIZE,
+			VFIO_REGION_INFO_FLAG_CAPS |
+			VFIO_REGION_INFO_FLAG_READ |
+			VFIO_REGION_INFO_FLAG_WRITE |
+			VFIO_REGION_INFO_FLAG_MMAP,
+			bitmap_base);
+	if (ret) {
+		vfree(bitmap_base);
+		goto out;
+	}
+	vgpu->vdev.state_bitmap = bitmap_base;
+
+out:
+	return ret;
+}
+
 static void kvmgt_put_vfio_device(void *vgpu)
 {
 	if (WARN_ON(!((struct intel_vgpu *)vgpu)->vdev.vfio_device))
@@ -631,6 +975,8 @@ static int intel_vgpu_open(struct mdev_device *mdev)
 	if (ret)
 		goto undo_group;
 
+	kvmgt_init_device_state(vgpu);
+
 	intel_gvt_ops->vgpu_activate(vgpu);
 
 	atomic_set(&vgpu->vdev.released, 0);
@@ -662,6 +1008,7 @@ static void __intel_vgpu_release(struct intel_vgpu *vgpu)
 {
 	struct kvmgt_guest_info *info;
 	int ret;
+	int i;
 
 	if (!handle_valid(vgpu->handle))
 		return;
@@ -671,6 +1018,13 @@ static void __intel_vgpu_release(struct intel_vgpu *vgpu)
 
 	intel_gvt_ops->vgpu_release(vgpu);
 
+	for (i = 0; i < vgpu->vdev.num_regions; i++)
+		vgpu->vdev.region[i].ops->release(vgpu, &vgpu->vdev.region[i]);
+
+	vgpu->vdev.num_regions = 0;
+	kfree(vgpu->vdev.region);
+	vgpu->vdev.region = NULL;
+
 	ret = vfio_unregister_notifier(mdev_dev(vgpu->vdev.mdev), VFIO_IOMMU_NOTIFY,
 					&vgpu->vdev.iommu_notifier);
 	WARN(ret, "vfio_unregister_notifier for iommu failed: %d\n", ret);
@@ -816,11 +1170,11 @@ static ssize_t intel_vgpu_rw(struct mdev_device *mdev, char *buf,
 	case VFIO_PCI_ROM_REGION_INDEX:
 		break;
 	default:
-		if (index >= VFIO_PCI_NUM_REGIONS + vgpu->vdev.num_regions)
+		if (index < VFIO_PCI_NUM_REGIONS)
 			return -EINVAL;
 
 		index -= VFIO_PCI_NUM_REGIONS;
-		return vgpu->vdev.region[index].ops->rw(vgpu, buf, count,
+		ret = vgpu->vdev.region[index].ops->rw(vgpu, buf, count,
 				ppos, is_write);
 	}
 
@@ -851,6 +1205,10 @@ static ssize_t intel_vgpu_read(struct mdev_device *mdev, char __user *buf,
 {
 	unsigned int done = 0;
 	int ret;
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+
+	if (index >= VFIO_PCI_NUM_REGIONS)
+		return intel_vgpu_rw(mdev, (char *)buf, count, ppos, false);
 
 	while (count) {
 		size_t filled;
@@ -925,6 +1283,10 @@ static ssize_t intel_vgpu_write(struct mdev_device *mdev,
 {
 	unsigned int done = 0;
 	int ret;
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+
+	if (index >= VFIO_PCI_NUM_REGIONS)
+		return intel_vgpu_rw(mdev, (char *)buf, count, ppos, true);
 
 	while (count) {
 		size_t filled;
@@ -999,24 +1361,42 @@ static int intel_vgpu_mmap(struct mdev_device *mdev, struct vm_area_struct *vma)
 	unsigned long req_size, pgoff = 0;
 	pgprot_t pg_prot;
 	struct intel_vgpu *vgpu = mdev_get_drvdata(mdev);
+	int ret = 0;
 
 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
-	if (index >= VFIO_PCI_ROM_REGION_INDEX)
-		return -EINVAL;
 
-	if (vma->vm_end < vma->vm_start)
-		return -EINVAL;
-	if ((vma->vm_flags & VM_SHARED) == 0)
-		return -EINVAL;
-	if (index != VFIO_PCI_BAR2_REGION_INDEX)
-		return -EINVAL;
+	if (vma->vm_end < vma->vm_start) {
+		ret = -EINVAL;
+		goto exit;
+	}
 
-	pg_prot = vma->vm_page_prot;
-	virtaddr = vma->vm_start;
-	req_size = vma->vm_end - vma->vm_start;
-	pgoff = vgpu_aperture_pa_base(vgpu) >> PAGE_SHIFT;
+	if ((vma->vm_flags & VM_SHARED) == 0) {
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	if (index == VFIO_PCI_BAR2_REGION_INDEX) {
+		pg_prot = vma->vm_page_prot;
+		virtaddr = vma->vm_start;
+		req_size = vma->vm_end - vma->vm_start;
+		pgoff = vgpu_aperture_pa_base(vgpu) >> PAGE_SHIFT;
+		ret = remap_pfn_range(vma, virtaddr, pgoff,
+				req_size, pg_prot);
+	} else if ((index >= VFIO_PCI_NUM_REGIONS +
+			vgpu->vdev.num_regions) ||
+			index < VFIO_PCI_NUM_REGIONS) {
+		ret = -EINVAL;
+	} else {
+		index -= VFIO_PCI_NUM_REGIONS;
+		if (vgpu->vdev.region[index].ops->mmap)
+			ret = vgpu->vdev.region[index].ops->mmap(vgpu,
+					vma);
+		else
+			ret = -EINVAL;
+	}
+exit:
+	return ret;
 
-	return remap_pfn_range(vma, virtaddr, pgoff, req_size, pg_prot);
 }
 
 static int intel_vgpu_get_irq_count(struct intel_vgpu *vgpu, int type)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 813102810f53..a577b242e3bd 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -303,6 +303,14 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG	(2)
 #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG	(3)
 
+
+/* Device State region type and sub-type */
+#define VFIO_REGION_TYPE_DEVICE_STATE           (1 << 1)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL       (1)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG      (2)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_MEMORY      (3)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP (4)
+
 #define VFIO_REGION_TYPE_GFX                    (1)
 #define VFIO_REGION_SUBTYPE_GFX_EDID            (1)
 
@@ -866,6 +874,36 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE	_IO(VFIO_TYPE, VFIO_BASE + 20)
 
+#define VFIO_DEVICE_STATE_INTERFACE_VERSION 1
+#define VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY 1
+#define VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY 2
+
+#define VFIO_DEVICE_STATE_RUNNING 0
+#define VFIO_DEVICE_STATE_STOP 1
+#define VFIO_DEVICE_STATE_LOGGING 2
+
+#define VFIO_DEVICE_DATA_ACTION_GET_BUFFER 1
+#define VFIO_DEVICE_DATA_ACTION_SET_BUFFER 2
+
+struct vfio_device_state_ctl {
+	__u32 version;		  /* ro */
+	__u32 device_state;       /* VFIO device state, wo */
+	__u32 caps;		 /* ro */
+	struct {
+		__u32 action;  /* wo, GET_BUFFER or SET_BUFFER */
+		__u64 size;    /* rw, total size of device config */
+	} device_config;
+	struct {
+		__u32 action;    /* wo, GET_BUFFER or SET_BUFFER */
+		__u64 size;     /* rw, total size of device memory */
+		__u64 pos;      /* chunk offset in total buffer of device memory */
+	} device_memory;
+	struct {
+		__u64 start_addr; /* wo */
+		__u64 page_nr;   /* wo */
+	} system_memory;
+} __attribute__((packed));
+
 /* ***************************************************************** */
 
 #endif /* _UAPIVFIO_H */
-- 
2.17.1



* Re: [PATCH 7/8] drm/i915/gvt: vGPU device config data save/restore interface
  2019-02-19  7:46 ` [PATCH 7/8] drm/i915/gvt: vGPU device config data save/restore interface Yan Zhao
@ 2019-02-20  9:39   ` Zhenyu Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Zhenyu Wang @ 2019-02-20  9:39 UTC (permalink / raw)
  To: Yan Zhao
  Cc: intel-gvt-dev, alex.williamson, kvm, linux-kernel, Zhenyu Wang,
	Yulei Zhang, Xiao Zheng


On 2019.02.19 02:46:32 -0500, Yan Zhao wrote:
> The patch implements the gvt interface intel_gvt_save_restore to
> save/restore vGPU's device config data for live migration.
> 
> vGPU device config data includes vreg, vggtt, vcfg space, workloads, ppgtt,
> execlist.
> It does not include dirty pages in system memory produced by vGPU.
> 
> Signed-off-by: Yulei Zhang <yulei.zhang@intel.com>
> Signed-off-by: Xiao Zheng <xiao.zheng@intel.com>
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>

...

> +
> +#ifndef __GVT_MIGRATE_H__
> +#define __GVT_MIGRATE_H__
> +
> +#define MIGRATION_DIRTY_BITMAP_SIZE (16*1024UL)
> +
> +/* Assume 9MB is enough to describe the VM kernel state */
> +#define MIGRATION_IMG_MAX_SIZE (9*1024UL*1024UL)
> +#define GVT_MMIO_SIZE (2*1024UL*1024UL)
> +#define GVT_MIGRATION_VERSION	0
> +
> +enum gvt_migration_type_t {
> +	GVT_MIGRATION_NONE,
> +	GVT_MIGRATION_HEAD,
> +	GVT_MIGRATION_CFG_SPACE,
> +	GVT_MIGRATION_VREG,
> +	GVT_MIGRATION_SREG,
> +	GVT_MIGRATION_GTT,
> +	GVT_MIGRATION_PPGTT,
> +	GVT_MIGRATION_WORKLOAD,
> +	GVT_MIGRATION_EXECLIST,
> +};
> +
> +struct gvt_ppgtt_entry_t {
> +	int page_table_level;
> +	u64 pdp[4];
> +};
> +
> +struct gvt_pending_workload_t {
> +	int ring_id;
> +	bool emulate_schedule_in;
> +	struct execlist_ctx_descriptor_format ctx_desc;
> +	struct intel_vgpu_elsp_dwords elsp_dwords;
> +};
> +
> +struct gvt_region_t {
> +	enum gvt_migration_type_t type;
> +	u32 size;		/* obj size of bytes to read/write */
> +};
> +
> +struct gvt_migration_obj_t {
> +	void *img;
> +	void *vgpu;
> +	u32 offset;
> +	struct gvt_region_t region;
> +	/* operation funcs define how data is saved/restored */
> +	struct gvt_migration_operation_t *ops;
> +	char *name;
> +};
> +
> +struct gvt_migration_operation_t {
> +	/* called during pre-copy stage, VM is still alive */
> +	int (*pre_copy)(const struct gvt_migration_obj_t *obj);
> +	/* called once the VM is paused;
> +	 * returns bytes transferred
> +	 */
> +	int (*pre_save)(const struct gvt_migration_obj_t *obj);
> +	/* called before loading the device state */
> +	int (*pre_load)(const struct gvt_migration_obj_t *obj, u32 size);
> +	/* called after loading the device state, VM already alive */
> +	int (*post_load)(const struct gvt_migration_obj_t *obj, u32 size);
> +};
> +
> +struct gvt_image_header_t {
> +	int version;
> +	int data_size;
> +	u64 crc_check;
> +	u64 global_data[64];
> +};

I think this misses device info that should ship with the image.
Currently what I can think of is that each platform should have a
separate type, e.g. BDW, SKL, KBL, etc. We won't allow restoring onto a
different platform than the source.
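
One possible shape, sketched purely for illustration (the enum values
and the platform field are hypothetical, not part of this series):

	enum gvt_migration_platform_t {
		GVT_PLATFORM_UNKNOWN,
		GVT_PLATFORM_BDW,
		GVT_PLATFORM_SKL,
		GVT_PLATFORM_KBL,
	};

	struct gvt_image_header_t {
		int version;
		int data_size;
		u64 crc_check;
		u32 platform;	/* source platform, checked on restore */
		u64 global_data[64];
	};

The restore side would then compare the header's platform against the
target platform and reject the image on mismatch.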

> +
> +#endif
> diff --git a/drivers/gpu/drm/i915/gvt/mmio.c b/drivers/gpu/drm/i915/gvt/mmio.c
> index 43f65848ecd6..6221d2f274fc 100644
> --- a/drivers/gpu/drm/i915/gvt/mmio.c
> +++ b/drivers/gpu/drm/i915/gvt/mmio.c
> @@ -50,6 +50,19 @@ int intel_vgpu_gpa_to_mmio_offset(struct intel_vgpu *vgpu, u64 gpa)
>  	return gpa - gttmmio_gpa;
>  }
>  
> +/**
> + * intel_vgpu_mmio_offset_to_gpa - translate an MMIO offset to a GPA
> + * @vgpu: a vGPU
> + *
> + * Returns:
> + * The guest physical address that backs the given MMIO offset.
> + */
> +int intel_vgpu_mmio_offset_to_gpa(struct intel_vgpu *vgpu, u64 offset)
> +{
> +	return offset + ((*(u64 *)(vgpu_cfg_space(vgpu) + PCI_BASE_ADDRESS_0)) &
> +		~GENMASK(3, 0));
> +}
> +
>  #define reg_is_mmio(gvt, reg)  \
>  	(reg >= 0 && reg < gvt->device_info.mmio_size)
>  
> diff --git a/drivers/gpu/drm/i915/gvt/mmio.h b/drivers/gpu/drm/i915/gvt/mmio.h
> index 1ffc69eba30e..a2bddb0257cf 100644
> --- a/drivers/gpu/drm/i915/gvt/mmio.h
> +++ b/drivers/gpu/drm/i915/gvt/mmio.h
> @@ -82,6 +82,7 @@ void intel_vgpu_reset_mmio(struct intel_vgpu *vgpu, bool dmlr);
>  void intel_vgpu_clean_mmio(struct intel_vgpu *vgpu);
>  
>  int intel_vgpu_gpa_to_mmio_offset(struct intel_vgpu *vgpu, u64 gpa);
> +int intel_vgpu_mmio_offset_to_gpa(struct intel_vgpu *vgpu, u64 offset);
>  
>  int intel_vgpu_emulate_mmio_read(struct intel_vgpu *vgpu, u64 pa,
>  				void *p_data, unsigned int bytes);
> diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
> index fcccda35a456..7676dcfdca09 100644
> --- a/drivers/gpu/drm/i915/gvt/vgpu.c
> +++ b/drivers/gpu/drm/i915/gvt/vgpu.c
> @@ -213,6 +213,7 @@ void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu)
>  {
>  	mutex_lock(&vgpu->gvt->lock);
>  	vgpu->active = true;
> +	intel_vgpu_start_schedule(vgpu);
>  	mutex_unlock(&vgpu->gvt->lock);
>  }
>  
> -- 
> 2.17.1
> 
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827


end of thread, other threads:[~2019-02-20  9:49 UTC | newest]

Thread overview: 10+ messages
2019-02-19  7:42 [PATCH 0/8] VFIO Device states interface in GVT Yan Zhao
2019-02-19  7:43 ` [PATCH 1/8] drm/i915/gvt: Apply g2h adjust for GTT mmio access Yan Zhao
2019-02-19  7:45 ` [PATCH 2/8] drm/i915/gvt: Apply g2h adjustment during fence " Yan Zhao
2019-02-19  7:45 ` [PATCH 3/8] drm/i915/gvt: Patch the gma in gpu commands during command parser Yan Zhao
2019-02-19  7:46 ` [PATCH 4/8] drm/i915/gvt: Retrieve the guest gm base address from PVINFO Yan Zhao
2019-02-19  7:46 ` [PATCH 5/8] drm/i915/gvt: Align the guest gm aperture start offset for live migration Yan Zhao
2019-02-19  7:46 ` [PATCH 6/8] drm/i915/gvt: Apply g2h adjustment to buffer start gma for dmabuf Yan Zhao
2019-02-19  7:46 ` [PATCH 7/8] drm/i915/gvt: vGPU device config data save/restore interface Yan Zhao
2019-02-20  9:39   ` Zhenyu Wang
2019-02-19  7:46 ` [PATCH 8/8] drm/i915/gvt: VFIO device states interfaces Yan Zhao
