* [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
@ 2022-05-01 12:33 Yishai Hadas
  2022-05-02 19:07 ` Alex Williamson
  0 siblings, 1 reply; 8+ messages in thread
From: Yishai Hadas @ 2022-05-01 12:33 UTC (permalink / raw)
  To: alex.williamson, jgg, kvm
  Cc: yishaih, maorg, cohuck, kevin.tian, joao.m.martins, cjia,
	kwankhede, targupta, shameerali.kolothum.thodi, eric.auger

DMA logging allows a device to internally record which DMAs the device
is initiating and report them back to userspace.

It is part of the VFIO migration infrastructure that allows implementing
dirty page tracking during the pre-copy phase of live migration.

Only DMA WRITEs are logged, and this API is not connected to
VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.

This RFC patch shows the expected usage of the uAPIs involved in DMA
logging for a VFIO device tracker.

It uses the FEATURE ioctl with its GET/SET/PROBE options as described below.

It exposes a PROBE option to detect if the device supports DMA logging.

It exposes a SET option to start device DMA logging on a given set of
IOVA ranges.

It exposes a SET option to stop device DMA logging that was previously
started.

It exposes a GET option to read back and clear the device DMA log.

Extra details are documented in vfio.h alongside each specific option in
this RFC patch.
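As a rough sketch of the flow above: the PROBE/SET/GET options are all
driven through the existing VFIO_DEVICE_FEATURE ioctl, whose payload
trails a common header. The helper below packs the DMA_LOGGING_START
payload. The struct layouts are copied from this patch, while the helper
name and the standalone definitions are illustrative only; real code
would take the types from <linux/vfio.h> and then issue
ioctl(device_fd, VFIO_DEVICE_FEATURE, buf):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the uAPI structs introduced by this patch; real code
 * would get them from <linux/vfio.h>. */
struct vfio_device_feature {
	uint32_t argsz;
	uint32_t flags;	/* e.g. FEATURE_SET | DMA_LOGGING_START */
	uint8_t  data[];
};

struct vfio_device_feature_dma_logging_range {
	uint64_t iova;
	uint64_t length;
};

struct vfio_device_feature_dma_logging_control {
	uint64_t page_size;
	uint32_t num_ranges;
	uint32_t reserved;
	uint64_t ranges;	/* userspace pointer, carried as a u64 */
};

/*
 * Illustrative helper: build the VFIO_DEVICE_FEATURE buffer that starts
 * DMA logging over the given ranges.  Returns a malloc'ed buffer whose
 * size is stored in *argszp; the caller sets ->flags and passes the
 * buffer to the VFIO_DEVICE_FEATURE ioctl.
 */
static struct vfio_device_feature *
build_logging_start(const struct vfio_device_feature_dma_logging_range *ranges,
		    uint32_t num_ranges, uint64_t page_size, uint32_t *argszp)
{
	uint32_t argsz = sizeof(struct vfio_device_feature) +
		sizeof(struct vfio_device_feature_dma_logging_control);
	struct vfio_device_feature *feature = calloc(1, argsz);
	struct vfio_device_feature_dma_logging_control *ctl;

	if (!feature)
		return NULL;
	ctl = (void *)feature->data;
	feature->argsz = argsz;
	ctl->page_size = page_size;	/* hint; device returns its choice */
	ctl->num_ranges = num_ranges;
	ctl->ranges = (uint64_t)(uintptr_t)ranges;
	*argszp = argsz;
	return feature;
}
```

The STOP option needs no payload beyond the header, and the REPORT
option is packed the same way on the VFIO_DEVICE_FEATURE_GET path.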

Note:
For IOMMU hardware support of dirty page tracking, see the RFC [1]
sent by Joao Martins.

[1] https://lore.kernel.org/all/2d369e58-8ac0-f263-7b94-fe73917782e1@linux.intel.com/T/

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 include/uapi/linux/vfio.h | 80 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index fea86061b44e..9d0b7e73e999 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -986,6 +986,86 @@ enum vfio_device_mig_state {
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
 };
 
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET start device DMA logging.
+ * VFIO_DEVICE_FEATURE_PROBE can be used to detect if the device supports
+ * DMA logging.
+ *
+ * DMA logging allows a device to internally record which DMAs the device
+ * is initiating and report them back to userspace. It is part of the VFIO
+ * migration infrastructure that allows implementing dirty page tracking
+ * during the pre-copy phase of live migration. Only DMA WRITEs are logged,
+ * and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
+ *
+ * When DMA logging is started a range of IOVAs to monitor is provided and the
+ * device can optimize its logging to cover only the IOVA range given. Each
+ * DMA that the device initiates inside the range will be logged by the device
+ * for later retrieval.
+ *
+ * page_size is an input that hints what tracking granularity the device
+ * should try to achieve. If the device cannot do the hinted page size then it
+ * should pick the next closest page size it supports. On output the device
+ * will return the page size it selected.
+ *
+ * ranges is a pointer to an array of
+ * struct vfio_device_feature_dma_logging_range.
+ */
+struct vfio_device_feature_dma_logging_control {
+	__aligned_u64 page_size;
+	__u32 num_ranges;
+	__u32 __reserved;
+	__aligned_u64 ranges;
+};
+
+struct vfio_device_feature_dma_logging_range {
+	__aligned_u64 iova;
+	__aligned_u64 length;
+};
+
+#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 3
+
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
+ * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
+ */
+#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 4
+
+/*
+ * Upon VFIO_DEVICE_FEATURE_GET read back and clear the device DMA log
+ *
+ * Query the device's DMA log for written pages within the given IOVA range.
+ * During querying the log is cleared for the IOVA range.
+ *
+ * bitmap is a pointer to an array of u64s that will hold the output bitmap
+ * with 1 bit reporting a page_size unit of IOVA. The mapping of IOVA to bits
+ * is given by:
+ *  bitmap[((addr - iova)/page_size)/64] & (1ULL << (((addr - iova)/page_size) % 64))
+ *
+ * The input page_size can be any power of two value and does not have to
+ * match the value given to VFIO_DEVICE_FEATURE_DMA_LOGGING_START. The driver
+ * will format its internal logging to match the reporting page size, possibly
+ * by replicating bits if the internal page size is lower than requested.
+ *
+ * Bits will be updated in bitmap using atomic or to allow userspace to
+ * combine bitmaps from multiple trackers together. Therefore userspace must
+ * zero the bitmap before doing any reports.
+ *
+ * If any error is returned userspace should assume that the dirty log is
+ * corrupted and restart.
+ *
+ * If DMA logging is not enabled, an error will be returned.
+ *
+ */
+struct vfio_device_feature_dma_logging_report {
+	__aligned_u64 iova;
+	__aligned_u64 length;
+	__aligned_u64 page_size;
+	__aligned_u64 bitmap;
+};
+
+#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 5
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**
-- 
2.18.1



* Re: [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
  2022-05-01 12:33 [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device Yishai Hadas
@ 2022-05-02 19:07 ` Alex Williamson
  2022-05-02 19:25   ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2022-05-02 19:07 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: jgg, kvm, maorg, cohuck, kevin.tian, joao.m.martins, cjia,
	kwankhede, targupta, shameerali.kolothum.thodi, eric.auger

On Sun, 1 May 2022 15:33:00 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> DMA logging allows a device to internally record which DMAs the device
> is initiating and report them back to userspace.
> 
> It is part of the VFIO migration infrastructure that allows implementing
> dirty page tracking during the pre-copy phase of live migration.
> 
> Only DMA WRITEs are logged, and this API is not connected to
> VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
> 
> This RFC patch shows the expected usage of the uAPIs involved in DMA
> logging for a VFIO device tracker.
> 
> It uses the FEATURE ioctl with its GET/SET/PROBE options as described below.
> 
> It exposes a PROBE option to detect if the device supports DMA logging.
> 
> It exposes a SET option to start device DMA logging on a given set of
> IOVA ranges.
> 
> It exposes a SET option to stop device DMA logging that was previously
> started.
> 
> It exposes a GET option to read back and clear the device DMA log.
> 
> Extra details are documented in vfio.h alongside each specific option in
> this RFC patch.
> 
> Note:
> For IOMMU hardware support of dirty page tracking, see the RFC [1]
> sent by Joao Martins.
> 
> [1] https://lore.kernel.org/all/2d369e58-8ac0-f263-7b94-fe73917782e1@linux.intel.com/T/
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  include/uapi/linux/vfio.h | 80 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 80 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index fea86061b44e..9d0b7e73e999 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -986,6 +986,86 @@ enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>  };
>  
> +/*
> + * Upon VFIO_DEVICE_FEATURE_SET start device DMA logging.
> + * VFIO_DEVICE_FEATURE_PROBE can be used to detect if the device supports
> + * DMA logging.
> + *
> + * DMA logging allows a device to internally record which DMAs the device
> + * is initiating and report them back to userspace. It is part of the VFIO
> + * migration infrastructure that allows implementing dirty page tracking
> + * during the pre-copy phase of live migration. Only DMA WRITEs are logged,
> + * and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
> + *
> + * When DMA logging is started a range of IOVAs to monitor is provided and the
> + * device can optimize its logging to cover only the IOVA range given. Each
> + * DMA that the device initiates inside the range will be logged by the device
> + * for later retrieval.
> + *
> + * page_size is an input that hints what tracking granularity the device
> + * should try to achieve. If the device cannot do the hinted page size then it
> + * should pick the next closest page size it supports. On output the device
> + * will return the page size it selected.
> + *
> + * ranges is a pointer to an array of
> + * struct vfio_device_feature_dma_logging_range.
> + */
> +struct vfio_device_feature_dma_logging_control {
> +	__aligned_u64 page_size;
> +	__u32 num_ranges;
> +	__u32 __reserved;
> +	__aligned_u64 ranges;
> +};
> +
> +struct vfio_device_feature_dma_logging_range {
> +	__aligned_u64 iova;
> +	__aligned_u64 length;
> +};
> +
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 3
> +
> +
> +/*
> + * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
> + * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
> + */
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 4

This seems difficult to use from a QEMU perspective, where a vfio
device typically operates on a MemoryListener and we only have
visibility to one range at a time.  I don't see any indication that
LOGGING_START is meant to be cumulative such that userspace could
incrementally add ranges to be watched, nor clearly does LOGGING_STOP
appear to have any sort of IOVA range granularity.  Is userspace
intended to pass the full vCPU physical address range here, and if so
would a single min/max IOVA be sufficient?  I'm not sure how else we
could support memory hotplug while this was enabled.

How does this work with IOMMU based tracking, I assume that if devices
share an IOAS we wouldn't be able to exclude devices supporting
device-level tracking from the IOAS log.

> +
> +/*
> + * Upon VFIO_DEVICE_FEATURE_GET read back and clear the device DMA log
> + *
> + * Query the device's DMA log for written pages within the given IOVA range.
> + * During querying the log is cleared for the IOVA range.
> + *
> + * bitmap is a pointer to an array of u64s that will hold the output bitmap
> + * with 1 bit reporting a page_size unit of IOVA. The mapping of IOVA to bits
> + * is given by:
> + *  bitmap[((addr - iova)/page_size)/64] & (1ULL << (((addr - iova)/page_size) % 64))
> + *
> + * The input page_size can be any power of two value and does not have to
> + * match the value given to VFIO_DEVICE_FEATURE_DMA_LOGGING_START. The driver
> + * will format its internal logging to match the reporting page size, possibly
> + * by replicating bits if the internal page size is lower than requested.

Or setting multiple bits if the internal page size is larger than
requested.

Is there a bitmap size limit?  We've minimally needed to impose limits
to reflect limitations of the bitmap code internally in the past.
Userspace needs a means to learn such limits.  Thanks,

Alex

> + *
> + * Bits will be updated in bitmap using atomic or to allow userspace to
> + * combine bitmaps from multiple trackers together. Therefore userspace must
> + * zero the bitmap before doing any reports.
> + *
> + * If any error is returned userspace should assume that the dirty log is
> + * corrupted and restart.
> + *
> + * If DMA logging is not enabled, an error will be returned.
> + *
> + */
> +struct vfio_device_feature_dma_logging_report {
> +	__aligned_u64 iova;
> +	__aligned_u64 length;
> +	__aligned_u64 page_size;
> +	__aligned_u64 bitmap;
> +};
> +
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 5
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**



* Re: [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
  2022-05-02 19:07 ` Alex Williamson
@ 2022-05-02 19:25   ` Jason Gunthorpe
  2022-05-02 19:58     ` Alex Williamson
  0 siblings, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2022-05-02 19:25 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, kvm, maorg, cohuck, kevin.tian, joao.m.martins,
	cjia, kwankhede, targupta, shameerali.kolothum.thodi, eric.auger

On Mon, May 02, 2022 at 01:07:01PM -0600, Alex Williamson wrote:

> > +/*
> > + * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
> > + * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
> > + */
> > +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 4
> 
> This seems difficult to use from a QEMU perspective, where a vfio
> device typically operates on a MemoryListener and we only have
> visibility to one range at a time.  I don't see any indication that
> LOGGING_START is meant to be cumulative such that userspace could
> incrementally add ranges to be watched, nor clearly does LOGGING_STOP
> appear to have any sort of IOVA range granularity.  

Correct, at least mlx5 HW just cannot do a change tracking operation,
so userspace must pre-select some kind of IOVA range to monitor based
on the current VM configuration.

> Is userspace intended to pass the full vCPU physical address range
> here, and if so would a single min/max IOVA be sufficient?  

At least mlx5 doesn't have enough capacity for that. Some reasonable
in-between of the current address space, and maybe a speculative extra
for hot plug.

Multi-range is needed to support some arch cases that have a big
discontinuity in their IOVA space, like PPC for instance.

> I'm not sure how else we could support memory hotplug while this was
> enabled.

Most likely memory hotplug events will have to take an 'everything got
dirty' hit during pre-copy - not much else we can do with this HW.

> How does this work with IOMMU based tracking, I assume that if devices
> share an IOAS we wouldn't be able to exclude devices supporting
> device-level tracking from the IOAS log.

Exclusion is possible, the userspace would have to manually create
iommu_domains and attach devices to them with the idea that only
iommu_domains for devices it wants to track would have dma dirty
tracking turned on.

But I'm not sure it makes much sense: if the iommu will track dirties,
and you have to use it for other devices anyway, then it is unlikely
there will be a performance win to inject a device tracker as well.

In any case the two interfaces are orthogonal, I would not expect
userspace to want to track with both, but if it does everything does
work.

> Is there a bitmap size limit?  

Joao's code doesn't have a limit, it pulls user pages incrementally as
it goes.

I'm expecting VFIO devices to use the same bitmap library as the IOMMU
drivers so we have a consistent reporting.

Jason


* Re: [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
  2022-05-02 19:25   ` Jason Gunthorpe
@ 2022-05-02 19:58     ` Alex Williamson
  2022-05-02 22:04       ` Jason Gunthorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2022-05-02 19:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Yishai Hadas, kvm, maorg, cohuck, kevin.tian, joao.m.martins,
	cjia, kwankhede, targupta, shameerali.kolothum.thodi, eric.auger

On Mon, 2 May 2022 16:25:41 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, May 02, 2022 at 01:07:01PM -0600, Alex Williamson wrote:
> 
> > > +/*
> > > + * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
> > > + * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
> > > + */
> > > +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 4  
> > 
> > This seems difficult to use from a QEMU perspective, where a vfio
> > device typically operates on a MemoryListener and we only have
> > visibility to one range at a time.  I don't see any indication that
> > LOGGING_START is meant to be cumulative such that userspace could
> > incrementally add ranges to be watched, nor clearly does LOGGING_STOP
> > appear to have any sort of IOVA range granularity.    
> 
> Correct, at least mlx5 HW just cannot do a change tracking operation,
> so userspace must pre-select some kind of IOVA range to monitor based
> on the current VM configuration.
> 
> > Is userspace intended to pass the full vCPU physical address range
> > here, and if so would a single min/max IOVA be sufficient?    
> 
> At least mlx5 doesn't have enough capacity for that. Some reasonable
> in-between of the current address space, and maybe a speculative extra
> for hot plug.

Ah great, implicit limitations not reported to the user that I hadn't
even guessed!  How does a user learn about any limitations in the
number of ranges or size of each range?
 
> Multi-range is needed to support some arch cases that have a big
> discontinuity in their IOVA space, like PPC for instance.
> 
> > I'm not sure how else we could support memory hotplug while this was
> > enabled.  
> 
> Most likely memory hot plug events will have to take a 'everything got
> dirty' hit during pre-copy - not much else we can do with this HW.
> 
> > How does this work with IOMMU based tracking, I assume that if devices
> > share an IOAS we wouldn't be able to exclude devices supporting
> > device-level tracking from the IOAS log.  
> 
> Exclusion is possible, the userspace would have to manually create
> iommu_domains and attach devices to them with the idea that only
> iommu_domains for devices it wants to track would have dma dirty
> tracking turned on.

Well yeah, but that's the separate IOAS solution.

> But I'm not sure it makes much sense, if the iommu will track dirties,
> and you have to use it for another devices, then it is unlikely there
> will be a performance win to inject a device tracker as well.
> 
> In any case the two interfaces are orthogonal, I would not expect
> userspace to want to track with both, but if it does everything does
> work.
> 
> > Is there a bitmap size limit?    
> 
> Joao's code doesn't have a limit, it pulls user pages incrementally as
> it goes.
> 
> I'm expecting VFIO devices to use the same bitmap library as the IOMMU
> drivers so we have a consistent reporting.

I haven't reviewed that series in any detail yet, but it seems to
impose the same bitmap size and reporting-to-userspace limits as
type1, based on internal limits of bitmap_set().  Thanks,

Alex



* Re: [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
  2022-05-02 19:58     ` Alex Williamson
@ 2022-05-02 22:04       ` Jason Gunthorpe
  2022-05-02 23:02         ` Jason Gunthorpe
  2022-05-03 11:39         ` Joao Martins
  0 siblings, 2 replies; 8+ messages in thread
From: Jason Gunthorpe @ 2022-05-02 22:04 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, kvm, maorg, cohuck, kevin.tian, joao.m.martins,
	cjia, kwankhede, targupta, shameerali.kolothum.thodi, eric.auger

On Mon, May 02, 2022 at 01:58:37PM -0600, Alex Williamson wrote:
> On Mon, 2 May 2022 16:25:41 -0300
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Mon, May 02, 2022 at 01:07:01PM -0600, Alex Williamson wrote:
> > 
> > > > +/*
> > > > + * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
> > > > + * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
> > > > + */
> > > > +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 4  
> > > 
> > > This seems difficult to use from a QEMU perspective, where a vfio
> > > device typically operates on a MemoryListener and we only have
> > > visibility to one range at a time.  I don't see any indication that
> > > LOGGING_START is meant to be cumulative such that userspace could
> > > incrementally add ranges to be watched, nor clearly does LOGGING_STOP
> > > appear to have any sort of IOVA range granularity.    
> > 
> > Correct, at least mlx5 HW just cannot do a change tracking operation,
> > so userspace must pre-select some kind of IOVA range to monitor based
> > on the current VM configuration.
> > 
> > > Is userspace intended to pass the full vCPU physical address range
> > > here, and if so would a single min/max IOVA be sufficient?    
> > 
> > At least mlx5 doesn't have enough capacity for that. Some reasonable
> > in-between of the current address space, and maybe a speculative extra
> > for hot plug.
> 
> Ah great, implicit limitations not reported to the user that I hadn't
> even guessed!  How does a user learn about any limitations in the
> number of ranges or size of each range?

There is some limit on the number of ranges and total aggregate address
space; do you think we should report it rather than try-and-fail?

I guess total address space and total number of ranges are easy to
report, but I don't quite know what userspace would do with them?

> > > How does this work with IOMMU based tracking, I assume that if devices
> > > share an IOAS we wouldn't be able to exclude devices supporting
> > > device-level tracking from the IOAS log.  
> > 
> > Exclusion is possible, the userspace would have to manually create
> > iommu_domains and attach devices to them with the idea that only
> > iommu_domains for devices it wants to track would have dma dirty
> > tracking turned on.
> 
> Well yeah, but that's the separate IOAS solution.

Sure, you can't disable tracking done at the iommu_domain level
without creating different iommu_domains. The IOAS can be shared,
userspace just has to be aware of, or perhaps explicitly control, the
assignment of iommu_domains to devices under the IOAS.

> > I'm expecting VFIO devices to use the same bitmap library as the IOMMU
> > drivers so we have a consistent reporting.
> 
> I haven't reviewed that series in any detail yet, but it seems to
> impose the same bitmap size and reporting to userspace features as
> type1 based in internal limits of bitmap_set().  Thanks,

It goes page by page, so the bitmap_set() can't see more than 4k of
bitmap at a time, IIRC.

Jason


* Re: [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
  2022-05-02 22:04       ` Jason Gunthorpe
@ 2022-05-02 23:02         ` Jason Gunthorpe
  2022-05-03 11:46           ` Joao Martins
  2022-05-03 11:39         ` Joao Martins
  1 sibling, 1 reply; 8+ messages in thread
From: Jason Gunthorpe @ 2022-05-02 23:02 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Yishai Hadas, kvm, maorg, cohuck, kevin.tian, joao.m.martins,
	cjia, kwankhede, targupta, shameerali.kolothum.thodi, eric.auger

On Mon, May 02, 2022 at 07:04:47PM -0300, Jason Gunthorpe wrote:
> > Ah great, implicit limitations not reported to the user that I hadn't
> > even guessed!  How does a user learn about any limitations in the
> > number of ranges or size of each range?
> 
> There is some limit of number of ranges and total aggregate address
> space, you think we should report rather than try-and-fail?
> 
> I guess total address space and total number of ranges is easy to
> report, but I don't quite know what userspace will do with it?

Actually, I recall now, the idea was that the driver would sort this
out.

So, userspace would pass in whatever ranges it wanted and if the
device could do only, say, 4, then the driver would merge adjacent
ranges till there were only 4 - and limit each range to any device
limit there may be, and so on and so forth.

If the result was unfittable then it fails and the only thing
userspace can do is try with less.

So, if userspace wants to do 'minimal + extra for hotplug' then on
failure it should try 'minimal' and on failure of that give up
completely.

Then we don't have to try and catalog all the weird device choices we
might need into some query function.
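The merge-down behaviour described above can be sketched like this -
illustrative only, not the proposed core code. It assumes the input
ranges are already sorted by IOVA and non-overlapping (which the core
code would validate), and it repeatedly folds the pair separated by the
smallest gap until the device's range budget is met:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct log_range {
	uint64_t iova;
	uint64_t length;
};

/*
 * Merge sorted, non-overlapping ranges down to at most max_ranges by
 * repeatedly folding the pair separated by the smallest gap (the gap
 * itself then gets tracked too, which is harmless over-reporting).
 * Returns the resulting number of ranges.
 */
static size_t merge_ranges(struct log_range *r, size_t n, size_t max_ranges)
{
	while (n > max_ranges && n > 1) {
		size_t best = 0;
		uint64_t best_gap = UINT64_MAX;

		for (size_t i = 0; i + 1 < n; i++) {
			uint64_t gap =
				r[i + 1].iova - (r[i].iova + r[i].length);

			if (gap < best_gap) {
				best_gap = gap;
				best = i;
			}
		}
		/* Extend range 'best' to cover its neighbour and the gap. */
		r[best].length =
			r[best + 1].iova + r[best + 1].length - r[best].iova;
		for (size_t i = best + 1; i + 1 < n; i++)
			r[i] = r[i + 1];
		n--;
	}
	return n;
}
```

A device limited to, say, 4 ranges would have its input folded this way
before programming the tracker; if the result still exceeds a per-range
or aggregate limit, the driver fails and userspace retries with less.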

Yishai, this concept should be called out in the documentation, and
we probably need some core code to pull the ranges into an interval
tree and validate they are non-overlapping and so forth.

Jason


* Re: [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
  2022-05-02 22:04       ` Jason Gunthorpe
  2022-05-02 23:02         ` Jason Gunthorpe
@ 2022-05-03 11:39         ` Joao Martins
  1 sibling, 0 replies; 8+ messages in thread
From: Joao Martins @ 2022-05-03 11:39 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Yishai Hadas, kvm, maorg, cohuck, kevin.tian, cjia, kwankhede,
	targupta, shameerali.kolothum.thodi, eric.auger

On 5/2/22 23:04, Jason Gunthorpe wrote:
> On Mon, May 02, 2022 at 01:58:37PM -0600, Alex Williamson wrote:
>> On Mon, 2 May 2022 16:25:41 -0300
>> Jason Gunthorpe <jgg@nvidia.com> wrote:
>>> On Mon, May 02, 2022 at 01:07:01PM -0600, Alex Williamson wrote:
>>>>> +/*
>>>>> + * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
>>>>> + * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
>>>>> + */
>>>>> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 4  
>>>>
>>>> This seems difficult to use from a QEMU perspective, where a vfio
>>>> device typically operates on a MemoryListener and we only have
>>>> visibility to one range at a time.  I don't see any indication that
>>>> LOGGING_START is meant to be cumulative such that userspace could
>>>> incrementally add ranges to be watched, nor clearly does LOGGING_STOP
>>>> appear to have any sort of IOVA range granularity.    
>>>
>>> Correct, at least mlx5 HW just cannot do a change tracking operation,
>>> so userspace must pre-select some kind of IOVA range to monitor based
>>> on the current VM configuration.
>>>
>>>> Is userspace intended to pass the full vCPU physical address range
>>>> here, and if so would a single min/max IOVA be sufficient?    
>>>
>>> At least mlx5 doesn't have enough capacity for that. Some reasonable
>>> in-between of the current address space, and maybe a speculative extra
>>> for hot plug.
>>

Though one does know, at start of guest (in the interim abstractions),
the underlying memory map, e.g. that you can only have <this much> for
hotpluggable memory.

Albeit I can't remember (need to check) if the memory listener infra gets
propagated with everything at start (including the non-present stuff like
the hotplug MR).

>>> I'm expecting VFIO devices to use the same bitmap library as the IOMMU
>>> drivers so we have a consistent reporting.
>>
>> I haven't reviewed that series in any detail yet, but it seems to
>> impose the same bitmap size and reporting to userspace features as
>> type1 based in internal limits of bitmap_set().  Thanks,
> 
> It goes page by page, so the bitmap_set() can't see more than 4k of
> bitmap at a time, IIRC.

It allows a bitmap big enough to marshal 64G of IOVA space, albeit it
only does a section at a time (128M), i.e. one 4K page or in other words
32K bits at a time. However, the callers will only set bits in the bitmap
for a given PTE page size, so we are always limited by the PTE page size
being recorded at the bitmap page-size granularity.

The limit is implicitly bound by the IOVA range being passed in and the
bitmap page size. So this means the maximum is the bitmap size necessary
to record an IOVA range of length ULONG_MAX.
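To make the sizing concrete, here is a hedged reading of the report's
bitmap arithmetic (illustrative helper names; the kernel's internal
bitmap code may differ in detail): one bit per page_size unit of the
queried range, packed into u64 words.

```c
#include <assert.h>
#include <stdint.h>

/* Number of u64 words a report bitmap needs for an IOVA range. */
static uint64_t report_bitmap_words(uint64_t length, uint64_t page_size)
{
	uint64_t bits = length / page_size;	/* one bit per page */

	return (bits + 63) / 64;		/* round up to whole words */
}

/*
 * Set the dirty bit for one written address inside the queried range.
 * The kernel would use an atomic OR so several trackers can combine
 * into one bitmap; a plain OR is enough for this sketch.
 */
static void mark_dirty(uint64_t *bitmap, uint64_t iova_base,
		       uint64_t addr, uint64_t page_size)
{
	uint64_t bit = (addr - iova_base) / page_size;

	bitmap[bit / 64] |= 1ULL << (bit % 64);
}
```

So a 64G range at 4K granularity needs 2^24 bits, i.e. 2 MiB of bitmap,
which is why walking it one 4K page of bitmap at a time covers 128M of
IOVA per step.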


* Re: [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
  2022-05-02 23:02         ` Jason Gunthorpe
@ 2022-05-03 11:46           ` Joao Martins
  0 siblings, 0 replies; 8+ messages in thread
From: Joao Martins @ 2022-05-03 11:46 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: Yishai Hadas, kvm, maorg, cohuck, kevin.tian, cjia, kwankhede,
	targupta, shameerali.kolothum.thodi, eric.auger

On 5/3/22 00:02, Jason Gunthorpe wrote:
> On Mon, May 02, 2022 at 07:04:47PM -0300, Jason Gunthorpe wrote:
>>> Ah great, implicit limitations not reported to the user that I hadn't
>>> even guessed!  How does a user learn about any limitations in the
>>> number of ranges or size of each range?
>>
>> There is some limit of number of ranges and total aggregate address
>> space, you think we should report rather than try-and-fail?
>>
>> I guess total address space and total number of ranges is easy to
>> report, but I don't quite know what userspace will do with it?
> 
> Actually, I recall now, the idea was that the driver would sort this
> out.
> 
> So, userspace would pass in whatever ranges it wanted and if the
> device could do only, say, 4, then the driver would merge adjacent
> ranges till there were only 4 - and limit each range to any device
> limit there may be, and so on and so forth.
> 
> If the result was unfittable then it fails and the only thing
> userspace can do is try with less.
> 
> So, if userspace wants to do 'minimal + extra for hotplug' then on
> failure it should try 'minimal' and on failure of that give up
> completely.
> 

Is it worth classifying the range as present or not, say via a ::flags
field in vfio_device_feature_dma_logging_range, to help choose what to
aggregate or not? Provided the number of calls might be nondeterministic?

> Then we don't have to try and catalog all the weird device choices we
> might need into some query function.
> 
> Yishahi, this concept should be called out in the documentation, and
> we probably need some core code to pull the ranges into an interval
> tree and validate they are non-overlapping and so forth
> 
> Jason

