kvm.vger.kernel.org archive mirror
From: Alex Williamson <alex.williamson@redhat.com>
To: Yishai Hadas <yishaih@nvidia.com>
Cc: <jgg@nvidia.com>, <kvm@vger.kernel.org>, <maorg@nvidia.com>,
	<cohuck@redhat.com>, <kevin.tian@intel.com>,
	<joao.m.martins@oracle.com>, <cjia@nvidia.com>,
	<kwankhede@nvidia.com>, <targupta@nvidia.com>,
	<shameerali.kolothum.thodi@huawei.com>, <eric.auger@redhat.com>
Subject: Re: [PATCH RFC] vfio: Introduce DMA logging uAPIs for VFIO device
Date: Mon, 2 May 2022 13:07:01 -0600
Message-ID: <20220502130701.62e10b00.alex.williamson@redhat.com>
In-Reply-To: <20220501123301.127279-1-yishaih@nvidia.com>

On Sun, 1 May 2022 15:33:00 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> DMA logging allows a device to internally record what DMAs the device is
> initiating and report them back to userspace.
> 
> It is part of the VFIO migration infrastructure that allows implementing
> dirty page tracking during the pre-copy phase of live migration.
> 
> Only DMA WRITEs are logged, and this API is not connected to
> VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
> 
> This RFC patch shows the expected usage of the DMA logging uAPIs
> for a VFIO device tracker.
> 
> It uses the FEATURE ioctl with its GET/SET/PROBE options as described below.
> 
> It exposes a PROBE option to detect if the device supports DMA logging.
> 
> It exposes a SET option to start device DMA logging for the given IOVA
> ranges.
> 
> It exposes a SET option to stop device DMA logging that was previously
> started.
> 
> It exposes a GET option to read back and clear the device DMA log.
> 
> Extra details exist as part of vfio.h per a specific option in this RFC
> patch.
> 
> Note:
> For IOMMU hardware support for dirty page tracking, see the RFC [1]
> that was sent by Joao Martins.
> 
> [1] https://lore.kernel.org/all/2d369e58-8ac0-f263-7b94-fe73917782e1@linux.intel.com/T/
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  include/uapi/linux/vfio.h | 80 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 80 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index fea86061b44e..9d0b7e73e999 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -986,6 +986,86 @@ enum vfio_device_mig_state {
>  	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
>  };
>  
> +/*
> + * Upon VFIO_DEVICE_FEATURE_SET start device DMA logging.
> + * VFIO_DEVICE_FEATURE_PROBE can be used to detect if the device supports
> + * DMA logging.
> + *
> + * DMA logging allows a device to internally record what DMAs the device is
> + * initiating and report them back to userspace. It is part of the VFIO
> + * migration infrastructure that allows implementing dirty page tracking
> + * during the pre-copy phase of live migration. Only DMA WRITEs are logged,
> + * and this API is not connected to VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE.
> + *
> + * When DMA logging is started a range of IOVAs to monitor is provided and the
> + * device can optimize its logging to cover only the IOVA range given. Each
> + * DMA that the device initiates inside the range will be logged by the device
> + * for later retrieval.
> + *
> + * page_size is an input that hints what tracking granularity the device
> + * should try to achieve. If the device cannot do the hinted page size then it
> + * should pick the next closest page size it supports. On output the device
> + * will return the page size it selected.
> + *
> + * ranges is a pointer to an array of
> + * struct vfio_device_feature_dma_logging_range.
> + */
> +struct vfio_device_feature_dma_logging_control {
> +	__aligned_u64 page_size;
> +	__u32 num_ranges;
> +	__u32 __reserved;
> +	__aligned_u64 ranges;
> +};
> +
> +struct vfio_device_feature_dma_logging_range {
> +	__aligned_u64 iova;
> +	__aligned_u64 length;
> +};
> +
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 3
> +
> +
> +/*
> + * Upon VFIO_DEVICE_FEATURE_SET stop device DMA logging that was started
> + * by VFIO_DEVICE_FEATURE_DMA_LOGGING_START
> + */
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP 4

This seems difficult to use from a QEMU perspective, where a vfio
device typically operates on a MemoryListener and we only have
visibility to one range at a time.  I don't see any indication that
LOGGING_START is meant to be cumulative such that userspace could
incrementally add ranges to be watched, nor does LOGGING_STOP appear
to have any sort of IOVA range granularity.  Is userspace
intended to pass the full vCPU physical address range here, and if so
would a single min/max IOVA be sufficient?  I'm not sure how else we
could support memory hotplug while this was enabled.
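
For illustration only, here is a minimal sketch of the single min/max
alternative asked about above.  Nothing in it comes from the RFC:
covering_range() and struct dma_logging_range are hypothetical names, and
the struct merely mirrors the iova/length layout of
vfio_device_feature_dma_logging_range with plain uint64_t fields.
Whether LOGGING_START would accept such a span is exactly the open
question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the RFC's iova/length range layout. */
struct dma_logging_range {
	uint64_t iova;
	uint64_t length;
};

/*
 * Hypothetical helper: collapse n ranges into the one [min, max) span
 * that covers all of them.  Returns 0 on success, -1 if n is zero or
 * any range is empty or wraps past the end of the 64-bit IOVA space.
 */
int covering_range(const struct dma_logging_range *r, size_t n,
		   struct dma_logging_range *out)
{
	uint64_t min = UINT64_MAX, max = 0;
	size_t i;

	assert(r && out);
	if (n == 0)
		return -1;

	for (i = 0; i < n; i++) {
		uint64_t end;

		if (r[i].length == 0 || r[i].iova > UINT64_MAX - r[i].length)
			return -1;
		end = r[i].iova + r[i].length;	/* exclusive end */
		if (r[i].iova < min)
			min = r[i].iova;
		if (end > max)
			max = end;
	}

	out->iova = min;
	out->length = max - min;
	return 0;
}
```

The obvious cost is that any holes between the ranges are covered too,
and a region hot-added above the computed maximum would still fall
outside the span.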

How does this work with IOMMU-based tracking?  I assume that if devices
share an IOAS, we wouldn't be able to exclude devices supporting
device-level tracking from the IOAS log.

> +
> +/*
> + * Upon VFIO_DEVICE_FEATURE_GET read back and clear the device DMA log
> + *
> + * Query the device's DMA log for written pages within the given IOVA range.
> + * During querying the log is cleared for the IOVA range.
> + *
> + * bitmap is a pointer to an array of u64s that will hold the output bitmap
> + * with 1 bit reporting a page_size unit of IOVA. The mapping of IOVA to bits
> + * is given by:
> + *  bitmap[(addr - iova)/page_size] & (1ULL << (addr % 64))
> + *
> + * The input page_size can be any power of two value and does not have to
> + * match the value given to VFIO_DEVICE_FEATURE_DMA_LOGGING_START. The driver
> + * will format its internal logging to match the reporting page size, possibly
> + * by replicating bits if the internal page size is lower than requested.

Or setting multiple bits if the internal page size is larger than
requested.
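
To make both directions concrete, a rough sketch of how a log kept at
one page size could be re-expressed at another.  This is an assumption
about what a driver might do, not code from the patch; rescale_bitmap(),
test_bit64() and set_bit64() are hypothetical names.  It assumes both
page sizes are powers of two, that length is a multiple of both, and
that the destination was zeroed by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for a bitmap stored as an array of u64 words. */
static int test_bit64(const uint64_t *bm, uint64_t bit)
{
	return (bm[bit / 64] >> (bit % 64)) & 1;
}

static void set_bit64(uint64_t *bm, uint64_t bit)
{
	bm[bit / 64] |= 1ULL << (bit % 64);
}

/*
 * Re-express a dirty bitmap tracked at src_pgsz as one at dst_pgsz,
 * covering length bytes from the same base IOVA.  Bits are only OR-ed
 * into dst, so the caller must zero it first.
 */
void rescale_bitmap(const uint64_t *src, uint64_t src_pgsz,
		    uint64_t *dst, uint64_t dst_pgsz, uint64_t length)
{
	uint64_t nsrc = length / src_pgsz;
	uint64_t i, j;

	assert(src_pgsz && !(src_pgsz & (src_pgsz - 1)));
	assert(dst_pgsz && !(dst_pgsz & (dst_pgsz - 1)));

	for (i = 0; i < nsrc; i++) {
		uint64_t off, first, last;

		if (!test_bit64(src, i))
			continue;
		off = i * src_pgsz;
		/*
		 * src page smaller than dst page: first == last, so many
		 * src bits collapse into one dst bit.  src page larger:
		 * one src bit sets src_pgsz / dst_pgsz dst bits.
		 */
		first = off / dst_pgsz;
		last = (off + src_pgsz - 1) / dst_pgsz;
		for (j = first; j <= last; j++)
			set_bit64(dst, j);
	}
}
```

Going from a larger internal page size to a smaller reported one can
only over-report: every small page under a dirty large page is marked.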

Is there a bitmap size limit?  We've minimally needed to impose limits
to reflect limitations of the bitmap code internally in the past.
Userspace needs a means to learn such limits.  Thanks,

Alex

> + *
> + * Bits will be updated in bitmap using atomic OR to allow userspace to
> + * combine bitmaps from multiple trackers together. Therefore userspace must
> + * zero the bitmap before doing any reports.
> + *
> + * If any error is returned userspace should assume that the dirty log is
> + * corrupted and restart.
> + *
> + * If DMA logging is not enabled, an error will be returned.
> + *
> + */
> +struct vfio_device_feature_dma_logging_report {
> +	__aligned_u64 iova;
> +	__aligned_u64 length;
> +	__aligned_u64 page_size;
> +	__aligned_u64 bitmap;
> +};
> +
> +#define VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT 5
> +
>  /* -------- API for Type1 VFIO IOMMU -------- */
>  
>  /**
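
As a footnote to the bitmap size question above, a small sketch of how
userspace might size the REPORT bitmap and look up a page in it.  The
function names are hypothetical, and the lookup is only one plausible
reading of the mapping in the quoted comment: it assumes the bit index
is (addr - iova) / page_size, split into a u64 word index and a bit
within that word.  The RFC defines no size limit, so none is checked.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes of u64-granular bitmap needed to report
 * length bytes of IOVA at page_size.  Returns 0 on invalid input
 * (zero length, or a page size that is not a power of two).
 */
uint64_t report_bitmap_bytes(uint64_t length, uint64_t page_size)
{
	uint64_t nbits, nwords;

	if (length == 0 || page_size == 0 || (page_size & (page_size - 1)))
		return 0;

	nbits = length / page_size + (length % page_size != 0);
	nwords = nbits / 64 + (nbits % 64 != 0);
	return nwords * sizeof(uint64_t);
}

/*
 * Hypothetical helper: nonzero if the page containing addr is marked
 * dirty.  Assumes iova <= addr < iova + length for the reported range.
 */
int report_page_dirty(const uint64_t *bitmap, uint64_t iova,
		      uint64_t page_size, uint64_t addr)
{
	uint64_t bit = (addr - iova) / page_size;

	return (bitmap[bit / 64] >> (bit % 64)) & 1;
}
```

For scale, under these assumptions 1 GiB reported at 4 KiB granularity
needs 32 KiB of bitmap, so any internal limit would translate fairly
directly into a maximum IOVA length per REPORT call.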


