On Thu, Jun 04, 2020 at 02:40:22PM +0100, Alex Bennée wrote:
> The purpose of vhost_section is to identify RAM regions that need to
> be made available to a vhost client. However when running under TCG
> all RAM sections have DIRTY_MEMORY_CODE set which leads to problems
> down the line.
>
> Re-factor the code so:
>
>   - steps are clearer to follow
>   - reason for rejection is recorded in the trace point
>   - we allow DIRTY_MEMORY_CODE when TCG is enabled
>
> We expand the comment to explain that kernel based vhost has specific
> support for migration tracking.
>
> Signed-off-by: Alex Bennée
> Cc: Michael S. Tsirkin
> Cc: Dr. David Alan Gilbert
> Cc: Stefan Hajnoczi
>
> ---
> v2
>   - drop enum, add trace_vhost_reject_section
>   - return false at any fail point
>   - unconditionally add DIRTY_MEMORY_CODE to handled cases
>   - slightly re-word the explanatory comment and commit message
> ---
>  hw/virtio/vhost.c      | 55 ++++++++++++++++++++++++++++++------------
>  hw/virtio/trace-events |  3 ++-
>  2 files changed, 41 insertions(+), 17 deletions(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index aff98a0ede5..120c0cc747b 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -27,6 +27,7 @@
>  #include "migration/blocker.h"
>  #include "migration/qemu-file-types.h"
>  #include "sysemu/dma.h"
> +#include "sysemu/tcg.h"
>  #include "trace.h"
>
>  /* enabled until disconnected backend stabilizes */
> @@ -403,26 +404,48 @@ static int vhost_verify_ring_mappings(struct vhost_dev *dev,
>      return r;
>  }
>
> +/*
> + * vhost_section: identify sections needed for vhost access
> + *
> + * We only care about RAM sections here (where virtqueue can live). If

It's not just the virtqueue. Arbitrary guest RAM buffers can be placed
into the virtqueue so we need to pass all guest RAM to the vhost device
backend.

> + * we find one we still allow the backend to potentially filter it out
> + * of our list.
> + */
>  static bool vhost_section(struct vhost_dev *dev, MemoryRegionSection *section)
>  {
> -    bool result;
> -    bool log_dirty = memory_region_get_dirty_log_mask(section->mr) &
> -        ~(1 << DIRTY_MEMORY_MIGRATION);
> -    result = memory_region_is_ram(section->mr) &&
> -        !memory_region_is_rom(section->mr);
> -
> -    /* Vhost doesn't handle any block which is doing dirty-tracking other
> -     * than migration; this typically fires on VGA areas.
> -     */
> -    result &= !log_dirty;
> +    MemoryRegion *mr = section->mr;
> +
> +    if (memory_region_is_ram(mr) && !memory_region_is_rom(mr)) {
> +        uint8_t dirty_mask = memory_region_get_dirty_log_mask(mr);
> +        uint8_t handled_dirty;
> +
> +        /*
> +         * Kernel based vhost doesn't handle any block which is doing
> +         * dirty-tracking other than migration for which it has
> +         * specific logging support. However for TCG the kernel never
> +         * gets involved anyway so we can also ignore its
> +         * self-modifying code detection flags.
> +         */
> +        handled_dirty = (1 << DIRTY_MEMORY_MIGRATION);
> +        handled_dirty |= (1 << DIRTY_MEMORY_CODE);

Wait, how is vhost going to support TCG self-modifying code detection?
It seems like this change will allow vhost devices to run, but now QEMU
will miss out on self-modifying code.

Do we already enable vhost dirty memory logging for DIRTY_MEMORY_CODE
memory somewhere? Or is there some cross-architectural reason why we
can be sure that allowing the vhost backend to DMA to guest RAM without
marking pages dirty is safe? For example, maybe the CPU needs to
explicitly flush the icache after DMA because this was a DMA operation
and not a regular self-modifying code memory store? But is this true
across all architectures?

Stefan