From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andre Przywara <andre.przywara@arm.com>
Subject: Re: [PATCH v3 18/19] KVM: arm64: ITS: Device table save/restore
Date: Wed, 22 Mar 2017 14:39:29 +0000
Message-ID: <2c4841df-f7b3-ee03-16d3-7e32c9cbd936@arm.com>
References: <1488800074-21991-1-git-send-email-eric.auger@redhat.com>
 <1488800074-21991-19-git-send-email-eric.auger@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Cc: Prasun.Kapoor@cavium.com, quintela@redhat.com, dgilbert@redhat.com,
 pbonzini@redhat.com
To: Eric Auger <eric.auger@redhat.com>, eric.auger.pro@gmail.com,
 marc.zyngier@arm.com, christoffer.dall@linaro.org,
 vijayak@caviumnetworks.com, Vijaya.Kumar@cavium.com,
 peter.maydell@linaro.org, linux-arm-kernel@lists.infradead.org,
 drjones@redhat.com, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org
Return-path: <kvmarm-bounces@lists.cs.columbia.edu>
In-Reply-To: <1488800074-21991-19-git-send-email-eric.auger@redhat.com>
List-Unsubscribe: <https://lists.cs.columbia.edu/mailman/options/kvmarm>,
 <mailto:kvmarm-request@lists.cs.columbia.edu?subject=unsubscribe>
List-Archive: <https://lists.cs.columbia.edu/pipermail/kvmarm>
List-Post: <mailto:kvmarm@lists.cs.columbia.edu>
List-Help: <mailto:kvmarm-request@lists.cs.columbia.edu?subject=help>
List-Subscribe: <https://lists.cs.columbia.edu/mailman/listinfo/kvmarm>,
 <mailto:kvmarm-request@lists.cs.columbia.edu?subject=subscribe>
Errors-To: kvmarm-bounces@lists.cs.columbia.edu
Sender: kvmarm-bounces@lists.cs.columbia.edu
List-Id: kvm.vger.kernel.org

Hi,

On 06/03/17 11:34, Eric Auger wrote:
> This patch flushes the device table entries into guest RAM.
> Both flat table and 2 stage tables are supported.  DeviceId
> indexing is used.
> 
> For each device listed in the device table, we also flush
> the translation table using the vgic_its_flush/restore_itt
> routines.
> 
> On restore, devices are re-allocated and their itte are
> re-built.

Some minor things below.
In general I had quite some trouble to understand what's going on here,
though I convinced myself that this is correct. So could you add a bit
more comments here? For instance to explain that we have to explicitly
handle the L1 table on restore, but not on flush.

> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v2 -> v3:
> - fix itt_addr bitmask in vgic_its_restore_dte
> - addition of return 0 in vgic_its_restore_ite moved to
>   the ITE related patch
> 
> v1 -> v2:
> - use 8 byte format for DTE and ITE
> - support 2 stage format
> - remove kvm parameter
> - ITT flush/restore moved in a separate patch
> - use deviceid indexing
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 144 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 142 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index a216849..27ebabd 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -1849,12 +1849,137 @@ static int vgic_its_restore_itt(struct vgic_its *its,
>  }
>  
>  /**
> + * vgic_its_flush_dte - Flush a device table entry at a given GPA
> + *
> + * @its: ITS handle
> + * @dev: ITS device
> + * @ptr: GPA
> + */
> +static int vgic_its_flush_dte(struct vgic_its *its,
> +			      struct its_device *dev, gpa_t ptr)
> +{
> +	struct kvm *kvm = its->dev->kvm;
> +	u64 val, itt_addr_field;
> +	int ret;
> +	u32 next_offset;
> +
> +	itt_addr_field = dev->itt_addr >> 8;
> +	next_offset = compute_next_devid_offset(&its->device_list, dev);
> +	val = (((u64)next_offset << 45) | (itt_addr_field << 5) |

So this gives you 19 bits for next_offset, but the value of
VITS_DTE_MAX_DEVID_OFFSET suggests 20 bits. It should become more
obvious what's happening here if use "BITS(x) - 1" at the definition as
suggested before.

Also you limit itt_addr here to 40 bits, where the actual limit seems to
be 44 bits (52 - 8). Is that limited by KVM somewhere else?
Even if it is, I think we should make sure that itt_addr_field doesn't
spill over into next_offset.

> +		(dev->nb_eventid_bits - 1));

Mmmh, here nb_eventid_bits seems to be the real bit number again. Puzzled.

> +	val = cpu_to_le64(val);
> +	ret = kvm_write_guest(kvm, ptr, &val, 8);
> +	return ret;
> +}
> +
> +/**
> + * vgic_its_restore_dte - restore a device table entry
> + *
> + * @its: its handle
> + * @id: device id the DTE corresponds to
> + * @ptr: kernel VA where the 8 byte DTE is located
> + * @opaque: unused
> + * @next: offset to the next valid device id
> + *
> + * Return: < 0 on error, 0 otherwise
> + */
> +static int vgic_its_restore_dte(struct vgic_its *its, u32 id,
> +				void *ptr, void *opaque, u32 *next)
> +{
> +	struct its_device *dev;
> +	gpa_t itt_addr;
> +	size_t size;
> +	u64 val, *p = (u64 *)ptr;
> +	int ret;
> +
> +	val = *p;
> +	val = le64_to_cpu(val);
> +
> +	size = val & GENMASK_ULL(4, 0);
> +	itt_addr = (val & GENMASK_ULL(44, 5)) >> 5;
> +	*next = 1;
> +
> +	if (!itt_addr)
> +		return 0;
> +
> +	/* dte entry is valid */
> +	*next = (val & GENMASK_ULL(63, 45)) >> 45;

No need for GENMASK, just shift by 45.

> +
> +	ret = vgic_its_alloc_device(its, &dev, id,
> +				    itt_addr, size);
> +	if (ret)
> +		return ret;
> +	ret = vgic_its_restore_itt(its, dev);
> +
> +	return ret;
> +}
> +
> +/**
>   * vgic_its_flush_device_tables - flush the device table and all ITT
>   * into guest RAM
>   */
>  static int vgic_its_flush_device_tables(struct vgic_its *its)
>  {
> -	return -ENXIO;
> +	struct its_device *dev;
> +	u64 baser;
> +
> +	baser = its->baser_device_table;
> +
> +	list_for_each_entry(dev, &its->device_list, dev_list) {
> +		int ret;
> +		gpa_t eaddr;
> +
> +		if (!vgic_its_check_id(its, baser,
> +				       dev->device_id, &eaddr))
> +			return -EINVAL;
> +
> +		ret = vgic_its_flush_itt(its, dev);
> +		if (ret)
> +			return ret;
> +
> +		ret = vgic_its_flush_dte(its, dev, eaddr);
> +		if (ret)
> +			return ret;
> +		}

whitespace ?

> +	return 0;
> +}
> +
> +/**
> + * handle_l1_entry - callback used for L1 entries (2 stage case)
> + *
> + * @its: its handle
> + * @id: id
> + * @addr: kernel VA
> + * @opaque: unused
> + * @next_offset: offset to the next L1 entry: 0 if the last element
> + * was found, 1 otherwise
> + */
> +static int handle_l1_entry(struct vgic_its *its, u32 id, void *addr,
> +			   void *opaque, u32 *next_offset)
> +{
> +	u64 *pe = addr;
> +	gpa_t gpa;
> +	int l2_start_id = id * (SZ_64K / 8);

I think we can use GITS_LVL1_ENTRY_SIZE here, which I suppose is what
the 8 stands for.

> +	int ret;
> +
> +	*pe = le64_to_cpu(*pe);

Is it correct to _update_ the entry here? I think that breaks BE, right?
Beside I believe the ITS is not supposed to tinker with the L1 table
entries, isn't it?

So should it be instead:
	u64 pe = *(u64 *)addr;
	
	pe = le64_to_cpu(pe);

instead?
And what "pe" stand for anyway? Maybe "entry" instead?

> +	*next_offset = 1;
> +
> +	if (!(*pe & BIT_ULL(63)))
> +		return 0;
> +
> +	gpa = *pe & GENMASK_ULL(51, 16);
> +
> +	ret = lookup_table(its, gpa, SZ_64K, 8,
> +			    l2_start_id, vgic_its_restore_dte, NULL);
> +
> +	if (ret == 1) {
> +		/* last entry was found in this L2 table */
> +		*next_offset = 0;
> +		ret = 0;
> +	}
> +
> +	return ret;
>  }
>  
>  /**
> @@ -1863,7 +1988,22 @@ static int vgic_its_flush_device_tables(struct vgic_its *its)
>   */
>  static int vgic_its_restore_device_tables(struct vgic_its *its)
>  {
> -	return -ENXIO;
> +	u64 baser = its->baser_device_table;
> +	int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
> +	int l1_esz = GITS_BASER_ENTRY_SIZE(baser);
> +	gpa_t l1_gpa;
> +
> +	l1_gpa = BASER_ADDRESS(baser);
> +	if (!l1_gpa)
> +		return 0;
> +
> +	if (!(baser & GITS_BASER_INDIRECT))
> +		return lookup_table(its, l1_gpa, l1_tbl_size, l1_esz,
> +				    0, vgic_its_restore_dte, NULL);
> +
> +	/* two stage table */
> +	return lookup_table(its, l1_gpa, l1_tbl_size, 8, 0,
> +			    handle_l1_entry, NULL);

That usage of lookup_table with the callback is pretty neat!

Cheers,
Andre.

>  }
>  
>  static int vgic_its_flush_cte(struct vgic_its *its,
> 

From mboxrd@z Thu Jan  1 00:00:00 1970
From: andre.przywara@arm.com (Andre Przywara)
Date: Wed, 22 Mar 2017 14:39:29 +0000
Subject: [PATCH v3 18/19] KVM: arm64: ITS: Device table save/restore
In-Reply-To: <1488800074-21991-19-git-send-email-eric.auger@redhat.com>
References: <1488800074-21991-1-git-send-email-eric.auger@redhat.com>
 <1488800074-21991-19-git-send-email-eric.auger@redhat.com>
Message-ID: <2c4841df-f7b3-ee03-16d3-7e32c9cbd936@arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Hi,

On 06/03/17 11:34, Eric Auger wrote:
> This patch flushes the device table entries into guest RAM.
> Both flat table and 2 stage tables are supported.  DeviceId
> indexing is used.
> 
> For each device listed in the device table, we also flush
> the translation table using the vgic_its_flush/restore_itt
> routines.
> 
> On restore, devices are re-allocated and their itte are
> re-built.

Some minor things below.
In general I had quite some trouble to understand what's going on here,
though I convinced myself that this is correct. So could you add a bit
more comments here? For instance to explain that we have to explicitly
handle the L1 table on restore, but not on flush.

> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> v2 -> v3:
> - fix itt_addr bitmask in vgic_its_restore_dte
> - addition of return 0 in vgic_its_restore_ite moved to
>   the ITE related patch
> 
> v1 -> v2:
> - use 8 byte format for DTE and ITE
> - support 2 stage format
> - remove kvm parameter
> - ITT flush/restore moved in a separate patch
> - use deviceid indexing
> ---
>  virt/kvm/arm/vgic/vgic-its.c | 144 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 142 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
> index a216849..27ebabd 100644
> --- a/virt/kvm/arm/vgic/vgic-its.c
> +++ b/virt/kvm/arm/vgic/vgic-its.c
> @@ -1849,12 +1849,137 @@ static int vgic_its_restore_itt(struct vgic_its *its,
>  }
>  
>  /**
> + * vgic_its_flush_dte - Flush a device table entry at a given GPA
> + *
> + * @its: ITS handle
> + * @dev: ITS device
> + * @ptr: GPA
> + */
> +static int vgic_its_flush_dte(struct vgic_its *its,
> +			      struct its_device *dev, gpa_t ptr)
> +{
> +	struct kvm *kvm = its->dev->kvm;
> +	u64 val, itt_addr_field;
> +	int ret;
> +	u32 next_offset;
> +
> +	itt_addr_field = dev->itt_addr >> 8;
> +	next_offset = compute_next_devid_offset(&its->device_list, dev);
> +	val = (((u64)next_offset << 45) | (itt_addr_field << 5) |

So this gives you 19 bits for next_offset, but the value of
VITS_DTE_MAX_DEVID_OFFSET suggests 20 bits. It should become more
obvious what's happening here if use "BITS(x) - 1"@the definition as
suggested before.

Also you limit itt_addr here to 40 bits, where the actual limit seems to
be 44 bits (52 - 8). Is that limited by KVM somewhere else?
Even if it is, I think we should make sure that itt_addr_field doesn't
spill over into next_offset.

> +		(dev->nb_eventid_bits - 1));

Mmmh, here nb_eventid_bits seems to be the real bit number again. Puzzled.

> +	val = cpu_to_le64(val);
> +	ret = kvm_write_guest(kvm, ptr, &val, 8);
> +	return ret;
> +}
> +
> +/**
> + * vgic_its_restore_dte - restore a device table entry
> + *
> + * @its: its handle
> + * @id: device id the DTE corresponds to
> + * @ptr: kernel VA where the 8 byte DTE is located
> + * @opaque: unused
> + * @next: offset to the next valid device id
> + *
> + * Return: < 0 on error, 0 otherwise
> + */
> +static int vgic_its_restore_dte(struct vgic_its *its, u32 id,
> +				void *ptr, void *opaque, u32 *next)
> +{
> +	struct its_device *dev;
> +	gpa_t itt_addr;
> +	size_t size;
> +	u64 val, *p = (u64 *)ptr;
> +	int ret;
> +
> +	val = *p;
> +	val = le64_to_cpu(val);
> +
> +	size = val & GENMASK_ULL(4, 0);
> +	itt_addr = (val & GENMASK_ULL(44, 5)) >> 5;
> +	*next = 1;
> +
> +	if (!itt_addr)
> +		return 0;
> +
> +	/* dte entry is valid */
> +	*next = (val & GENMASK_ULL(63, 45)) >> 45;

No need for GENMASK, just shift by 45.

> +
> +	ret = vgic_its_alloc_device(its, &dev, id,
> +				    itt_addr, size);
> +	if (ret)
> +		return ret;
> +	ret = vgic_its_restore_itt(its, dev);
> +
> +	return ret;
> +}
> +
> +/**
>   * vgic_its_flush_device_tables - flush the device table and all ITT
>   * into guest RAM
>   */
>  static int vgic_its_flush_device_tables(struct vgic_its *its)
>  {
> -	return -ENXIO;
> +	struct its_device *dev;
> +	u64 baser;
> +
> +	baser = its->baser_device_table;
> +
> +	list_for_each_entry(dev, &its->device_list, dev_list) {
> +		int ret;
> +		gpa_t eaddr;
> +
> +		if (!vgic_its_check_id(its, baser,
> +				       dev->device_id, &eaddr))
> +			return -EINVAL;
> +
> +		ret = vgic_its_flush_itt(its, dev);
> +		if (ret)
> +			return ret;
> +
> +		ret = vgic_its_flush_dte(its, dev, eaddr);
> +		if (ret)
> +			return ret;
> +		}

whitespace ?

> +	return 0;
> +}
> +
> +/**
> + * handle_l1_entry - callback used for L1 entries (2 stage case)
> + *
> + * @its: its handle
> + * @id: id
> + * @addr: kernel VA
> + * @opaque: unused
> + * @next_offset: offset to the next L1 entry: 0 if the last element
> + * was found, 1 otherwise
> + */
> +static int handle_l1_entry(struct vgic_its *its, u32 id, void *addr,
> +			   void *opaque, u32 *next_offset)
> +{
> +	u64 *pe = addr;
> +	gpa_t gpa;
> +	int l2_start_id = id * (SZ_64K / 8);

I think we can use GITS_LVL1_ENTRY_SIZE here, which I suppose is what
the 8 stands for.

> +	int ret;
> +
> +	*pe = le64_to_cpu(*pe);

Is it correct to _update_ the entry here? I think that breaks BE, right?
Beside I believe the ITS is not supposed to tinker with the L1 table
entries, isn't it?

So should it be instead:
	u64 pe = *(u64 *)addr;
	
	pe = le64_to_cpu(pe);

instead?
And what "pe" stand for anyway? Maybe "entry" instead?

> +	*next_offset = 1;
> +
> +	if (!(*pe & BIT_ULL(63)))
> +		return 0;
> +
> +	gpa = *pe & GENMASK_ULL(51, 16);
> +
> +	ret = lookup_table(its, gpa, SZ_64K, 8,
> +			    l2_start_id, vgic_its_restore_dte, NULL);
> +
> +	if (ret == 1) {
> +		/* last entry was found in this L2 table */
> +		*next_offset = 0;
> +		ret = 0;
> +	}
> +
> +	return ret;
>  }
>  
>  /**
> @@ -1863,7 +1988,22 @@ static int vgic_its_flush_device_tables(struct vgic_its *its)
>   */
>  static int vgic_its_restore_device_tables(struct vgic_its *its)
>  {
> -	return -ENXIO;
> +	u64 baser = its->baser_device_table;
> +	int l1_tbl_size = GITS_BASER_NR_PAGES(baser) * SZ_64K;
> +	int l1_esz = GITS_BASER_ENTRY_SIZE(baser);
> +	gpa_t l1_gpa;
> +
> +	l1_gpa = BASER_ADDRESS(baser);
> +	if (!l1_gpa)
> +		return 0;
> +
> +	if (!(baser & GITS_BASER_INDIRECT))
> +		return lookup_table(its, l1_gpa, l1_tbl_size, l1_esz,
> +				    0, vgic_its_restore_dte, NULL);
> +
> +	/* two stage table */
> +	return lookup_table(its, l1_gpa, l1_tbl_size, 8, 0,
> +			    handle_l1_entry, NULL);

That usage of lookup_table with the callback is pretty neat!

Cheers,
Andre.

>  }
>  
>  static int vgic_its_flush_cte(struct vgic_its *its,
>