All of lore.kernel.org
 help / color / mirror / Atom feed
* bind pasid table API
@ 2017-09-19  3:45 Jacob Pan
  2017-09-20 12:09 ` Jean-Philippe Brucker
  0 siblings, 1 reply; 18+ messages in thread
From: Jacob Pan @ 2017-09-19  3:45 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jean-Philippe Brucker
  Cc: jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w, David Woodhouse

Hi Jean and All,

This is a follow-up on the LPC discussion we had last week.
(https://linuxplumbersconf.org/2017/ocw/proposals/4748)

My understanding is that the data structure below can satisfy the
needs from Intel (pointer + size) and AMD (pointer only). But ARM
pvIOMMU would need additional info to indicate the page table format.
Could you share your idea of the right addition for ARM such that we
can have a unified API?

/**
 * PASID table data used to bind guest PASID table to the host IOMMU. This will
 * enable guest managed first level page tables.
 * @ptr:	PASID table pointer
 * @size_order:	number of bits supported in the guest PASID table, must be less
 *		or equal than the host table size.
 */
struct pasid_table_info {
	__u64	ptr;
	__u64	size_order;
};

Thanks,

Jacob

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
  2017-09-19  3:45 bind pasid table API Jacob Pan
@ 2017-09-20 12:09 ` Jean-Philippe Brucker
       [not found]   ` <6ecc1afc-6302-cd22-6944-ef4c6ac09587-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Jean-Philippe Brucker @ 2017-09-20 12:09 UTC (permalink / raw)
  To: Jacob Pan, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: David Woodhouse

Hi Jacob,

[Adding Eric as he might need pasid_table_info for vSVM at some point]

On 19/09/17 04:45, Jacob Pan wrote:
> Hi Jean and All,
> 
> This is a follow-up on the LPC discussion we had last week.
> (https://linuxplumbersconf.org/2017/ocw/proposals/4748)
> 
> My understanding is that the data structure below can satisfy the
> needs from Intel (pointer + size) and AMD (pointer only). But ARM
> pvIOMMU would need additional info to indicate the page table format.
> Could you share your idea of the right addition for ARM such that we
> can have a unified API?
> 
> /**
>  * PASID table data used to bind guest PASID table to the host IOMMU. This will
>  * enable guest managed first level page tables.
>  * @ptr:	PASID table pointer
>  * @size_order:	number of bits supported in the guest PASID table, must be less
>  *		or equal than the host table size.
>  */
> struct pasid_table_info {
> 	__u64	ptr;
> 	__u64	size_order;
> };

For the PASID table, Arm SMMUv3 would need two additional fields:
* 'format' telling whether the table has 1 or 2 levels and their
  dimensions,
* 'default_substream' telling if PASID0 is reserved for non-pasid traffic.

I think that's it for the moment, but it does require to leave space
for a vendor-specific structure at the end. It is one reason why I'd
prefer having a 'model' field in the pasid_table_info structure telling
what fields the whole structure actually contains.

Another reason is if some IOMMU is able to support multiple PASID table
formats, it could advertise them all in sysfs and Qemu could tell which
one it chose in 'model'. I'm not sure we'll ever see that in practice.



For binding page tables instead of PASID tables (e.g. virtio-iommu), the
generic data would be:

struct pgtable_info {
	__u32	pasid;
	__u64	ptr;
	__u32	model;
	__u8	model_data[];
};

Followed by a few arch-specific configuration values. For Arm we can
summarize this to three registers, defined in the Armv8 Architecture
Reference Manual:

struct arm_lpae_pgtable_info {
	__u64	tcr;	/* Translation Control Register */
	__u64	mair;	/* Memory Attributes Indirection Register */
	__u64	asid;	/* Address Space ID */
};

Some data packed in the TCR might be common to most architectures, like
page granularity and max VA size. Most fields of the TCR won't be used but
it provides a nice architected way to communicate Arm page table
configuration.

Note that there might be an additional page directory in the arch-specific
info, as we can split the address space in two. I'm not sure whether we
should allow it yet.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]   ` <6ecc1afc-6302-cd22-6944-ef4c6ac09587-5wv7dgnIgG8@public.gmane.org>
@ 2017-09-20 22:35     ` Jacob Pan
  2017-09-25 11:45       ` Jean-Philippe Brucker
  2017-09-21  3:00     ` Liu, Yi L
  2017-09-27 13:40     ` Joerg Roedel
  2 siblings, 1 reply; 18+ messages in thread
From: Jacob Pan @ 2017-09-20 22:35 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w, David Woodhouse

On Wed, 20 Sep 2017 13:09:47 +0100
Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:

> Hi Jacob,
> 
> [Adding Eric as he might need pasid_table_info for vSVM at some point]
> 
> On 19/09/17 04:45, Jacob Pan wrote:
> > Hi Jean and All,
> > 
> > This is a follow-up on the LPC discussion we had last week.
> > (https://linuxplumbersconf.org/2017/ocw/proposals/4748)
> > 
> > My understanding is that the data structure below can satisfy the
> > needs from Intel (pointer + size) and AMD (pointer only). But ARM
> > pvIOMMU would need additional info to indicate the page table
> > format. Could you share your idea of the right addition for ARM
> > such that we can have a unified API?
> > 
> > /**
> >  * PASID table data used to bind guest PASID table to the host
> > IOMMU. This will
> >  * enable guest managed first level page tables.
> >  * @ptr:	PASID table pointer
> >  * @size_order:	number of bits supported in the guest PASID
> > table, must be less
> >  *		or equal than the host table size.
> >  */
> > struct pasid_table_info {
> > 	__u64	ptr;
> > 	__u64	size_order;
> > };  
> 
> For the PASID table, Arm SMMUv3 would need two additional fields:
> * 'format' telling whether the table has 1 or 2 levels and their
>   dimensions,
just wondering if the format will be different than whatever is used on
the host? i.e. could there be a mismatch? I am ok with this field just
wondering if this can be resolved via the sysfs query interface if that
is a global thing.
> * 'default_substream' telling if PASID0 is reserved for non-pasid
> traffic.
> 
could it be part of a feature flag? i.e.

struct pasid_table_info {
 	__u64	ptr;
 	__u64	size_order;
	__u64	flags;
#define PASID0_FOR_DMA_WITHOUT_PASID;
	enum pasid_table_format format;
};  

> I think that's it for the moment, but it does require to leave space
> for a vendor-specific structure at the end. It is one reason why I'd
> prefer having a 'model' field in the pasid_table_info structure
> telling what fields the whole structure actually contains.
> 
I think we have been there before. the downside is that model specific
knowledge is required in the generic VFIO layer, if its content is to
be inspected.

> Another reason is if some IOMMU is able to support multiple PASID
> table formats, it could advertise them all in sysfs and Qemu could
> tell which one it chose in 'model'. I'm not sure we'll ever see that
> in practice.
> 
I would expect when query interface between QEMU and sysfs would ensure
matching format prior to issue bind_pasid_table call.
> 
> 
> For binding page tables instead of PASID tables (e.g. virtio-iommu),
> the generic data would be:
> 
> struct pgtable_info {
> 	__u32	pasid;
> 	__u64	ptr;
> 	__u32	model;
> 	__u8	model_data[];
> };
> 
> Followed by a few arch-specific configuration values. For Arm we can
> summarize this to three registers, defined in the Armv8 Architecture
> Reference Manual:
> 
> struct arm_lpae_pgtable_info {
> 	__u64	tcr;	/* Translation Control Register */
> 	__u64	mair;	/* Memory Attributes Indirection
> Register */ __u64	asid;	/* Address Space ID */
> };
> 
> Some data packed in the TCR might be common to most architectures,
> like page granularity and max VA size. Most fields of the TCR won't
> be used but it provides a nice architected way to communicate Arm
> page table configuration.
> 
> Note that there might be an additional page directory in the
> arch-specific info, as we can split the address space in two. I'm not
> sure whether we should allow it yet.
> 
This can be combined with bind mm with user tasks also (e.g. DPDK with
SVM in native case), right? In that case the pasid would be allocated by
the kernel instead of passdown in pgtable_info, and your 'ptr' filed is
harvested from the current task mm? There is also complexity w.r.t.
user task life cycle management. My immediate goal to enable vSVM does
not require bind page tables, try not to think too far ahead.

> Thanks,
> Jean

[Jacob Pan]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: bind pasid table API
       [not found]   ` <6ecc1afc-6302-cd22-6944-ef4c6ac09587-5wv7dgnIgG8@public.gmane.org>
  2017-09-20 22:35     ` Jacob Pan
@ 2017-09-21  3:00     ` Liu, Yi L
       [not found]       ` <A2975661238FB949B60364EF0F2C257439ADB33D-zVW8+lm/ZpmiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2017-09-27 13:40     ` Joerg Roedel
  2 siblings, 1 reply; 18+ messages in thread
From: Liu, Yi L @ 2017-09-21  3:00 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Pan, Jacob jun,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Lan, Tianyu, David Woodhouse

Hi Jean,

> -----Original Message-----
> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org]
> Sent: Wednesday, September 20, 2017 8:10 PM
> To: Pan, Jacob jun <jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; iommu-cunTk1MwBs/ROKNJybVBZg@public.gmane.org
> foundation.org
> Cc: Liu, Yi L <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; Raj, Ashok <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; David
> Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>; Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>; Tian,
> Kevin <kevin.tian-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; Auger Eric <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Subject: Re: bind pasid table API
> 
> Hi Jacob,
> 
> [Adding Eric as he might need pasid_table_info for vSVM at some point]
> 
> On 19/09/17 04:45, Jacob Pan wrote:
> > Hi Jean and All,
> >
> > This is a follow-up on the LPC discussion we had last week.
> > (https://linuxplumbersconf.org/2017/ocw/proposals/4748)
> >
> > My understanding is that the data structure below can satisfy the
> > needs from Intel (pointer + size) and AMD (pointer only). But ARM
> > pvIOMMU would need additional info to indicate the page table format.
> > Could you share your idea of the right addition for ARM such that we
> > can have a unified API?
> >
> > /**
> >  * PASID table data used to bind guest PASID table to the host IOMMU.
> > This will
> >  * enable guest managed first level page tables.
> >  * @ptr:	PASID table pointer
> >  * @size_order:	number of bits supported in the guest PASID table, must
> be less
> >  *		or equal than the host table size.
> >  */
> > struct pasid_table_info {
> > 	__u64	ptr;
> > 	__u64	size_order;
> > };
> 
> For the PASID table, Arm SMMUv3 would need two additional fields:
> * 'format' telling whether the table has 1 or 2 levels and their
>   dimensions,
> * 'default_substream' telling if PASID0 is reserved for non-pasid traffic.
> 
> I think that's it for the moment, but it does require to leave space for a vendor-
> specific structure at the end. It is one reason why I'd prefer having a 'model' field
> in the pasid_table_info structure telling what fields the whole structure actually
> contains.
> 
> Another reason is if some IOMMU is able to support multiple PASID table
> formats, it could advertise them all in sysfs and Qemu could tell which one it
> chose in 'model'. I'm not sure we'll ever see that in practice.

Regards to your idea, I think the whole flow may be:
* Qemu queries underlying IOMMU for the capability(through sysfs)
* Combined with requirement from user(the guy who starts the VM), Qemu chose a
   suitable model or exit if HW capability is incompatible with user's requirement
* In the coming bind_pasid_table() calling, Qemu pass the chosen model info to host
* Host checks the "model" and use the correct model specific structure to parse
   the model specific data. may be kind of structure intel_pasid_table_info,
   amd_pasid_table_info or arm_pasid_table_info_v#

Does this flow show what you want? This would be a "model" + "model specific data"
proposal. And my concern is the model specific field may look like to be kind of "opaque".

Besides the comments above, is there also possibility for us to put all the possible info
in a super-set just as what we plan to do for the tlb_invalidate() API?

> 
> For binding page tables instead of PASID tables (e.g. virtio-iommu), the generic
> data would be:
> 
> struct pgtable_info {
> 	__u32	pasid;
> 	__u64	ptr;
> 	__u32	model;
> 	__u8	model_data[];
> };

Besides bind_pasid_table API, you would also want to propose an extra API
which likely to be named as bind_pgtable() for this page table binding?

What would be the "model" field indicate? "vendor" or "vendor+version" ? You
may want a length field to indicate the size of "model_data" field.

And same with the bind_pasid_table API, would model_data look like an "opaque"?

> Followed by a few arch-specific configuration values. For Arm we can summarize
> this to three registers, defined in the Armv8 Architecture Reference Manual:
> 
> struct arm_lpae_pgtable_info {
> 	__u64	tcr;	/* Translation Control Register */
> 	__u64	mair;	/* Memory Attributes Indirection Register */
> 	__u64	asid;	/* Address Space ID */
> };

Hmmm, just for curious. What is the difference between "pasid" and "asid"?

Thanks,
Yi L

> Some data packed in the TCR might be common to most architectures, like page
> granularity and max VA size. Most fields of the TCR won't be used but it provides
> a nice architected way to communicate Arm page table configuration.
> 
> Note that there might be an additional page directory in the arch-specific info, as
> we can split the address space in two. I'm not sure whether we should allow it
> yet.
> 
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
  2017-09-20 22:35     ` Jacob Pan
@ 2017-09-25 11:45       ` Jean-Philippe Brucker
       [not found]         ` <ef71b446-ae00-29af-a934-2e253454df31-5wv7dgnIgG8@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Jean-Philippe Brucker @ 2017-09-25 11:45 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, David Woodhouse

On 20/09/17 23:35, Jacob Pan wrote:
> On Wed, 20 Sep 2017 13:09:47 +0100
> Jean-Philippe Brucker <jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org> wrote:
> 
>> Hi Jacob,
>>
>> [Adding Eric as he might need pasid_table_info for vSVM at some point]
>>
>> On 19/09/17 04:45, Jacob Pan wrote:
>>> Hi Jean and All,
>>>
>>> This is a follow-up on the LPC discussion we had last week.
>>> (https://linuxplumbersconf.org/2017/ocw/proposals/4748)
>>>
>>> My understanding is that the data structure below can satisfy the
>>> needs from Intel (pointer + size) and AMD (pointer only). But ARM
>>> pvIOMMU would need additional info to indicate the page table
>>> format. Could you share your idea of the right addition for ARM
>>> such that we can have a unified API?
>>>
>>> /**
>>>  * PASID table data used to bind guest PASID table to the host
>>> IOMMU. This will
>>>  * enable guest managed first level page tables.
>>>  * @ptr:	PASID table pointer
>>>  * @size_order:	number of bits supported in the guest PASID
>>> table, must be less
>>>  *		or equal than the host table size.
>>>  */
>>> struct pasid_table_info {
>>> 	__u64	ptr;
>>> 	__u64	size_order;
>>> };  
>>
>> For the PASID table, Arm SMMUv3 would need two additional fields:
>> * 'format' telling whether the table has 1 or 2 levels and their
>>   dimensions,
> just wondering if the format will be different than whatever is used on
> the host? i.e. could there be a mismatch? I am ok with this field just
> wondering if this can be resolved via the sysfs query interface if that
> is a global thing.

This format tells how the guest organizes its PASID tables. Depending on
'format', the PASID table can be:
* A flat array of descriptors
* One array of 1st-level descriptors pointing to a 2nd level of
descriptors. When translating an access, the SMMU splits the PASID at bit
6, uses the upper half to get the 1st level descriptor and the lower half
to get the 2nd level one (4k leaf table).
* The same format, but instead split at PASID bit 10 (64k leaf table)

So the guest has to tell the host what format of PASID tables it chose,
but the configuration might be too vendor-specific to go in the generic
part of pasid_table_info.

It seems that AMD IOMMUv2 can have either one, two or three levels of
PASID tables. See definition of GLX field in the 3.2.2.1 Device Table
Entry Format.

We could try to come up with a generic definition to cover all these
cases, but it might be more complicated than appending a few
vendor-specific fields. Only AMD can have three levels, only ARM can split
the PASID in two different places. And we can expect a future architecture
to come up with a new colorful format...

I'm mostly concerned about VFIO though, as the IOMMU API doesn't have the
same backward compatible restriction and is therefore easier to update.

>> * 'default_substream' telling if PASID0 is reserved for non-pasid
>> traffic.
>>
> could it be part of a feature flag? i.e.
> 
> struct pasid_table_info {
>  	__u64	ptr;
>  	__u64	size_order;
> 	__u64	flags;
> #define PASID0_FOR_DMA_WITHOUT_PASID;
> 	enum pasid_table_format format;
> };  
> 
>> I think that's it for the moment, but it does require to leave space
>> for a vendor-specific structure at the end. It is one reason why I'd
>> prefer having a 'model' field in the pasid_table_info structure
>> telling what fields the whole structure actually contains.
>>
> I think we have been there before. the downside is that model specific
> knowledge is required in the generic VFIO layer, if its content is to
> be inspected.

Yes, all of these vendor-specific fields would have to be UAPI. VFIO
doesn't necessarily have to deeply inspect each field, it can just check
that the structure contains the right fields (is the right size) and let
the IOMMU driver, that knows how to read these values, validate that they
make sense.

I think VFIO should be able to trust the IOMMU driver to perform the
sanity checks.

>> Another reason is if some IOMMU is able to support multiple PASID
>> table formats, it could advertise them all in sysfs and Qemu could
>> tell which one it chose in 'model'. I'm not sure we'll ever see that
>> in practice.
>>
> I would expect when query interface between QEMU and sysfs would ensure
> matching format prior to issue bind_pasid_table call.
>>
>>
>> For binding page tables instead of PASID tables (e.g. virtio-iommu),
>> the generic data would be:
>>
>> struct pgtable_info {
>> 	__u32	pasid;
>> 	__u64	ptr;
>> 	__u32	model;
>> 	__u8	model_data[];
>> };
>>
>> Followed by a few arch-specific configuration values. For Arm we can
>> summarize this to three registers, defined in the Armv8 Architecture
>> Reference Manual:
>>
>> struct arm_lpae_pgtable_info {
>> 	__u64	tcr;	/* Translation Control Register */
>> 	__u64	mair;	/* Memory Attributes Indirection
>> Register */ __u64	asid;	/* Address Space ID */
>> };
>>
>> Some data packed in the TCR might be common to most architectures,
>> like page granularity and max VA size. Most fields of the TCR won't
>> be used but it provides a nice architected way to communicate Arm
>> page table configuration.
>>
>> Note that there might be an additional page directory in the
>> arch-specific info, as we can split the address space in two. I'm not
>> sure whether we should allow it yet.
>>
> This can be combined with bind mm with user tasks also (e.g. DPDK with
> SVM in native case), right? In that case the pasid would be allocated by
> the kernel instead of passdown in pgtable_info, and your 'ptr' filed is
> harvested from the current task mm? There is also complexity w.r.t.
> user task life cycle management. My immediate goal to enable vSVM does
> not require bind page tables, try not to think too far ahead.

Yes, I'm working on the native case at the moment. The PASID is allocated
by the kernel and returned in 'pasid', and user can specify an additional
'pid' field to bind another process. There would be a 'flag' field telling
if 'pid' is valid or if we should just bind the current process.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]       ` <A2975661238FB949B60364EF0F2C257439ADB33D-zVW8+lm/ZpmiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-09-25 11:45         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 18+ messages in thread
From: Jean-Philippe Brucker @ 2017-09-25 11:45 UTC (permalink / raw)
  To: Liu, Yi L, Pan, Jacob jun,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Lan, Tianyu, David Woodhouse

On 21/09/17 04:00, Liu, Yi L wrote:
> Hi Jean,
> 
>> -----Original Message-----
>> From: Jean-Philippe Brucker [mailto:jean-philippe.brucker-5wv7dgnIgG8@public.gmane.org]
>> Sent: Wednesday, September 20, 2017 8:10 PM
>> To: Pan, Jacob jun <jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; iommu-cunTk1MwBs/ROKNJybVBZg@public.gmane.org
>> foundation.org
>> Cc: Liu, Yi L <yi.l.liu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; Raj, Ashok <ashok.raj-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; David
>> Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>; Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>; Tian,
>> Kevin <kevin.tian-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>; Auger Eric <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> Subject: Re: bind pasid table API
>>
>> Hi Jacob,
>>
>> [Adding Eric as he might need pasid_table_info for vSVM at some point]
>>
>> On 19/09/17 04:45, Jacob Pan wrote:
>>> Hi Jean and All,
>>>
>>> This is a follow-up on the LPC discussion we had last week.
>>> (https://linuxplumbersconf.org/2017/ocw/proposals/4748)
>>>
>>> My understanding is that the data structure below can satisfy the
>>> needs from Intel (pointer + size) and AMD (pointer only). But ARM
>>> pvIOMMU would need additional info to indicate the page table format.
>>> Could you share your idea of the right addition for ARM such that we
>>> can have a unified API?
>>>
>>> /**
>>>  * PASID table data used to bind guest PASID table to the host IOMMU.
>>> This will
>>>  * enable guest managed first level page tables.
>>>  * @ptr:	PASID table pointer
>>>  * @size_order:	number of bits supported in the guest PASID table, must
>> be less
>>>  *		or equal than the host table size.
>>>  */
>>> struct pasid_table_info {
>>> 	__u64	ptr;
>>> 	__u64	size_order;
>>> };
>>
>> For the PASID table, Arm SMMUv3 would need two additional fields:
>> * 'format' telling whether the table has 1 or 2 levels and their
>>   dimensions,
>> * 'default_substream' telling if PASID0 is reserved for non-pasid traffic.
>>
>> I think that's it for the moment, but it does require to leave space for a vendor-
>> specific structure at the end. It is one reason why I'd prefer having a 'model' field
>> in the pasid_table_info structure telling what fields the whole structure actually
>> contains.
>>
>> Another reason is if some IOMMU is able to support multiple PASID table
>> formats, it could advertise them all in sysfs and Qemu could tell which one it
>> chose in 'model'. I'm not sure we'll ever see that in practice.
> 
> Regards to your idea, I think the whole flow may be:
> * Qemu queries underlying IOMMU for the capability(through sysfs)
> * Combined with requirement from user(the guy who starts the VM), Qemu chose a
>    suitable model or exit if HW capability is incompatible with user's requirement
> * In the coming bind_pasid_table() calling, Qemu pass the chosen model info to host
> * Host checks the "model" and use the correct model specific structure to parse
>    the model specific data. may be kind of structure intel_pasid_table_info,
>    amd_pasid_table_info or arm_pasid_table_info_v#
> 
> Does this flow show what you want? This would be a "model" + "model specific data"
> proposal. And my concern is the model specific field may look like to be kind of "opaque".

It doesn't have to be opaque, it has to be defined in UAPI and an
intermediate layer (VFIO) can verify that they are present. The layer can
also inspect their value more deeply if it wants, since it is clear what
they are supposed to contain. But (in VFIO's case) it doesn't have too,
since the IOMMU driver will validate the values anyway.

> Besides the comments above, is there also possibility for us to put all the possible info
> in a super-set just as what we plan to do for the tlb_invalidate() API?

It's possible, but might require too much effort and might become more
difficult to do once the number of IOMMU drivers supporting PASID
increases. (See my other reply to Jacob)

>> For binding page tables instead of PASID tables (e.g. virtio-iommu), the generic
>> data would be:
>>
>> struct pgtable_info {
>> 	__u32	pasid;
>> 	__u64	ptr;
>> 	__u32	model;
>> 	__u8	model_data[];

This should be an union actually, since it's internal to the kernel. It's
VFIO that uses this interface.

>> };
> 
> Besides bind_pasid_table API, you would also want to propose an extra API
> which likely to be named as bind_pgtable() for this page table binding?

Yes, but we could use the same bind() function taking an enum {process,
pgtable, pasid_table} and the parameter structure.

> What would be the "model" field indicate? "vendor" or "vendor+version" ? You
> may want a length field to indicate the size of "model_data" field.

I think it should be vendor+version.

> And same with the bind_pasid_table API, would model_data look like an "opaque"?
> 
>> Followed by a few arch-specific configuration values. For Arm we can summarize
>> this to three registers, defined in the Armv8 Architecture Reference Manual:
>>
>> struct arm_lpae_pgtable_info {
>> 	__u64	tcr;	/* Translation Control Register */
>> 	__u64	mair;	/* Memory Attributes Indirection Register */
>> 	__u64	asid;	/* Address Space ID */
>> };
> 
> Hmmm, just for curious. What is the difference between "pasid" and "asid"?

ASID is an ID used to tag TLB (but not ATC) entries in Arm systems. It's
very similar to the "PCID" recently introduced in Linux for x86. In an
ideal system we could use the same value for ASID and PASID, but I think
it's safer to dissociate them for the moment.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]         ` <ef71b446-ae00-29af-a934-2e253454df31-5wv7dgnIgG8@public.gmane.org>
@ 2017-09-25 15:14           ` Raj, Ashok
  2017-09-26  9:46             ` Jean-Philippe Brucker
  0 siblings, 1 reply; 18+ messages in thread
From: Raj, Ashok @ 2017-09-25 15:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jacob Pan,
	David Woodhouse

On Mon, Sep 25, 2017 at 12:45:00PM +0100, Jean-Philippe Brucker wrote:
[snip]

> This format tells how the guest organizes its PASID tables. Depending on
> 'format', the PASID table can be:
> * A flat array of descriptors
> * One array of 1st-level descriptors pointing to a 2nd level of
> descriptors. When translating an access, the SMMU splits the PASID at bit
> 6, uses the upper half to get the 1st level descriptor and the lower half
> to get the 2nd level one (4k leaf table).
> * The same format, but instead split at PASID bit 10 (64k leaf table)

When in multilevel PASID table, do you shadow it in the host? or every 
level is all hosted in guest space? in Intel vt-d, when context is in 
a nested mode, PASID table is a gPA. Its a single level table.


Cheers,
Ashok

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
  2017-09-25 15:14           ` Raj, Ashok
@ 2017-09-26  9:46             ` Jean-Philippe Brucker
  0 siblings, 0 replies; 18+ messages in thread
From: Jean-Philippe Brucker @ 2017-09-26  9:46 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jacob Pan,
	David Woodhouse

On 25/09/17 16:14, Raj, Ashok wrote:
> On Mon, Sep 25, 2017 at 12:45:00PM +0100, Jean-Philippe Brucker wrote:
> [snip]
> 
>> This format tells how the guest organizes its PASID tables. Depending on
>> 'format', the PASID table can be:
>> * A flat array of descriptors
>> * One array of 1st-level descriptors pointing to a 2nd level of
>> descriptors. When translating an access, the SMMU splits the PASID at bit
>> 6, uses the upper half to get the 1st level descriptor and the lower half
>> to get the 2nd level one (4k leaf table).
>> * The same format, but instead split at PASID bit 10 (64k leaf table)
> 
> When in multilevel PASID table, do you shadow it in the host? or every 
> level is all hosted in guest space? in Intel vt-d, when context is in 
> a nested mode, PASID table is a gPA. Its a single level table.

In the SMMU as well, both levels of PASID tables are accessed by gPA, so
there's no need to shadow them.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]   ` <6ecc1afc-6302-cd22-6944-ef4c6ac09587-5wv7dgnIgG8@public.gmane.org>
  2017-09-20 22:35     ` Jacob Pan
  2017-09-21  3:00     ` Liu, Yi L
@ 2017-09-27 13:40     ` Joerg Roedel
       [not found]       ` <20170927134041.GN8398-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
  2 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2017-09-27 13:40 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jacob Pan,
	David Woodhouse

Hi,

On Wed, Sep 20, 2017 at 01:09:47PM +0100, Jean-Philippe Brucker wrote:
> For binding page tables instead of PASID tables (e.g. virtio-iommu), the
> generic data would be:
> 
> struct pgtable_info {
> 	__u32	pasid;
> 	__u64	ptr;
> 	__u32	model;
> 	__u8	model_data[];
> };

I had a look again and at the AMD side there is no way to build a shadow
pasid-table when more than 9-bit pasids are used, because all pointers
in that multi-level table are GPA and thus translated.

So how about allowing the virtio-iommu to build pasid-tables in
different formats? We can easily collect the existing code for that in
the Intel and AMD drivers into a library that could be used by the
virtio-iommu driver too. Then the virtio-iommu can create the correct
table for the host-iommu and we don't have to rely on shadowing.

For binding whole pasid tables, reading through the thread, it looks
like we can't get away with generic attributes that would fit everyone.
For AMD and Intel it would suffice to have a base-ptr and the
number of pasids-bits the table can map.

The Intel driver can then calculate the size of the table and the AMD
driver can compute the GLX value;

In case we need the model-specific info, I'd like to have them
explicitly stated in the struct:

	enum pasid_table_model {
		PASID_TABLE_INTEL,
		PASID_TABLE_ARM,
		PASID_TABLE_AMD,
		/* ... */
	};

	struct pasid_table_config {
		__u64 base_ptr;
		__u8 pasid_bits;
		union {
			struct {
				/* Intel specific fields */
			} intel;

			struct {
				/* ARM specific fields */
			} arm;


			struct {
				/* AMD specific fields */
			} amd;

			/* ... */
		};
	};

How does that look?

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]       ` <20170927134041.GN8398-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2017-09-27 17:51         ` Jacob Pan
  2017-09-28 12:07           ` Joerg Roedel
  2017-09-28 11:21         ` Jean-Philippe Brucker
  1 sibling, 1 reply; 18+ messages in thread
From: Jacob Pan @ 2017-09-27 17:51 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w, David Woodhouse

On Wed, 27 Sep 2017 15:40:41 +0200
Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:

> Hi,
> 
> On Wed, Sep 20, 2017 at 01:09:47PM +0100, Jean-Philippe Brucker wrote:
> > For binding page tables instead of PASID tables (e.g.
> > virtio-iommu), the generic data would be:
> > 
> > struct pgtable_info {
> > 	__u32	pasid;
> > 	__u64	ptr;
> > 	__u32	model;
> > 	__u8	model_data[];
> > };  
> 
> I had a look again and at the AMD side there is no way to build a
> shadow pasid-table when more than 9-bit pasids are used, because all
> pointers in that multi-level table are GPA and thus translated.
> 
> So how about allowing the virtio-iommu to build pasid-tables in
> different formats? We can easily collect the existing code for that in
> the Intel and AMD drivers into a library that could be used by the
> virtio-iommu driver too. Then the virtio-iommu can create the correct
> table for the host-iommu and we don't have to rely on shadowing.
> 
> For binding whole pasid tables, reading through the thread, it looks
> like we can't get away with generic attributes that would fit
> everyone. For AMD and Intel it would suffice to have a base-ptr and
> the number of pasids-bits the table can map.
> 
> The Intel driver can then calculate the size of the table and the AMD
> driver can compute the GLX value;
> 
> In case we need the model-specific info, I'd like to have them
> explicitly stated in the struct:
> 
> 	enum pasid_table_model {
> 		PASID_TABLE_INTEL,
> 		PASID_TABLE_ARM,
> 		PASID_TABLE_AMD,
> 		/* ... */
> 	};
> 
I guess one vendor could have multiple pasid table format. so perhaps
the name could reflect the format as well?
> 	struct pasid_table_config {
> 		__u64 base_ptr;
> 		__u8 pasid_bits;
> 		union {
> 			struct {
> 				/* Intel specific fields */
> 			} intel;
> 
> 			struct {
> 				/* ARM specific fields */
> 			} arm;
> 
> 
> 			struct {
> 				/* AMD specific fields */
> 			} amd;
> 
> 			/* ... */
> 		};
> 	};
> 
> How does that look?
> 
It should work for us for now but I am not sure how stable the vendor
specific fields will be, this is UAPI. BTW, do you also intend to
include the pasid_table_model # in pasid_table_config?

> Regards,
> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]       ` <20170927134041.GN8398-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
  2017-09-27 17:51         ` Jacob Pan
@ 2017-09-28 11:21         ` Jean-Philippe Brucker
       [not found]           ` <e23f7d00-90f2-e5d4-6619-9fe9150a96b9-5wv7dgnIgG8@public.gmane.org>
  1 sibling, 1 reply; 18+ messages in thread
From: Jean-Philippe Brucker @ 2017-09-28 11:21 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jacob Pan,
	David Woodhouse

On 27/09/17 14:40, Joerg Roedel wrote:
> Hi,
> 
> On Wed, Sep 20, 2017 at 01:09:47PM +0100, Jean-Philippe Brucker wrote:
>> For binding page tables instead of PASID tables (e.g. virtio-iommu), the
>> generic data would be:
>>
>> struct pgtable_info {
>> 	__u32	pasid;
>> 	__u64	ptr;
>> 	__u32	model;
>> 	__u8	model_data[];
>> };
> 
> I had a look again and at the AMD side there is no way to build a shadow
> pasid-table when more than 9-bit pasids are used, because all pointers
> in that multi-level table are GPA and thus translated.

It's the same problem on SMMUv3, all pointers in the multi-level PASID
tables are GPAs. Our solution was to grab chunks GPA space from the guest
when necessary.

The host would maintain the PASID tables just like with native SVM. The
difference is that they are stored in guest memory. In more details, it
would go like this:

(1) host tells the guest the size of a memreserve (greatest size between
    all levels) and what pgtable format the host iommu supports.
(2) guest sends a memreserve request, giving up some of its GPA space for
    the host. There already are physical pages allocated for it since the
    GPA->PA mappings were created by the host with VFIO_MAP_DMA before
    booting the guest.
(3) guest sends a add_table request, asking to bind a pgd to a PASID
(4) the host installs the first-level PASID table. Instead of allocating
    it with kzalloc, it uses vmap on the existing memreserve pages.
(5) The host doesn't have any memreserve left for the second level so it
    returns an error code for add_table, asking the guest for more
    memreserve.
(5) guest sends another memreserve request
(6) guest sends the add_table request again, which succeeds
(7) host installs the second-level PASID table
(8) When the guest removes the page tables, it can ask to reclaim the
    memreserve

I know it's not particularly graceful, but it isn't too difficult to
implement and there are a few reasons that make us prefer page table
binding over PASID table binding for virtio-iommu:

* It also enables nested mode for IOMMUs that don't support PASIDs (but do
  support two translation stages), like the SMMUv2. They would require an
  add_table mechanism anyway.

* For a little more effort on the host side (handling memreserve pool), it
  removes a great deal of work in guests. We wouldn't have to factor all
  PASID code into a library in guests (currently battling with SMMUv3
  support for PASID, and struggling to move a little bit of PASID code
  into the core, I'd prefer this to be last resort...)

  The host driver already has code to manipulate PASID table entries in
  the right place, so hopefully it'd be less work doing it in the host.

* Even if we decide to bind PASID tables we'll still have to tell the
  guest what page table format it can use, since in SMMUv3 we have a lot
  of fields and knobs in the PASID descriptor itself, it's not just a PGD
  pointer :/

  So the probe step is necessary. Supporting PASID tables adds an
  additional level of complexity to the device specification

So I'd like to try this solution first, since it seems less invasive, more
flexible and a lot of the code would be necessary for both solutions. Both
are difficult to specify and implement, though, so it's hard to say which
is best for the moment. It shouldn't be too hard to fall back to PASID
table binding if page table binding proves too cumbersome.

Thanks,
Jean

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
  2017-09-27 17:51         ` Jacob Pan
@ 2017-09-28 12:07           ` Joerg Roedel
       [not found]             ` <20170928120705.GR8398-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2017-09-28 12:07 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, David Woodhouse

On Wed, Sep 27, 2017 at 10:51:55AM -0700, Jacob Pan wrote:
> On Wed, 27 Sep 2017 15:40:41 +0200
> Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:
> > 	enum pasid_table_model {
> > 		PASID_TABLE_INTEL,
> > 		PASID_TABLE_ARM,
> > 		PASID_TABLE_AMD,
> > 		/* ... */
> > 	};
> > 
> I guess one vendor could have multiple pasid table format. so perhaps
> the name could reflect the format as well?

Yes, that makes sense.

> > 	struct pasid_table_config {
> > 		__u64 base_ptr;
> > 		__u8 pasid_bits;
> > 		union {
> > 			struct {
> > 				/* Intel specific fields */
> > 			} intel;
> > 
> > 			struct {
> > 				/* ARM specific fields */
> > 			} arm;
> > 
> > 
> > 			struct {
> > 				/* AMD specific fields */
> > 			} amd;
> > 
> > 			/* ... */
> > 		};
> > 	};
> > 
> > How does that look?
> > 
> It should work for us for now but I am not sure how stable the vendor
> specific fields will be, this is UAPI. BTW, do you also intend to
> include the pasid_table_model # in pasid_table_config?

If its going to be UAPI then we probably also need a version and a
length/size field. But since individual formats do not change, their
sub-struct should be stable.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]           ` <e23f7d00-90f2-e5d4-6619-9fe9150a96b9-5wv7dgnIgG8@public.gmane.org>
@ 2017-09-28 17:11             ` Raj, Ashok
  2017-09-29  5:44               ` Tian, Kevin
  2017-09-29 15:30               ` Joerg Roedel
  0 siblings, 2 replies; 18+ messages in thread
From: Raj, Ashok @ 2017-09-28 17:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jacob Pan,
	David Woodhouse

Hi Jean

On Thu, Sep 28, 2017 at 12:21:34PM +0100, Jean-Philippe Brucker wrote:
> On 27/09/17 14:40, Joerg Roedel wrote:
> > Hi,
> > 
> > On Wed, Sep 20, 2017 at 01:09:47PM +0100, Jean-Philippe Brucker wrote:
> >> For binding page tables instead of PASID tables (e.g. virtio-iommu), the
> >> generic data would be:
> >>
> >> struct pgtable_info {
> >> 	__u32	pasid;
> >> 	__u64	ptr;
> >> 	__u32	model;
> >> 	__u8	model_data[];
> >> };
> > 
> > I had a look again and at the AMD side there is no way to build a shadow
> > pasid-table when more than 9-bit pasids are used, because all pointers
> > in that multi-level table are GPA and thus translated.
> 
> It's the same problem on SMMUv3, all pointers in the multi-level PASID
> tables are GPAs. Our solution was to grab chunks GPA space from the guest
> when necessary.

If pasid table is gPA and built and managed by guest IOMMU driver in case of
vIOMMU, or virtio-iommu. Is this complex interaction to reserve memory and
setup required? wouldn't it be sufficient if we do this.

- Host advertises the capability in terms of what table format it it supports.
- guest does all the memory allocation and management.
- communicate to host via VFIO, gPA, format, size of table
- host now programs context entries appropriately with proper nested mode



> 
> The host would maintain the PASID tables just like with native SVM. The
> difference is that they are stored in guest memory. In more details, it
> would go like this:
> 
> (1) host tells the guest the size of a memreserve (greatest size between
>     all levels) and what pgtable format the host iommu supports.
> (2) guest sends a memreserve request, giving up some of its GPA space for
>     the host. There already are physical pages allocated for it since the
>     GPA->PA mappings were created by the host with VFIO_MAP_DMA before
>     booting the guest.
> (3) guest sends a add_table request, asking to bind a pgd to a PASID
> (4) the host installs the first-level PASID table. Instead of allocating
>     it with kzalloc, it uses vmap on the existing memreserve pages.
> (5) The host doesn't have any memreserve left for the second level so it
>     returns an error code for add_table, asking the guest for more
>     memreserve.
> (5) guest sends another memreserve request
> (6) guest sends the add_table request again, which succeeds
> (7) host installs the second-level PASID table
> (8) When the guest removes the page tables, it can ask to reclaim the
>     memreserve
> 
> I know it's not particularly graceful, but it isn't too difficult to
> implement and there are a few reasons that make us prefer page table
> binding over PASID table binding for virtio-iommu:
> 
> * It also enables nested mode for IOMMUs that don't support PASIDs (but do
>   support two translation stages), like the SMMUv2. They would require an
>   add_table mechanism anyway.
> 
> * For a little more effort on the host side (handling memreserve pool), it
>   removes a great deal of work in guests. We wouldn't have to factor all
>   PASID code into a library in guests (currently battling with SMMUv3
>   support for PASID, and struggling to move a little bit of PASID code
>   into the core, I'd prefer this to be last resort...)
> 
>   The host driver already has code to manipulate PASID table entries in
>   the right place, so hopefully it'd be less work doing it in the host.
> 
> * Even if we decide to bind PASID tables we'll still have to tell the
>   guest what page table format it can use, since in SMMUv3 we have a lot
>   of fields and knobs in the PASID descriptor itself, it's not just a PGD
>   pointer :/
> 
>   So the probe step is necessary. Supporting PASID tables adds an
>   additional level of complexity to the device specification
> 
> So I'd like to try this solution first, since it seems less invasive, more
> flexible and a lot of the code would be necessary for both solutions. Both
> are difficult to specify and implement, though, so it's hard to say which
> is best for the moment. It shouldn't be too hard to fall back to PASID
> table binding if page table binding proves too cumbersome.
> 
> Thanks,
> Jean

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]             ` <20170928120705.GR8398-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
@ 2017-09-28 21:36               ` Jacob Pan
  2017-09-29 15:23                 ` Joerg Roedel
  0 siblings, 1 reply; 18+ messages in thread
From: Jacob Pan @ 2017-09-28 21:36 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	jacob.jun.pan-ral2JQCrhuEAvxtiuMwx3w, David Woodhouse

On Thu, 28 Sep 2017 14:07:05 +0200
Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:

> On Wed, Sep 27, 2017 at 10:51:55AM -0700, Jacob Pan wrote:
> > On Wed, 27 Sep 2017 15:40:41 +0200
> > Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote:  
> > > 	enum pasid_table_model {
> > > 		PASID_TABLE_INTEL,
> > > 		PASID_TABLE_ARM,
> > > 		PASID_TABLE_AMD,
> > > 		/* ... */
> > > 	};
> > >   
> > I guess one vendor could have multiple pasid table format. so
> > perhaps the name could reflect the format as well?  
> 
> Yes, that makes sense.
> 
> > > 	struct pasid_table_config {
> > > 		__u64 base_ptr;
> > > 		__u8 pasid_bits;
> > > 		union {
> > > 			struct {
> > > 				/* Intel specific fields */
> > > 			} intel;
> > > 
> > > 			struct {
> > > 				/* ARM specific fields */
> > > 			} arm;
> > > 
> > > 
> > > 			struct {
> > > 				/* AMD specific fields */
> > > 			} amd;
> > > 
> > > 			/* ... */
> > > 		};
> > > 	};
> > > 
> > > How does that look?
> > >   
> > It should work for us for now but I am not sure how stable the
> > vendor specific fields will be, this is UAPI. BTW, do you also
> > intend to include the pasid_table_model # in pasid_table_config?  
> 
> If its going to be UAPI then we probably also need a version and a
> length/size field. But since individual formats do not change, their
> sub-struct should be stable.
> 
just to confirm, we have two separate APIs for bind_pasid_table_ptr and
bind_page_table. For the former, we need to passdown the following info
from qemu/vfio. For now
struct pasid_table_config {
	__u32 version;
#define PASID_TABLE_CONFIG_VERSION_1 (1)
	__u32 bytes;
	__u64 base_ptr;
	__u8 pasid_bits;
	enum pasid_table_model model;
	union {
		struct {
			/* Intel specific fields */
		} intel;

		struct {
			/* ARM specific fields */
		} arm;

		struct {
			/* AMD specific fields */
		} amd;

		/* ... */
	};
};

> Regards,
> 
> 	Joerg
> 

[Jacob Pan]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: bind pasid table API
  2017-09-28 17:11             ` Raj, Ashok
@ 2017-09-29  5:44               ` Tian, Kevin
       [not found]                 ` <AADFC41AFE54684AB9EE6CBC0274A5D190DEA654-0J0gbvR4kThpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
  2017-09-29 15:30               ` Joerg Roedel
  1 sibling, 1 reply; 18+ messages in thread
From: Tian, Kevin @ 2017-09-29  5:44 UTC (permalink / raw)
  To: Raj, Ashok, Jean-Philippe Brucker
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun, David Woodhouse

> From: Raj, Ashok
> Sent: Friday, September 29, 2017 1:11 AM
> 
> Hi Jean
> 
> On Thu, Sep 28, 2017 at 12:21:34PM +0100, Jean-Philippe Brucker wrote:
> > On 27/09/17 14:40, Joerg Roedel wrote:
> > > Hi,
> > >
> > > On Wed, Sep 20, 2017 at 01:09:47PM +0100, Jean-Philippe Brucker
> wrote:
> > >> For binding page tables instead of PASID tables (e.g. virtio-iommu), the
> > >> generic data would be:
> > >>
> > >> struct pgtable_info {
> > >> 	__u32	pasid;
> > >> 	__u64	ptr;
> > >> 	__u32	model;
> > >> 	__u8	model_data[];
> > >> };
> > >
> > > I had a look again and at the AMD side there is no way to build a
> shadow
> > > pasid-table when more than 9-bit pasids are used, because all pointers
> > > in that multi-level table are GPA and thus translated.
> >
> > It's the same problem on SMMUv3, all pointers in the multi-level PASID
> > tables are GPAs. Our solution was to grab chunks GPA space from the
> guest
> > when necessary.
> 
> If pasid table is gPA and built and managed by guest IOMMU driver in case
> of
> vIOMMU, or virtio-iommu. Is this complex interaction to reserve memory
> and
> setup required? wouldn't it be sufficient if we do this.
> 
> - Host advertises the capability in terms of what table format it it supports.
> - guest does all the memory allocation and management.
> - communicate to host via VFIO, gPA, format, size of table
> - host now programs context entries appropriately with proper nested
> mode
> 

I can see the motivation why Jean would like to go this way. As he 
explained below, it doesn't require factoring PASID codes (all formats
on all vendors) into a library in guest. Guest just provides PGD and
host IOMMU driver has all the logic to build PASID table in right form.

However I'm with Ashok here. I'm not sure whether host-side change
is really less invasive. New memreserve APIs need to be introduced
cross Qemu/VFIO/IOMMU driver. Also PASID logic in each host IOMMU
driver needs change to cope with whether the table is built for a native 
usage or guest usage. In the latter case, memreserve must be considered 
which is not like a typical way of page table mgmt. (especially regarding 
to a multi-level PASID table).

Thinking about anyway we need to factor page table code in the
guest side. It's not a big deal to further factor PASID code due to the
same reason. Of course there are additional format info to be 
communicated back-and-forth, but the framework should be same
for both page table and PASID purposes. Also such change is restricted
within IOMMU sub-system. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
  2017-09-28 21:36               ` Jacob Pan
@ 2017-09-29 15:23                 ` Joerg Roedel
  0 siblings, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2017-09-29 15:23 UTC (permalink / raw)
  To: Jacob Pan
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, David Woodhouse

On Thu, Sep 28, 2017 at 02:36:57PM -0700, Jacob Pan wrote:
> just to confirm, we have two separate APIs for bind_pasid_table_ptr and
> bind_page_table. For the former, we need to passdown the following info
> from qemu/vfio. For now
> struct pasid_table_config {
> 	__u32 version;
> #define PASID_TABLE_CONFIG_VERSION_1 (1)
> 	__u32 bytes;
> 	__u64 base_ptr;
> 	__u8 pasid_bits;
> 	enum pasid_table_model model;
> 	union {
> 		struct {
> 			/* Intel specific fields */
> 		} intel;
> 
> 		struct {
> 			/* ARM specific fields */
> 		} arm;
> 
> 		struct {
> 			/* AMD specific fields */
> 		} amd;
> 
> 		/* ... */
> 	};
> };

Yes, this looks good to me.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
  2017-09-28 17:11             ` Raj, Ashok
  2017-09-29  5:44               ` Tian, Kevin
@ 2017-09-29 15:30               ` Joerg Roedel
  1 sibling, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2017-09-29 15:30 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Jacob Pan,
	David Woodhouse

On Thu, Sep 28, 2017 at 10:11:21AM -0700, Raj, Ashok wrote:
> Hi Jean
> 
> On Thu, Sep 28, 2017 at 12:21:34PM +0100, Jean-Philippe Brucker wrote:
> > It's the same problem on SMMUv3, all pointers in the multi-level PASID
> > tables are GPAs. Our solution was to grab chunks GPA space from the guest
> > when necessary.
> 
> If pasid table is gPA and built and managed by guest IOMMU driver in case of
> vIOMMU, or virtio-iommu. Is this complex interaction to reserve memory and
> setup required? wouldn't it be sufficient if we do this.
> 
> - Host advertises the capability in terms of what table format it it supports.
> - guest does all the memory allocation and management.
> - communicate to host via VFIO, gPA, format, size of table
> - host now programs context entries appropriately with proper nested mode

Yes, I think this is simpler than introducing a new memreserve API. The
code to build the pasid-tables is mostly there already, it just needs
some refactoring to make it usable for viommu.

The memreserve approach also implicates that the host can not trust the
pasid tables it builds and need to re-verify them on every change
because it is still accessible by the guest. We could write-protect the
table or unmap it from the guest, but it would still be accessible by
DMA. I think its tricky to get this right.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: bind pasid table API
       [not found]                 ` <AADFC41AFE54684AB9EE6CBC0274A5D190DEA654-0J0gbvR4kThpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
@ 2017-09-29 15:38                   ` Joerg Roedel
  0 siblings, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2017-09-29 15:38 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Pan,
	Jacob jun, David Woodhouse

On Fri, Sep 29, 2017 at 05:44:02AM +0000, Tian, Kevin wrote:
> Thinking about anyway we need to factor page table code in the
> guest side. It's not a big deal to further factor PASID code due to the
> same reason. Of course there are additional format info to be 
> communicated back-and-forth, but the framework should be same
> for both page table and PASID purposes. Also such change is restricted
> within IOMMU sub-system. :-)

Right, and in the common case where the guest runs a distro-kernel which
has the iommu-drivers compiled-in, the code to build the pasid-tables is
also already available and vIOMMU just needs to use it. So this is the
easiest approach.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-09-29 15:38 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-19  3:45 bind pasid table API Jacob Pan
2017-09-20 12:09 ` Jean-Philippe Brucker
     [not found]   ` <6ecc1afc-6302-cd22-6944-ef4c6ac09587-5wv7dgnIgG8@public.gmane.org>
2017-09-20 22:35     ` Jacob Pan
2017-09-25 11:45       ` Jean-Philippe Brucker
     [not found]         ` <ef71b446-ae00-29af-a934-2e253454df31-5wv7dgnIgG8@public.gmane.org>
2017-09-25 15:14           ` Raj, Ashok
2017-09-26  9:46             ` Jean-Philippe Brucker
2017-09-21  3:00     ` Liu, Yi L
     [not found]       ` <A2975661238FB949B60364EF0F2C257439ADB33D-zVW8+lm/ZpmiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-09-25 11:45         ` Jean-Philippe Brucker
2017-09-27 13:40     ` Joerg Roedel
     [not found]       ` <20170927134041.GN8398-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-09-27 17:51         ` Jacob Pan
2017-09-28 12:07           ` Joerg Roedel
     [not found]             ` <20170928120705.GR8398-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-09-28 21:36               ` Jacob Pan
2017-09-29 15:23                 ` Joerg Roedel
2017-09-28 11:21         ` Jean-Philippe Brucker
     [not found]           ` <e23f7d00-90f2-e5d4-6619-9fe9150a96b9-5wv7dgnIgG8@public.gmane.org>
2017-09-28 17:11             ` Raj, Ashok
2017-09-29  5:44               ` Tian, Kevin
     [not found]                 ` <AADFC41AFE54684AB9EE6CBC0274A5D190DEA654-0J0gbvR4kThpB2pF5aRoyrfspsVTdybXVpNB7YpNyf8@public.gmane.org>
2017-09-29 15:38                   ` Joerg Roedel
2017-09-29 15:30               ` Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.