[v7,03/15] s390/zcrypt: driver callback to indicate resource in use
diff mbox series

Message ID 20200407192015.19887-4-akrowiak@linux.ibm.com
State New
Headers show
Series
  • s390/vfio-ap: dynamic configuration support
Related show

Commit Message

Tony Krowiak April 7, 2020, 7:20 p.m. UTC
Introduces a new driver callback to prevent a root user from unbinding
an AP queue from its device driver if the queue is in use. The intent of
this callback is to provide a driver with the means to prevent a root user
from inadvertently taking a queue away from a guest and giving it to the
host while the guest is still using it. The callback will
be invoked whenever a change to the AP bus's sysfs apmask or aqmask
attributes would result in one or more AP queues being removed from its
driver. If the callback responds in the affirmative for any driver
queried, the change to the apmask or aqmask will be rejected with a device
in use error.

For this patch, only non-default drivers will be queried. Currently,
there is only one non-default driver, the vfio_ap device driver. The
vfio_ap device driver manages AP queues passed through to one or more
guests and we don't want to unexpectedly take AP resources away from
guests which are most likely independently administered.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++--
 drivers/s390/crypto/ap_bus.h |   4 +
 2 files changed, 142 insertions(+), 6 deletions(-)

Comments

Cornelia Huck April 14, 2020, 12:08 p.m. UTC | #1
On Tue,  7 Apr 2020 15:20:03 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The intent of
> this callback is to provide a driver with the means to prevent a root user
> from inadvertently taking a queue away from a guest and giving it to the
> host while the guest is still using it. The callback will
> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.
> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver manages AP queues passed through to one or more
> guests and we don't want to unexpectedly take AP resources away from
> guests which are most likely independently administered.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++--
>  drivers/s390/crypto/ap_bus.h |   4 +
>  2 files changed, 142 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
> index 5256e3ce84e5..af15c095e76a 100644
> --- a/drivers/s390/crypto/ap_bus.c
> +++ b/drivers/s390/crypto/ap_bus.c
> @@ -35,6 +35,7 @@
>  #include <linux/mod_devicetable.h>
>  #include <linux/debugfs.h>
>  #include <linux/ctype.h>
> +#include <linux/module.h>
>  
>  #include "ap_bus.h"
>  #include "ap_debug.h"
> @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str,
>  	newmap = kmalloc(size, GFP_KERNEL);
>  	if (!newmap)
>  		return -ENOMEM;
> -	if (mutex_lock_interruptible(lock)) {
> -		kfree(newmap);
> -		return -ERESTARTSYS;
> +	if (lock) {
> +		if (mutex_lock_interruptible(lock)) {
> +			kfree(newmap);
> +			return -ERESTARTSYS;
> +		}

This whole function is a bit odd. It seems all masks we want to
manipulate are always guarded by the ap_perms_mutex, and the need for
allowing lock == NULL comes from wanting to call this function with the
ap_perms_mutex already held.

That would argue for a locked/unlocked version of this function... but
looking at it, why do we lock the way we do? The one thing this
function (prior to this patch) does outside of the holding of the mutex
is the allocation and freeing of newmap. But with this patch, we do the
allocation and freeing of newmap while holding the mutex. Something
seems a bit weird here.

>  	}
>  
>  	if (*str == '+' || *str == '-') {
Cornelia Huck April 14, 2020, 12:58 p.m. UTC | #2
On Tue,  7 Apr 2020 15:20:03 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The intent of
> this callback is to provide a driver with the means to prevent a root user
> from inadvertently taking a queue away from a guest and giving it to the
> host while the guest is still using it. The callback will
> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.
> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver. The
> vfio_ap device driver manages AP queues passed through to one or more
> guests and we don't want to unexpectedly take AP resources away from
> guests which are most likely independently administered.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++--
>  drivers/s390/crypto/ap_bus.h |   4 +
>  2 files changed, 142 insertions(+), 6 deletions(-)

(...)

> @@ -1196,12 +1202,75 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>  	return rc;
>  }
>  
> +int __verify_card_reservations(struct device_driver *drv, void *data)
> +{
> +	int rc = 0;
> +	struct ap_driver *ap_drv = to_ap_drv(drv);
> +	unsigned long *newapm = (unsigned long *)data;
> +
> +	/*
> +	 * If the reserved bits do not identify cards reserved for use by the
> +	 * non-default driver, there is no need to verify the driver is using
> +	 * the queues.

I had to read that one several times... what about

"No need to verify whether the driver is using the queues if it is the
default driver."

?

> +	 */
> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
> +		return 0;
> +
> +	/* The non-default driver's module must be loaded */
> +	if (!try_module_get(drv->owner))
> +		return 0;

Is that really needed? I would have thought that the driver core's
klist usage would make sure that the callback would not be invoked for
drivers that are not registered anymore. Or am I missing a window?

> +
> +	if (ap_drv->in_use)
> +		if (ap_drv->in_use(newapm, ap_perms.aqm))

Can we log the offending apm somewhere, preferably with additional info
that allows the admin to figure out why an error was returned?

> +			rc = -EADDRINUSE;
> +
> +	module_put(drv->owner);
> +
> +	return rc;
> +}

(Same comments for the other changes further along in this patch.)
Harald Freudenberger April 15, 2020, 6:08 a.m. UTC | #3
On 14.04.20 14:58, Cornelia Huck wrote:
> On Tue,  7 Apr 2020 15:20:03 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. The intent of
>> this callback is to provide a driver with the means to prevent a root user
>> from inadvertently taking a queue away from a guest and giving it to the
>> host while the guest is still using it. The callback will
>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>> attributes would result in one or more AP queues being removed from its
>> driver. If the callback responds in the affirmative for any driver
>> queried, the change to the apmask or aqmask will be rejected with a device
>> in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver manages AP queues passed through to one or more
>> guests and we don't want to unexpectedly take AP resources away from
>> guests which are most likely independently administered.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>  drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++--
>>  drivers/s390/crypto/ap_bus.h |   4 +
>>  2 files changed, 142 insertions(+), 6 deletions(-)
> (...)
>
>> @@ -1196,12 +1202,75 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>>  	return rc;
>>  }
>>  
>> +int __verify_card_reservations(struct device_driver *drv, void *data)
>> +{
>> +	int rc = 0;
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +	unsigned long *newapm = (unsigned long *)data;
>> +
>> +	/*
>> +	 * If the reserved bits do not identify cards reserved for use by the
>> +	 * non-default driver, there is no need to verify the driver is using
>> +	 * the queues.
> I had to read that one several times... what about
>
> "No need to verify whether the driver is using the queues if it is the
> default driver."
>
> ?
>
>> +	 */
>> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> +		return 0;
>> +
>> +	/* The non-default driver's module must be loaded */
>> +	if (!try_module_get(drv->owner))
>> +		return 0;
> Is that really needed? I would have thought that the driver core's
> klist usage would make sure that the callback would not be invoked for
> drivers that are not registered anymore. Or am I missing a window?
The try_module_get() and module_put() is a result of review feedback from
my side. The ap bus core is static in the kernel whereas the
vfio dd is a kernel module. So there may be a race condition between
calling the callback function and removal of the vfio dd module.
There is similar code in zcrypt_api which does the same for the zcrypt
device drivers before using some variables or functions from the modules.
Help me, it this is outdated code and there is no need to adjust the
module reference counter any more, then I would be happy to remove
this code :-)
>
>> +
>> +	if (ap_drv->in_use)
>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
> Can we log the offending apm somewhere, preferably with additional info
> that allows the admin to figure out why an error was returned?
>
>> +			rc = -EADDRINUSE;
>> +
>> +	module_put(drv->owner);
>> +
>> +	return rc;
>> +}
> (Same comments for the other changes further along in this patch.)
>
Tony Krowiak April 15, 2020, 5:10 p.m. UTC | #4
On 4/14/20 8:58 AM, Cornelia Huck wrote:
> On Tue,  7 Apr 2020 15:20:03 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. The intent of
>> this callback is to provide a driver with the means to prevent a root user
>> from inadvertently taking a queue away from a guest and giving it to the
>> host while the guest is still using it. The callback will
>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>> attributes would result in one or more AP queues being removed from its
>> driver. If the callback responds in the affirmative for any driver
>> queried, the change to the apmask or aqmask will be rejected with a device
>> in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver manages AP queues passed through to one or more
>> guests and we don't want to unexpectedly take AP resources away from
>> guests which are most likely independently administered.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++--
>>   drivers/s390/crypto/ap_bus.h |   4 +
>>   2 files changed, 142 insertions(+), 6 deletions(-)
> (...)
>
>> @@ -1196,12 +1202,75 @@ static ssize_t apmask_show(struct bus_type *bus, char *buf)
>>   	return rc;
>>   }
>>   
>> +int __verify_card_reservations(struct device_driver *drv, void *data)
>> +{
>> +	int rc = 0;
>> +	struct ap_driver *ap_drv = to_ap_drv(drv);
>> +	unsigned long *newapm = (unsigned long *)data;
>> +
>> +	/*
>> +	 * If the reserved bits do not identify cards reserved for use by the
>> +	 * non-default driver, there is no need to verify the driver is using
>> +	 * the queues.
> I had to read that one several times... what about
> "No need to verify whether the driver is using the queues if it is the
> default driver."
>
> ?

Sure, that's better.

>
>> +	 */
>> +	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
>> +		return 0;
>> +
>> +	/* The non-default driver's module must be loaded */
>> +	if (!try_module_get(drv->owner))
>> +		return 0;
> Is that really needed? I would have thought that the driver core's
> klist usage would make sure that the callback would not be invoked for
> drivers that are not registered anymore. Or am I missing a window?
>
>> +
>> +	if (ap_drv->in_use)
>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
> Can we log the offending apm somewhere, preferably with additional info
> that allows the admin to figure out why an error was returned?

One of the things on my TODO list is to add logging to the vfio_ap
module which will track all significant activity within the device
driver. I plan to do that with a patch or set of patches specifically
put together for that purpose. Having said that, the best place to
log this would be in the in_use callback in the vfio_ap device driver
(see next patch) where the APQNs that are in use can be identified.
For now, I will log a message to the dmesg log indicating which
APQNs are in use by the matrix mdev.

>
>> +			rc = -EADDRINUSE;
>> +
>> +	module_put(drv->owner);
>> +
>> +	return rc;
>> +}
> (Same comments for the other changes further along in this patch.)
>
Tony Krowiak April 15, 2020, 5:10 p.m. UTC | #5
On 4/14/20 8:08 AM, Cornelia Huck wrote:
> On Tue,  7 Apr 2020 15:20:03 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. The intent of
>> this callback is to provide a driver with the means to prevent a root user
>> from inadvertently taking a queue away from a guest and giving it to the
>> host while the guest is still using it. The callback will
>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>> attributes would result in one or more AP queues being removed from its
>> driver. If the callback responds in the affirmative for any driver
>> queried, the change to the apmask or aqmask will be rejected with a device
>> in use error.
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver. The
>> vfio_ap device driver manages AP queues passed through to one or more
>> guests and we don't want to unexpectedly take AP resources away from
>> guests which are most likely independently administered.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/ap_bus.c | 144 +++++++++++++++++++++++++++++++++--
>>   drivers/s390/crypto/ap_bus.h |   4 +
>>   2 files changed, 142 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>> index 5256e3ce84e5..af15c095e76a 100644
>> --- a/drivers/s390/crypto/ap_bus.c
>> +++ b/drivers/s390/crypto/ap_bus.c
>> @@ -35,6 +35,7 @@
>>   #include <linux/mod_devicetable.h>
>>   #include <linux/debugfs.h>
>>   #include <linux/ctype.h>
>> +#include <linux/module.h>
>>   
>>   #include "ap_bus.h"
>>   #include "ap_debug.h"
>> @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str,
>>   	newmap = kmalloc(size, GFP_KERNEL);
>>   	if (!newmap)
>>   		return -ENOMEM;
>> -	if (mutex_lock_interruptible(lock)) {
>> -		kfree(newmap);
>> -		return -ERESTARTSYS;
>> +	if (lock) {
>> +		if (mutex_lock_interruptible(lock)) {
>> +			kfree(newmap);
>> +			return -ERESTARTSYS;
>> +		}
> This whole function is a bit odd. It seems all masks we want to
> manipulate are always guarded by the ap_perms_mutex, and the need for
> allowing lock == NULL comes from wanting to call this function with the
> ap_perms_mutex already held.
>
> That would argue for a locked/unlocked version of this function... but
> looking at it, why do we lock the way we do? The one thing this
> function (prior to this patch) does outside of the holding of the mutex
> is the allocation and freeing of newmap. But with this patch, we do the
> allocation and freeing of newmap while holding the mutex. Something
> seems a bit weird here.

Note that the ap_parse_mask function copies the newmap
to the bitmap passed in as a parameter to the function.
Prior to the introduction of this patch, the calling functions - i.e.,
apmask_store(), aqmask_store() and ap_perms_init() - passed
in the actual bitmap (i.e., ap_perms.apm or ap_perms aqm),
so the ap_perms were changed directly by this function.

With this patch, the apmask_store() and aqmask_store()
functions now pass in a copy of those bitmaps. This is so
we can verify that any APQNs being removed are not
in use by the vfio_ap device driver before committing the
change to ap_perms. Consequently, it is now necessary
to take the lock for the until the changes are committed.

Having explained that, you make a valid argument that
this calls for a locked/unlocked version of this function, so
I will modify this patch to that effect.

>
>>   	}
>>   
>>   	if (*str == '+' || *str == '-') {
Cornelia Huck April 16, 2020, 9:33 a.m. UTC | #6
On Wed, 15 Apr 2020 08:08:24 +0200
Harald Freudenberger <freude@linux.ibm.com> wrote:

> On 14.04.20 14:58, Cornelia Huck wrote:
> > On Tue,  7 Apr 2020 15:20:03 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >> +	/* The non-default driver's module must be loaded */
> >> +	if (!try_module_get(drv->owner))
> >> +		return 0;  
> > Is that really needed? I would have thought that the driver core's
> > klist usage would make sure that the callback would not be invoked for
> > drivers that are not registered anymore. Or am I missing a window?  
> The try_module_get() and module_put() is a result of review feedback from
> my side. The ap bus core is static in the kernel whereas the
> vfio dd is a kernel module. So there may be a race condition between
> calling the callback function and removal of the vfio dd module.
> There is similar code in zcrypt_api which does the same for the zcrypt
> device drivers before using some variables or functions from the modules.
> Help me, it this is outdated code and there is no need to adjust the
> module reference counter any more, then I would be happy to remove
> this code :-)

I think the driver core already should keep us safe. A built-in bus
calling a driver in a module is a very common pattern, and I think
->owner was introduced exactly for that case.

Unless I'm really missing something obvious?
Cornelia Huck April 16, 2020, 9:37 a.m. UTC | #7
On Wed, 15 Apr 2020 13:10:10 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 4/14/20 8:58 AM, Cornelia Huck wrote:
> > On Tue,  7 Apr 2020 15:20:03 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >> +
> >> +	if (ap_drv->in_use)
> >> +		if (ap_drv->in_use(newapm, ap_perms.aqm))  
> > Can we log the offending apm somewhere, preferably with additional info
> > that allows the admin to figure out why an error was returned?  
> 
> One of the things on my TODO list is to add logging to the vfio_ap
> module which will track all significant activity within the device
> driver. I plan to do that with a patch or set of patches specifically
> put together for that purpose. Having said that, the best place to
> log this would be in the in_use callback in the vfio_ap device driver
> (see next patch) where the APQNs that are in use can be identified.
> For now, I will log a message to the dmesg log indicating which
> APQNs are in use by the matrix mdev.

Sounds reasonable. My main issue was what an admin was supposed to do
until logging was in place :)

> 
> >  
> >> +			rc = -EADDRINUSE;
> >> +
> >> +	module_put(drv->owner);
> >> +
> >> +	return rc;
> >> +}
Cornelia Huck April 16, 2020, 10:05 a.m. UTC | #8
On Wed, 15 Apr 2020 13:10:18 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 4/14/20 8:08 AM, Cornelia Huck wrote:
> > On Tue,  7 Apr 2020 15:20:03 -0400
> > Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >> @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str,
> >>   	newmap = kmalloc(size, GFP_KERNEL);
> >>   	if (!newmap)
> >>   		return -ENOMEM;
> >> -	if (mutex_lock_interruptible(lock)) {
> >> -		kfree(newmap);
> >> -		return -ERESTARTSYS;
> >> +	if (lock) {
> >> +		if (mutex_lock_interruptible(lock)) {
> >> +			kfree(newmap);
> >> +			return -ERESTARTSYS;
> >> +		}  
> > This whole function is a bit odd. It seems all masks we want to
> > manipulate are always guarded by the ap_perms_mutex, and the need for
> > allowing lock == NULL comes from wanting to call this function with the
> > ap_perms_mutex already held.
> >
> > That would argue for a locked/unlocked version of this function... but
> > looking at it, why do we lock the way we do? The one thing this
> > function (prior to this patch) does outside of the holding of the mutex
> > is the allocation and freeing of newmap. But with this patch, we do the
> > allocation and freeing of newmap while holding the mutex. Something
> > seems a bit weird here.  
> 
> Note that the ap_parse_mask function copies the newmap
> to the bitmap passed in as a parameter to the function.
> Prior to the introduction of this patch, the calling functions - i.e.,
> apmask_store(), aqmask_store() and ap_perms_init() - passed
> in the actual bitmap (i.e., ap_perms.apm or ap_perms aqm),
> so the ap_perms were changed directly by this function.
> 
> With this patch, the apmask_store() and aqmask_store()
> functions now pass in a copy of those bitmaps. This is so
> we can verify that any APQNs being removed are not
> in use by the vfio_ap device driver before committing the
> change to ap_perms. Consequently, it is now necessary
> to take the lock for the until the changes are committed.

Yes, but every caller actually takes the mutex before calling this
function already :)

> Having explained that, you make a valid argument that
> this calls for a locked/unlocked version of this function, so
> I will modify this patch to that effect.

Ok.

The other thing I found weird is that the function does
alloc newmap -> grab mutex -> do manipulation -> release mutex -> free newmap

while the new callers do
(mutex already held) -> alloc newmap

so why grab/release the mutex the way the function does now? IOW, why
not have an unlocked __ap_parse_mask_string() and do

int ap_parse_mask_string(...)
{
	int rc;

	if (mutex_lock_interruptible(&ap_perms_mutex))
		return -ERESTARTSYS;
	rc = __ap_parse_mask_string(...);
        mutex_unlock(&ap_perms_mutex);
	return rc;
}
Tony Krowiak April 16, 2020, 2:35 p.m. UTC | #9
On 4/16/20 6:05 AM, Cornelia Huck wrote:
> On Wed, 15 Apr 2020 13:10:18 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 4/14/20 8:08 AM, Cornelia Huck wrote:
>>> On Tue,  7 Apr 2020 15:20:03 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>> @@ -995,9 +996,11 @@ int ap_parse_mask_str(const char *str,
>>>>    	newmap = kmalloc(size, GFP_KERNEL);
>>>>    	if (!newmap)
>>>>    		return -ENOMEM;
>>>> -	if (mutex_lock_interruptible(lock)) {
>>>> -		kfree(newmap);
>>>> -		return -ERESTARTSYS;
>>>> +	if (lock) {
>>>> +		if (mutex_lock_interruptible(lock)) {
>>>> +			kfree(newmap);
>>>> +			return -ERESTARTSYS;
>>>> +		}
>>> This whole function is a bit odd. It seems all masks we want to
>>> manipulate are always guarded by the ap_perms_mutex, and the need for
>>> allowing lock == NULL comes from wanting to call this function with the
>>> ap_perms_mutex already held.
>>>
>>> That would argue for a locked/unlocked version of this function... but
>>> looking at it, why do we lock the way we do? The one thing this
>>> function (prior to this patch) does outside of the holding of the mutex
>>> is the allocation and freeing of newmap. But with this patch, we do the
>>> allocation and freeing of newmap while holding the mutex. Something
>>> seems a bit weird here.
>> Note that the ap_parse_mask function copies the newmap
>> to the bitmap passed in as a parameter to the function.
>> Prior to the introduction of this patch, the calling functions - i.e.,
>> apmask_store(), aqmask_store() and ap_perms_init() - passed
>> in the actual bitmap (i.e., ap_perms.apm or ap_perms aqm),
>> so the ap_perms were changed directly by this function.
>>
>> With this patch, the apmask_store() and aqmask_store()
>> functions now pass in a copy of those bitmaps. This is so
>> we can verify that any APQNs being removed are not
>> in use by the vfio_ap device driver before committing the
>> change to ap_perms. Consequently, it is now necessary
>> to take the lock for the until the changes are committed.
> Yes, but every caller actually takes the mutex before calling this
> function already :)

That is not a true statement, the ap_perms_init() function
does not take the mutex prior to calling this function. Keep in
mind, the ap_parse_mask function is not static and is exported,
I was precluded from removing the lock parameter from the function
definition.

>
>> Having explained that, you make a valid argument that
>> this calls for a locked/unlocked version of this function, so
>> I will modify this patch to that effect.
> Ok.
>
> The other thing I found weird is that the function does
> alloc newmap -> grab mutex -> do manipulation -> release mutex -> free newmap
> while the new callers do
> (mutex already held) -> alloc newmap
>
> so why grab/release the mutex the way the function does now? IOW, why
> not have an unlocked __ap_parse_mask_string() and do

In my last comment above, I agreed to create an unlocked version
of this function. Your example below is similar to what I
implemented after responding to your comment yesterday.

>
> int ap_parse_mask_string(...)
> {
> 	int rc;
>
> 	if (mutex_lock_interruptible(&ap_perms_mutex))
> 		return -ERESTARTSYS;
> 	rc = __ap_parse_mask_string(...);
>          mutex_unlock(&ap_perms_mutex);
> 	return rc;
> }
Harald Freudenberger April 17, 2020, 1:54 p.m. UTC | #10
On 16.04.20 11:33, Cornelia Huck wrote:
> On Wed, 15 Apr 2020 08:08:24 +0200
> Harald Freudenberger <freude@linux.ibm.com> wrote:
>
>> On 14.04.20 14:58, Cornelia Huck wrote:
>>> On Tue,  7 Apr 2020 15:20:03 -0400
>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>> +	/* The non-default driver's module must be loaded */
>>>> +	if (!try_module_get(drv->owner))
>>>> +		return 0;  
>>> Is that really needed? I would have thought that the driver core's
>>> klist usage would make sure that the callback would not be invoked for
>>> drivers that are not registered anymore. Or am I missing a window?  
>> The try_module_get() and module_put() is a result of review feedback from
>> my side. The ap bus core is static in the kernel whereas the
>> vfio dd is a kernel module. So there may be a race condition between
>> calling the callback function and removal of the vfio dd module.
>> There is similar code in zcrypt_api which does the same for the zcrypt
>> device drivers before using some variables or functions from the modules.
>> Help me, it this is outdated code and there is no need to adjust the
>> module reference counter any more, then I would be happy to remove
>> this code :-)
> I think the driver core already should keep us safe. A built-in bus
> calling a driver in a module is a very common pattern, and I think
> ->owner was introduced exactly for that case.
>
> Unless I'm really missing something obvious?
Hm. I tested a similar code (see zcrypt_api.c where try_module_get() and module_put()
is called surrounding use of functions related to the implementing driver.
The driver module has a reference count of 0 when not used and can get removed
- because refcount is 0 - at any time when there is nothing related to the driver pending.

As soon as the driver is actually used the try_module_get(...driver.owner) increases
the reference counter and makes it impossible to remove the module. After use the
module_put() reduces the reference count.
When I now remove the try_module_get() and module_put() calls and run this modified
code I immediately face a crash when the module is removed during use.

I see code in the kernel which does an initial try_module_get() on the driver to increase
the reference count, for example when the driver registers. However, I see no clear
way to remove such a driver module any more.

I know I had a fight with a tester some years ago where he stated that it is a valid
testcase to remove a device driver module 'during use of the driver'. So I'd like
to have the try_module_get() and module_put() invokations in the ap bus code
until you convince me there are other maybe better ways to make sure the
driver and it's functions are available at the time of the call.

Maybe we can discuss this offline if you wish :-)
Halil Pasic April 24, 2020, 3:33 a.m. UTC | #11
On Thu, 16 Apr 2020 11:37:21 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 15 Apr 2020 13:10:10 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> 
> > On 4/14/20 8:58 AM, Cornelia Huck wrote:
> > > On Tue,  7 Apr 2020 15:20:03 -0400
> > > Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> 
> > >> +
> > >> +	if (ap_drv->in_use)
> > >> +		if (ap_drv->in_use(newapm, ap_perms.aqm))  
> > > Can we log the offending apm somewhere, preferably with additional info
> > > that allows the admin to figure out why an error was returned?  
> > 
> > One of the things on my TODO list is to add logging to the vfio_ap
> > module which will track all significant activity within the device
> > driver. I plan to do that with a patch or set of patches specifically
> > put together for that purpose. Having said that, the best place to
> > log this would be in the in_use callback in the vfio_ap device driver
> > (see next patch) where the APQNs that are in use can be identified.
> > For now, I will log a message to the dmesg log indicating which
> > APQNs are in use by the matrix mdev.
> 
> Sounds reasonable. My main issue was what an admin was supposed to do
> until logging was in place :)

Logging may not be the right answer here. Imagine somebody wants to build
a nice web-tool for managing this stuff at scale -- e.g. something HMC. I
don't think the solution is to let this tool parse the kernel messages
and try to relate that to its own transactions.

But I do agree, having a way to report why "won't do" to the end user is
important for usability.

Regards,
Halil

> 
> > 
> > >  
> > >> +			rc = -EADDRINUSE;
> > >> +
> > >> +	module_put(drv->owner);
> > >> +
> > >> +	return rc;
> > >> +}  
>
Tony Krowiak April 24, 2020, 5:07 p.m. UTC | #12
On 4/23/20 11:33 PM, Halil Pasic wrote:
> On Thu, 16 Apr 2020 11:37:21 +0200
> Cornelia Huck <cohuck@redhat.com> wrote:
>
>> On Wed, 15 Apr 2020 13:10:10 -0400
>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>
>>> On 4/14/20 8:58 AM, Cornelia Huck wrote:
>>>> On Tue,  7 Apr 2020 15:20:03 -0400
>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>>> +
>>>>> +	if (ap_drv->in_use)
>>>>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
>>>> Can we log the offending apm somewhere, preferably with additional info
>>>> that allows the admin to figure out why an error was returned?
>>> One of the things on my TODO list is to add logging to the vfio_ap
>>> module which will track all significant activity within the device
>>> driver. I plan to do that with a patch or set of patches specifically
>>> put together for that purpose. Having said that, the best place to
>>> log this would be in the in_use callback in the vfio_ap device driver
>>> (see next patch) where the APQNs that are in use can be identified.
>>> For now, I will log a message to the dmesg log indicating which
>>> APQNs are in use by the matrix mdev.
>> Sounds reasonable. My main issue was what an admin was supposed to do
>> until logging was in place :)
> Logging may not be the right answer here. Imagine somebody wants to build
> a nice web-tool for managing this stuff at scale -- e.g. something HMC. I
> don't think the solution is to let this tool parse the kernel messages
> and try to relate that to its own transactions.

I don't believe there is no right or wrong answer here; I simply don't
see the relevance of discussing a tool in this context. We are talking
about a sysfs attribute interface here, so - correct me if I'm
mistaken - our options for notifying the user that a queue is in use are
limited to the return code from the sysfs interface and logging. I would
expect that a tool would have to do something similar to the callback
implemented in the vfio_ap device driver and check the APQNs
removed against the APQNs assigned to the mdevs to determine which
is in use.

>
> But I do agree, having a way to report why "won't do" to the end user is
> important for usability.
>
> Regards,
> Halil
>
>>>>   
>>>>> +			rc = -EADDRINUSE;
>>>>> +
>>>>> +	module_put(drv->owner);
>>>>> +
>>>>> +	return rc;
>>>>> +}
Halil Pasic April 24, 2020, 6:23 p.m. UTC | #13
On Fri, 24 Apr 2020 13:07:38 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> 
> 
> On 4/23/20 11:33 PM, Halil Pasic wrote:
> > On Thu, 16 Apr 2020 11:37:21 +0200
> > Cornelia Huck <cohuck@redhat.com> wrote:
> >
> >> On Wed, 15 Apr 2020 13:10:10 -0400
> >> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >>
> >>> On 4/14/20 8:58 AM, Cornelia Huck wrote:
> >>>> On Tue,  7 Apr 2020 15:20:03 -0400
> >>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
> >>>>> +
> >>>>> +	if (ap_drv->in_use)
> >>>>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
> >>>> Can we log the offending apm somewhere, preferably with additional info
> >>>> that allows the admin to figure out why an error was returned?
> >>> One of the things on my TODO list is to add logging to the vfio_ap
> >>> module which will track all significant activity within the device
> >>> driver. I plan to do that with a patch or set of patches specifically
> >>> put together for that purpose. Having said that, the best place to
> >>> log this would be in the in_use callback in the vfio_ap device driver
> >>> (see next patch) where the APQNs that are in use can be identified.
> >>> For now, I will log a message to the dmesg log indicating which
> >>> APQNs are in use by the matrix mdev.
> >> Sounds reasonable. My main issue was what an admin was supposed to do
> >> until logging was in place :)
> > Logging may not be the right answer here. Imagine somebody wants to build
> > a nice web-tool for managing this stuff at scale -- e.g. something HMC. I
> > don't think the solution is to let this tool parse the kernel messages
> > and try to relate that to its own transactions.
> 
> I don't believe there is no right or wrong answer here; I simply don't
> see the relevance of discussing a tool in this context. We are talking
> about a sysfs attribute interface here, so - correct me if I'm
> mistaken - our options for notifying the user that a queue is in use are
> limited to the return code from the sysfs interface and logging. I would
> expect that a tool would have to do something similar to the callback
> implemented in the vfio_ap device driver and check the APQNs
> removed against the APQNs assigned to the mdevs to determine which
> is in use.
> 

We are talking interface design. The relevance of discussing a tool is
that any userspace tool must come by with whatever interface we come up
now. IMHO thinking about the usage (and the client code) is very helpful
in avoiding broken interface designs. AFAIK this is one of the basic
ideas behind test driven development.

Regards,
Halil
Pierre Morel April 27, 2020, 8:20 a.m. UTC | #14
On 2020-04-07 21:20, Tony Krowiak wrote:
> Introduces a new driver callback to prevent a root user from unbinding
> an AP queue from its device driver if the queue is in use. The intent of
> this callback is to provide a driver with the means to prevent a root user
> from inadvertently taking a queue away from a guest and giving it to the
> host while the guest is still using it.

How can we know, at this point if the guest uses or not the queue?
Do you want to say that this prevents to take away a queue when it is 
currently assigned to a VFIO device?
and with a guest currently using this VFIO device?

> The callback will
> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
> attributes would result in one or more AP queues being removed from its
> driver. If the callback responds in the affirmative for any driver
> queried, the change to the apmask or aqmask will be rejected with a device
> in use error.

AFAIU you mean that Linux's driver's binding and unbinding mechanism is 
not sufficient to avoid this issue because unbind can not be refused by 
the driver.

The reason why we do not want a single queue to be removed from the VFIO 
driver is because the VFIO drivers works on a matrix, not on queues, and 
for the matrix to be consistent it needs to acquire all queues defined 
by the cross product of all APID and AQID assigned to the matrix.

This functionality is valid for the host as for the guests and is 
handled automatically by the firmware with the CRYCB.
The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP 
queues.

If instead to mix VFIO CRYCB matrix handling and queues at the same 
level inside the AP bus we separate these different firmware entities in 
two different software entities.

If we make the AP bus sit above a CRYCB/Matrix bus, and in the way 
virtualize the QCI and test AP queue instructions:
- we can directly pass a matrix device to the guest though a VFIO matrix 
device
- the consistence will be automatic
- the VFIO device and parent device will be of the same kind which would 
make the design much more clearer.
- there will be no need for these callback because the consistence of 
the matrix will be guaranteed by firmware


> 
> For this patch, only non-default drivers will be queried. Currently,
> there is only one non-default driver, the vfio_ap device driver.

You mean that the admin may take queues away from the "default driver", 
while the queue is in use, to give it to an other driver?
Why is it to avoid in one way and not in the other way?

> The
> vfio_ap device driver manages AP queues passed through to one or more
> guests

I read this as if a queue may be passed to several guest...
please, rephrase or explain.

> and we don't want to unexpectedly take AP resources away from
> guests which are most likely independently administered.

When you say "independently administered", you mean as a second admin 
inside the host, don't you?

Regards,
Pierre
Tony Krowiak April 27, 2020, 9:36 p.m. UTC | #15
On 4/24/20 2:23 PM, Halil Pasic wrote:
> On Fri, 24 Apr 2020 13:07:38 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>
>> On 4/23/20 11:33 PM, Halil Pasic wrote:
>>> On Thu, 16 Apr 2020 11:37:21 +0200
>>> Cornelia Huck <cohuck@redhat.com> wrote:
>>>
>>>> On Wed, 15 Apr 2020 13:10:10 -0400
>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>>
>>>>> On 4/14/20 8:58 AM, Cornelia Huck wrote:
>>>>>> On Tue,  7 Apr 2020 15:20:03 -0400
>>>>>> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>>>>>>> +
>>>>>>> +	if (ap_drv->in_use)
>>>>>>> +		if (ap_drv->in_use(newapm, ap_perms.aqm))
>>>>>> Can we log the offending apm somewhere, preferably with additional info
>>>>>> that allows the admin to figure out why an error was returned?
>>>>> One of the things on my TODO list is to add logging to the vfio_ap
>>>>> module which will track all significant activity within the device
>>>>> driver. I plan to do that with a patch or set of patches specifically
>>>>> put together for that purpose. Having said that, the best place to
>>>>> log this would be in the in_use callback in the vfio_ap device driver
>>>>> (see next patch) where the APQNs that are in use can be identified.
>>>>> For now, I will log a message to the dmesg log indicating which
>>>>> APQNs are in use by the matrix mdev.
>>>> Sounds reasonable. My main issue was what an admin was supposed to do
>>>> until logging was in place :)
>>> Logging may not be the right answer here. Imagine somebody wants to build
>>> a nice web-tool for managing this stuff at scale -- e.g. something HMC. I
>>> don't think the solution is to let this tool parse the kernel messages
>>> and try to relate that to its own transactions.
>> I don't believe there is no right or wrong answer here; I simply don't
>> see the relevance of discussing a tool in this context. We are talking
>> about a sysfs attribute interface here, so - correct me if I'm
>> mistaken - our options for notifying the user that a queue is in use are
>> limited to the return code from the sysfs interface and logging. I would
>> expect that a tool would have to do something similar to the callback
>> implemented in the vfio_ap device driver and check the APQNs
>> removed against the APQNs assigned to the mdevs to determine which
>> is in use.
>>
> We are talking interface design. The relevance of discussing a tool is
> that any userspace tool must come by with whatever interface we come up
> now. IMHO thinking about the usage (and the client code) is very helpful
> in avoiding broken interface designs. AFAIK this is one of the basic
> ideas behind test driven development.

What can a sysfs interface, such as apmask/aqmask, do other than reply with
a return code given sysfs attributes that store data can only return a 
count?
I'm sorry, but I still don't see the relevance.

>
> Regards,
> Halil
Tony Krowiak April 27, 2020, 10:24 p.m. UTC | #16
On 4/27/20 4:20 AM, Pierre Morel wrote:
>
>
> On 2020-04-07 21:20, Tony Krowiak wrote:
>> Introduces a new driver callback to prevent a root user from unbinding
>> an AP queue from its device driver if the queue is in use. The intent of
>> this callback is to provide a driver with the means to prevent a root 
>> user
>> from inadvertently taking a queue away from a guest and giving it to the
>> host while the guest is still using it.
>
> How can we know, at this point if the guest uses or not the queue?

The struct ap_matrix_mdev has a field, struct kvm *kvm, which holds a 
pointer to KVM when
the matrix mdev is in use by a guest. This patch series also introduces 
a shadow_crycb (soon to
be shadow_apcb) which holds the AP configuration for the guest. Between 
those two things,
the driver can detect when a queue is in use by a guest.

> Do you want to say that this prevents to take away a queue when it is 
> currently assigned to a VFIO device?
> and with a guest currently using this VFIO device?

No, I do not. The intent here is to enforce the proper procedure for 
giving up a queue so it is done
deliberately. Before taking a queue away from the matrix mdev, its APQN 
should be unassigned
from the matrix mdev. That is not to say that if there are major 
objections to this that we can't
base in_use upon the queue being in use by a guest at the time. Maybe 
that is preferable to
the community. I'll leave it to them to state their case.

>
>> The callback will
>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>> attributes would result in one or more AP queues being removed from its
>> driver. If the callback responds in the affirmative for any driver
>> queried, the change to the apmask or aqmask will be rejected with a 
>> device
>> in use error.
>
> AFAIU you mean that Linux's driver's binding and unbinding mechanism 
> is not sufficient to avoid this issue because unbind can not be 
> refused by the driver.

Correct!

>
>
> The reason why we do not want a single queue to be removed from the 
> VFIO driver is because the VFIO drivers works on a matrix, not on 
> queues, and for the matrix to be consistent it needs to acquire all 
> queues defined by the cross product of all APID and AQID assigned to 
> the matrix.

Not correct. The reason why is because we do not want a queue to be 
surreptitiously removed
without the guest administrator being aware of its removal.

>
> This functionality is valid for the host as for the guests and is 
> handled automatically by the firmware with the CRYCB.
> The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP 
> queues.
>
> If instead to mix VFIO CRYCB matrix handling and queues at the same 
> level inside the AP bus we separate these different firmware entities 
> in two different software entities.
>
> If we make the AP bus sit above a CRYCB/Matrix bus, and in the way 
> virtualize the QCI and test AP queue instructions:
> - we can directly pass a matrix device to the guest though a VFIO 
> matrix device
> - the consistence will be automatic
> - the VFIO device and parent device will be of the same kind which 
> would make the design much more clearer.
> - there will be no need for these callback because the consistence of 
> the matrix will be guaranteed by firmware

As stated in my response above, the issue here is not consistency. While 
the design you describe
may be reasonable, it is a major departure from what is out in the 
field. In other words, that ship
has sailed.

>
>
>>
>> For this patch, only non-default drivers will be queried. Currently,
>> there is only one non-default driver, the vfio_ap device driver.
>
> You mean that the admin may take queues away from the "default 
> driver", while the queue is in use, to give it to an other driver?
> Why is it to avoid in one way and not in the other way?

Because the default drivers have direct control over the queues and can 
ensure they are empty
and reset before giving up control. The vfio driver does not have direct 
control over the queues
because they have been passed through to the guest.

>
>> The
>> vfio_ap device driver manages AP queues passed through to one or more
>> guests
>
> I read this as if a queue may be passed to several guest...
> please, rephrase or explain.

AP queues is plural, so it is true that AP queues can be passed through
to more than one guest. I see your point, however, so I'll reword that
to be more clear.

>
>> and we don't want to unexpectedly take AP resources away from
>> guests which are most likely independently administered.
>
> When you say "independently administered", you mean as a second admin 
> inside the host, don't you?

I mean that a guest can be administered by a different person than the 
host administrator.
Again, I'll try to clarify this.

>
>
> Regards,
> Pierre
>
Pierre Morel April 28, 2020, 8:09 a.m. UTC | #17
On 2020-04-28 00:24, Tony Krowiak wrote:
> 
> 
> On 4/27/20 4:20 AM, Pierre Morel wrote:
>>
>>
>> On 2020-04-07 21:20, Tony Krowiak wrote:
>>> Introduces a new driver callback to prevent a root user from unbinding
>>> an AP queue from its device driver if the queue is in use. The intent of
>>> this callback is to provide a driver with the means to prevent a root 
>>> user
>>> from inadvertently taking a queue away from a guest and giving it to the
>>> host while the guest is still using it.

...snip...

>>
>> This functionality is valid for the host as for the guests and is 
>> handled automatically by the firmware with the CRYCB.
>> The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP 
>> queues.
>>
>> If instead to mix VFIO CRYCB matrix handling and queues at the same 
>> level inside the AP bus we separate these different firmware entities 
>> in two different software entities.
>>
>> If we make the AP bus sit above a CRYCB/Matrix bus, and in the way 
>> virtualize the QCI and test AP queue instructions:
>> - we can directly pass a matrix device to the guest though a VFIO 
>> matrix device
>> - the consistence will be automatic
>> - the VFIO device and parent device will be of the same kind which 
>> would make the design much more clearer.
>> - there will be no need for these callback because the consistence of 
>> the matrix will be guaranteed by firmware
> 
> As stated in my response above, the issue here is not consistency. While 
> the design you describe
> may be reasonable, it is a major departure from what is out in the 
> field. In other words, that ship
> has sailed.


The current VFIO-AP driver works as before, without any change, above 
the Matrix device I suggest.

Aside the old scheme which can continue, the Matrix device can be used 
directly to build a VFIO Matrix device, usable by QEMU without any 
modification.

Once the dynamic extensions proposed in this series and the associated 
tools are out on the field, then yes the ship is really far.
For now, the existing user's API do not change, the existing tools do 
not need modifications and we can repair the ship for its long journey.

The inconsistency between device and VFIO device and the resulting 
complexity is not going to ease future enhancement.

Regards,
Pierre
Harald Freudenberger April 28, 2020, 11:07 a.m. UTC | #18
On 28.04.20 00:24, Tony Krowiak wrote:
>
>
> On 4/27/20 4:20 AM, Pierre Morel wrote:
>>
>>
>> On 2020-04-07 21:20, Tony Krowiak wrote:
>>> Introduces a new driver callback to prevent a root user from unbinding
>>> an AP queue from its device driver if the queue is in use. The intent of
>>> this callback is to provide a driver with the means to prevent a root user
>>> from inadvertently taking a queue away from a guest and giving it to the
>>> host while the guest is still using it.
>>
>> How can we know, at this point if the guest uses or not the queue?
>
> The struct ap_matrix_mdev has a field, struct kvm *kvm, which holds a pointer to KVM when
> the matrix mdev is in use by a guest. This patch series also introduces a shadow_crycb (soon to
> be shadow_apcb) which holds the AP configuration for the guest. Between those two things,
> the driver can detect when a queue is in use by a guest.
>
>> Do you want to say that this prevents to take away a queue when it is currently assigned to a VFIO device?
>> and with a guest currently using this VFIO device?
>
> No, I do not. The intent here is to enforce the proper procedure for giving up a queue so it is done
> deliberately. Before taking a queue away from the matrix mdev, its APQN should be unassigned
> from the matrix mdev. That is not to say that if there are major objections to this that we can't
> base in_use upon the queue being in use by a guest at the time. Maybe that is preferable to
> the community. I'll leave it to them to state their case.
>
>>
>>> The callback will
>>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>>> attributes would result in one or more AP queues being removed from its
>>> driver. If the callback responds in the affirmative for any driver
>>> queried, the change to the apmask or aqmask will be rejected with a device
>>> in use error.
>>
>> AFAIU you mean that Linux's driver's binding and unbinding mechanism is not sufficient to avoid this issue because unbind can not be refused by the driver.
>
> Correct!
>
>>
>>
>> The reason why we do not want a single queue to be removed from the VFIO driver is because the VFIO drivers works on a matrix, not on queues, and for the matrix to be consistent it needs to acquire all queues defined by the cross product of all APID and AQID assigned to the matrix.
>
> Not correct. The reason why is because we do not want a queue to be surreptitiously removed
> without the guest administrator being aware of its removal.
>
>>
>> This functionality is valid for the host as for the guests and is handled automatically by the firmware with the CRYCB.
>> The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP queues.
>>
>> If instead to mix VFIO CRYCB matrix handling and queues at the same level inside the AP bus we separate these different firmware entities in two different software entities.
>>
>> If we make the AP bus sit above a CRYCB/Matrix bus, and in the way virtualize the QCI and test AP queue instructions:
>> - we can directly pass a matrix device to the guest though a VFIO matrix device
>> - the consistence will be automatic
>> - the VFIO device and parent device will be of the same kind which would make the design much more clearer.
>> - there will be no need for these callback because the consistence of the matrix will be guaranteed by firmware
>
> As stated in my response above, the issue here is not consistency. While the design you describe
> may be reasonable, it is a major departure from what is out in the field. In other words, that ship
> has sailed.
>
>>
>>
>>>
>>> For this patch, only non-default drivers will be queried. Currently,
>>> there is only one non-default driver, the vfio_ap device driver.
>>
>> You mean that the admin may take queues away from the "default driver", while the queue is in use, to give it to an other driver?
>> Why is it to avoid in one way and not in the other way?
>
> Because the default drivers have direct control over the queues and can ensure they are empty
> and reset before giving up control. The vfio driver does not have direct control over the queues
> because they have been passed through to the guest.
No, that's not true. The 'default' drivers have no change to do anything with an APQN when it is removed
from the driver. They get the very same notification which is the remove() callback as the vfio dd gets
and have the very same change to do something here. The more interesting thing here is, that the remove()
callback invocation is usually because a hardware HAS BEEN GONE AWAY. Neither the 'default' drivers
nor the vfio dd can do a reset on a not-any-more existing APQN.
And it is also not true that the vfio dd has no direct control over the queue because they have been passed
through to the guest. It's the job of the vfio dd to modify the guest's APM, AQM, ADM masks to disable
the guest's access to the APQN and then the vfio can (try to) do a reset.
>
>>
>>> The
>>> vfio_ap device driver manages AP queues passed through to one or more
>>> guests
>>
>> I read this as if a queue may be passed to several guest...
>> please, rephrase or explain.
>
> AP queues is plural, so it is true that AP queues can be passed through
> to more than one guest. I see your point, however, so I'll reword that
> to be more clear.
>
>>
>>> and we don't want to unexpectedly take AP resources away from
>>> guests which are most likely independently administered.
>>
>> When you say "independently administered", you mean as a second admin inside the host, don't you?
>
> I mean that a guest can be administered by a different person than the host administrator.
> Again, I'll try to clarify this.
>
>>
>>
>> Regards,
>> Pierre
>>
>
Tony Krowiak April 28, 2020, 2:37 p.m. UTC | #19
On 4/28/20 7:07 AM, Harald Freudenberger wrote:
> On 28.04.20 00:24, Tony Krowiak wrote:
>>
>> On 4/27/20 4:20 AM, Pierre Morel wrote:
>>>
>>> On 2020-04-07 21:20, Tony Krowiak wrote:
>>>> Introduces a new driver callback to prevent a root user from unbinding
>>>> an AP queue from its device driver if the queue is in use. The intent of
>>>> this callback is to provide a driver with the means to prevent a root user
>>>> from inadvertently taking a queue away from a guest and giving it to the
>>>> host while the guest is still using it.
>>> How can we know, at this point if the guest uses or not the queue?
>> The struct ap_matrix_mdev has a field, struct kvm *kvm, which holds a pointer to KVM when
>> the matrix mdev is in use by a guest. This patch series also introduces a shadow_crycb (soon to
>> be shadow_apcb) which holds the AP configuration for the guest. Between those two things,
>> the driver can detect when a queue is in use by a guest.
>>
>>> Do you want to say that this prevents to take away a queue when it is currently assigned to a VFIO device?
>>> and with a guest currently using this VFIO device?
>> No, I do not. The intent here is to enforce the proper procedure for giving up a queue so it is done
>> deliberately. Before taking a queue away from the matrix mdev, its APQN should be unassigned
>> from the matrix mdev. That is not to say that if there are major objections to this that we can't
>> base in_use upon the queue being in use by a guest at the time. Maybe that is preferable to
>> the community. I'll leave it to them to state their case.
>>
>>>> The callback will
>>>> be invoked whenever a change to the AP bus's sysfs apmask or aqmask
>>>> attributes would result in one or more AP queues being removed from its
>>>> driver. If the callback responds in the affirmative for any driver
>>>> queried, the change to the apmask or aqmask will be rejected with a device
>>>> in use error.
>>> AFAIU you mean that Linux's driver's binding and unbinding mechanism is not sufficient to avoid this issue because unbind can not be refused by the driver.
>> Correct!
>>
>>>
>>> The reason why we do not want a single queue to be removed from the VFIO driver is because the VFIO drivers works on a matrix, not on queues, and for the matrix to be consistent it needs to acquire all queues defined by the cross product of all APID and AQID assigned to the matrix.
>> Not correct. The reason why is because we do not want a queue to be surreptitiously removed
>> without the guest administrator being aware of its removal.
>>
>>> This functionality is valid for the host as for the guests and is handled automatically by the firmware with the CRYCB.
>>> The AP bus uses QCI to retrieve the host CRYCB and build the hosts AP queues.
>>>
>>> If instead to mix VFIO CRYCB matrix handling and queues at the same level inside the AP bus we separate these different firmware entities in two different software entities.
>>>
>>> If we make the AP bus sit above a CRYCB/Matrix bus, and in the way virtualize the QCI and test AP queue instructions:
>>> - we can directly pass a matrix device to the guest though a VFIO matrix device
>>> - the consistence will be automatic
>>> - the VFIO device and parent device will be of the same kind which would make the design much more clearer.
>>> - there will be no need for these callback because the consistence of the matrix will be guaranteed by firmware
>> As stated in my response above, the issue here is not consistency. While the design you describe
>> may be reasonable, it is a major departure from what is out in the field. In other words, that ship
>> has sailed.
>>
>>>
>>>> For this patch, only non-default drivers will be queried. Currently,
>>>> there is only one non-default driver, the vfio_ap device driver.
>>> You mean that the admin may take queues away from the "default driver", while the queue is in use, to give it to an other driver?
>>> Why is it to avoid in one way and not in the other way?
>> Because the default drivers have direct control over the queues and can ensure they are empty
>> and reset before giving up control. The vfio driver does not have direct control over the queues
>> because they have been passed through to the guest.
> No, that's not true. The 'default' drivers have no change to do anything with an APQN when it is removed
> from the driver. They get the very same notification which is the remove() callback as the vfio dd gets
> and have the very same change to do something here. The more interesting thing here is, that the remove()
> callback invocation is usually because a hardware HAS BEEN GONE AWAY. Neither the 'default' drivers
> nor the vfio dd can do a reset on a not-any-more existing APQN.
> And it is also not true that the vfio dd has no direct control over the queue because they have been passed
> through to the guest. It's the job of the vfio dd to modify the guest's APM, AQM, ADM masks to disable
> the guest's access to the APQN and then the vfio can (try to) do a reset.

The context here is when a sysadmin deliberately takes one or more 
queues away from a
guest by changing the apmask or aqmask; we are not talking about the the 
case where an
adapter is deconfigured or disappears. The idea here is to prevent a 
sysadmin for the host
from taking a queue away from a KVM guest that is using it. IMHO, control
over that queue should belong to the guest until such time as the guest 
gives it up or the
guest is terminated. Since the zcrypt drivers are directly responsible 
for their AP queues,
it is not necessary to implement this callback, although there is 
nothing precluding that.

>>>> The
>>>> vfio_ap device driver manages AP queues passed through to one or more
>>>> guests
>>> I read this as if a queue may be passed to several guest...
>>> please, rephrase or explain.
>> AP queues is plural, so it is true that AP queues can be passed through
>> to more than one guest. I see your point, however, so I'll reword that
>> to be more clear.
>>
>>>> and we don't want to unexpectedly take AP resources away from
>>>> guests which are most likely independently administered.
>>> When you say "independently administered", you mean as a second admin inside the host, don't you?
>> I mean that a guest can be administered by a different person than the host administrator.
>> Again, I'll try to clarify this.
>>
>>>
>>> Regards,
>>> Pierre
>>>

Patch
diff mbox series

diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
index 5256e3ce84e5..af15c095e76a 100644
--- a/drivers/s390/crypto/ap_bus.c
+++ b/drivers/s390/crypto/ap_bus.c
@@ -35,6 +35,7 @@ 
 #include <linux/mod_devicetable.h>
 #include <linux/debugfs.h>
 #include <linux/ctype.h>
+#include <linux/module.h>
 
 #include "ap_bus.h"
 #include "ap_debug.h"
@@ -995,9 +996,11 @@  int ap_parse_mask_str(const char *str,
 	newmap = kmalloc(size, GFP_KERNEL);
 	if (!newmap)
 		return -ENOMEM;
-	if (mutex_lock_interruptible(lock)) {
-		kfree(newmap);
-		return -ERESTARTSYS;
+	if (lock) {
+		if (mutex_lock_interruptible(lock)) {
+			kfree(newmap);
+			return -ERESTARTSYS;
+		}
 	}
 
 	if (*str == '+' || *str == '-') {
@@ -1009,7 +1012,10 @@  int ap_parse_mask_str(const char *str,
 	}
 	if (rc == 0)
 		memcpy(bitmap, newmap, size);
-	mutex_unlock(lock);
+
+	if (lock)
+		mutex_unlock(lock);
+
 	kfree(newmap);
 	return rc;
 }
@@ -1196,12 +1202,75 @@  static ssize_t apmask_show(struct bus_type *bus, char *buf)
 	return rc;
 }
 
+int __verify_card_reservations(struct device_driver *drv, void *data)
+{
+	int rc = 0;
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+	unsigned long *newapm = (unsigned long *)data;
+
+	/*
+	 * If the reserved bits do not identify cards reserved for use by the
+	 * non-default driver, there is no need to verify the driver is using
+	 * the queues.
+	 */
+	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+		return 0;
+
+	/* The non-default driver's module must be loaded */
+	if (!try_module_get(drv->owner))
+		return 0;
+
+	if (ap_drv->in_use)
+		if (ap_drv->in_use(newapm, ap_perms.aqm))
+			rc = -EADDRINUSE;
+
+	module_put(drv->owner);
+
+	return rc;
+}
+
+static int apmask_commit(unsigned long *newapm)
+{
+	int rc;
+	unsigned long reserved[BITS_TO_LONGS(AP_DEVICES)];
+
+	/*
+	 * Check if any bits in the apmask have been set which will
+	 * result in queues being removed from non-default drivers
+	 */
+	if (bitmap_andnot(reserved, newapm, ap_perms.apm, AP_DEVICES)) {
+		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+				      __verify_card_reservations);
+		if (rc)
+			return rc;
+	}
+
+	memcpy(ap_perms.apm, newapm, APMASKSIZE);
+
+	return 0;
+}
+
 static ssize_t apmask_store(struct bus_type *bus, const char *buf,
 			    size_t count)
 {
 	int rc;
+	unsigned long newapm[BITS_TO_LONGS(AP_DEVICES)];
+
+	if (mutex_lock_interruptible(&ap_perms_mutex))
+		return -ERESTARTSYS;
+
+	memcpy(newapm, ap_perms.apm, APMASKSIZE);
+
+	rc = ap_parse_mask_str(buf, newapm, AP_DEVICES, NULL);
+	if (rc)
+		goto done;
+
+	rc = apmask_commit(newapm);
+	if (rc)
+		goto done;
 
-	rc = ap_parse_mask_str(buf, ap_perms.apm, AP_DEVICES, &ap_perms_mutex);
+done:
+	mutex_unlock(&ap_perms_mutex);
 	if (rc)
 		return rc;
 
@@ -1227,12 +1296,75 @@  static ssize_t aqmask_show(struct bus_type *bus, char *buf)
 	return rc;
 }
 
+int __verify_queue_reservations(struct device_driver *drv, void *data)
+{
+	int rc = 0;
+	struct ap_driver *ap_drv = to_ap_drv(drv);
+	unsigned long *newaqm = (unsigned long *)data;
+
+	/*
+	 * If the reserved bits do not identify queues reserved for use by the
+	 * non-default driver, there is no need to verify the driver is using
+	 * the queues.
+	 */
+	if (ap_drv->flags & AP_DRIVER_FLAG_DEFAULT)
+		return 0;
+
+	/* The non-default driver's module must be loaded */
+	if (!try_module_get(drv->owner))
+		return 0;
+
+	if (ap_drv->in_use)
+		if (ap_drv->in_use(ap_perms.apm, newaqm))
+			rc = -EADDRINUSE;
+
+	module_put(drv->owner);
+
+	return rc;
+}
+
+static int aqmask_commit(unsigned long *newaqm)
+{
+	int rc;
+	unsigned long reserved[BITS_TO_LONGS(AP_DOMAINS)];
+
+	/*
+	 * Check if any bits in the aqmask have been set which will
+	 * result in queues being removed from non-default drivers
+	 */
+	if (bitmap_andnot(reserved, newaqm, ap_perms.aqm, AP_DOMAINS)) {
+		rc = bus_for_each_drv(&ap_bus_type, NULL, reserved,
+				      __verify_queue_reservations);
+		if (rc)
+			return rc;
+	}
+
+	memcpy(ap_perms.aqm, newaqm, AQMASKSIZE);
+
+	return 0;
+}
+
 static ssize_t aqmask_store(struct bus_type *bus, const char *buf,
 			    size_t count)
 {
 	int rc;
+	unsigned long newaqm[BITS_TO_LONGS(AP_DEVICES)];
 
-	rc = ap_parse_mask_str(buf, ap_perms.aqm, AP_DOMAINS, &ap_perms_mutex);
+	if (mutex_lock_interruptible(&ap_perms_mutex))
+		return -ERESTARTSYS;
+
+	memcpy(newaqm, ap_perms.aqm, AQMASKSIZE);
+
+	rc = ap_parse_mask_str(buf, newaqm, AP_DOMAINS, NULL);
+	if (rc)
+		goto done;
+
+	rc = aqmask_commit(newaqm);
+	if (rc)
+		goto done;
+
+done:
+	mutex_unlock(&ap_perms_mutex);
 	if (rc)
 		return rc;
 
diff --git a/drivers/s390/crypto/ap_bus.h b/drivers/s390/crypto/ap_bus.h
index 4348fdff1c61..86bf5e224ba4 100644
--- a/drivers/s390/crypto/ap_bus.h
+++ b/drivers/s390/crypto/ap_bus.h
@@ -138,6 +138,7 @@  struct ap_driver {
 	void (*remove)(struct ap_device *);
 	void (*suspend)(struct ap_device *);
 	void (*resume)(struct ap_device *);
+	bool (*in_use)(unsigned long *apm, unsigned long *aqm);
 };
 
 #define to_ap_drv(x) container_of((x), struct ap_driver, driver)
@@ -266,6 +267,9 @@  void ap_queue_init_state(struct ap_queue *aq);
 struct ap_card *ap_card_create(int id, int queue_depth, int raw_device_type,
 			       int comp_device_type, unsigned int functions);
 
+#define APMASKSIZE (BITS_TO_LONGS(AP_DEVICES) * sizeof(unsigned long))
+#define AQMASKSIZE (BITS_TO_LONGS(AP_DOMAINS) * sizeof(unsigned long))
+
 struct ap_perms {
 	unsigned long ioctlm[BITS_TO_LONGS(AP_IOCTLS)];
 	unsigned long apm[BITS_TO_LONGS(AP_DEVICES)];