From: "Jason J. Herne" <jjherne@linux.ibm.com>
To: Tony Krowiak <akrowiak@linux.ibm.com>,
linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org
Cc: freude@linux.ibm.com, borntraeger@de.ibm.com, cohuck@redhat.com,
mjrosato@linux.ibm.com, pasic@linux.ibm.com,
alex.williamson@redhat.com, kwankhede@nvidia.com,
fiuczy@linux.ibm.com
Subject: Re: [PATCH v19 15/20] s390/vfio-ap: implement in-use callback for vfio_ap driver
Date: Thu, 2 Jun 2022 16:21:03 -0400 [thread overview]
Message-ID: <043e8669-145c-df44-74c4-a3023cb6b159@linux.ibm.com> (raw)
In-Reply-To: <e1ee7b53-9522-6a8a-463b-329bdab0dd30@linux.ibm.com>
On 6/2/22 15:19, Tony Krowiak wrote:
>
>
> On 6/2/22 2:16 PM, Jason J. Herne wrote:
>> On 4/4/22 18:10, Tony Krowiak wrote:
>>> Let's implement the callback to indicate when an APQN
>>> is in use by the vfio_ap device driver. The callback is
>>> invoked whenever a change to the apmask or aqmask would
>>> result in one or more queue devices being removed from the driver. The
>>> vfio_ap device driver will indicate a resource is in use
>>> if the APQN of any of the queue devices to be removed are assigned to
>>> any of the matrix mdevs under the driver's control.
>>>
>>> There is potential for a deadlock condition between the
>>> matrix_dev->guests_lock used to lock the guest during assignment of
>>> adapters and domains and the ap_perms_mutex locked by the AP bus when
>>> changes are made to the sysfs apmask/aqmask attributes.
>>>
>>> The AP Perms lock controls access to the objects that store the adapter
>>> numbers (ap_perms) and domain numbers (aq_perms) for the sysfs
>>> /sys/bus/ap/apmask and /sys/bus/ap/aqmask attributes. These attributes
>>> identify which queues are reserved for the zcrypt default device drivers.
>>> Before allowing a bit to be removed from either mask, the AP bus must check
>>> with the vfio_ap device driver to verify that none of the queues are
>>> assigned to any of its mediated devices.
>>>
>>> The apmask/aqmask attributes can be written or read at any time from
>>> userspace, so care must be taken to prevent a deadlock with asynchronous
>>> operations that might be taking place in the vfio_ap device driver. For
>>> example, consider the following:
>>>
>>> 1. A system administrator assigns an adapter to a mediated device under the
>>> control of the vfio_ap device driver. The driver will need to first take
>>> the matrix_dev->guests_lock to potentially hot plug the adapter into
>>> the KVM guest.
>>> 2. At the same time, a system administrator sets a bit in the sysfs
>>> /sys/bus/ap/ap_mask attribute. To complete the operation, the AP bus
>>> must:
>>> a. Take the ap_perms_mutex lock to update the object storing the values
>>> for the /sys/bus/ap/ap_mask attribute.
>>> b. Call the vfio_ap device driver's in-use callback to verify that the
>>> queues now being reserved for the default zcrypt drivers are not
>>> assigned to a mediated device owned by the vfio_ap device driver. To
>>> do the verification, the in-use callback function takes the
>>> matrix_dev->guests_lock, but has to wait because it is already held
>>> by the operation in 1 above.
>>> 3. The vfio_ap device driver calls an AP bus function to verify that the
>>> new queues resulting from the assignment of the adapter in step 1 are
>>> not reserved for the default zcrypt device driver. This AP bus function
>>> tries to take the ap_perms_mutex lock but gets stuck waiting for the
>>> waiting for the lock due to step 2a above.
>>>
>>> Consequently, we have the following deadlock situation:
>>>
>>> matrix_dev->guests_lock locked (1)
>>> ap_perms_mutex lock locked (2a)
>>> Waiting for matrix_dev->gusts_lock (2b) which is currently held (1)
>>> Waiting for ap_perms_mutex lock (3) which is currently held (2a)
>>>
>>> To prevent this deadlock scenario, the function called in step 3 will no
>>> longer take the ap_perms_mutex lock and require the caller to take the
>>> lock. The lock will be the first taken by the adapter/domain assignment
>>> functions in the vfio_ap device driver to maintain the proper locking
>>> order.
>>>
>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>> ---
>>> drivers/s390/crypto/ap_bus.c | 31 ++++++++----
>>> drivers/s390/crypto/vfio_ap_drv.c | 1 +
>>> drivers/s390/crypto/vfio_ap_ops.c | 68 +++++++++++++++++++++++++++
>>> drivers/s390/crypto/vfio_ap_private.h | 2 +
>>> 4 files changed, 94 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c
>>> index fdf16cb70881..f48552e900a2 100644
>>> --- a/drivers/s390/crypto/ap_bus.c
>>> +++ b/drivers/s390/crypto/ap_bus.c
>>> @@ -817,6 +817,17 @@ static void ap_bus_revise_bindings(void)
>>> bus_for_each_dev(&ap_bus_type, NULL, NULL, __ap_revise_reserved);
>>> }
>>> +/**
>>> + * ap_apqn_in_matrix_owned_by_def_drv: indicates whether an APQN c is reserved
>>> + * for the host drivers or not.
>>> + * @card: the APID of the adapter card to check
>>> + * @queue: the APQI of the queue to check
>>> + *
>>> + * Note: the ap_perms_mutex must be locked by the caller of this function.
>>> + *
>>> + * Return: an int specifying whether the APQN is reserved for the host (1) or
>>> + * not (0)
>>> + */
>>
>> Comment's function name does not match real function name. Also APQN "c", from
>> description does not appear to exist.
>
> No, it definitely does not match and the 'c' is an extraneous typo. I'll fix both.
>
>>
>>
>>
>>> int ap_owned_by_def_drv(int card, int queue)
>>> {
>>> int rc = 0;
>>> @@ -824,25 +835,31 @@ int ap_owned_by_def_drv(int card, int queue)
>>> if (card < 0 || card >= AP_DEVICES || queue < 0 || queue >= AP_DOMAINS)
>>> return -EINVAL;
>>> - mutex_lock(&ap_perms_mutex);
>>> -
>>> if (test_bit_inv(card, ap_perms.apm)
>>> && test_bit_inv(queue, ap_perms.aqm))
>>> rc = 1;
>>> - mutex_unlock(&ap_perms_mutex);
>>> -
>>> return rc;
>>> }
>>> EXPORT_SYMBOL(ap_owned_by_def_drv);
>>> +/**
>>> + * ap_apqn_in_matrix_owned_by_def_drv: indicates whether every APQN contained in
>>> + * a set is reserved for the host drivers
>>> + * or not.
>>> + * @apm: a bitmap specifying a set of APIDs comprising the APQNs to check
>>> + * @aqm: a bitmap specifying a set of APQIs comprising the APQNs to check
>>> + *
>>> + * Note: the ap_perms_mutex must be locked by the caller of this function.
>>> + *
>>> + * Return: an int specifying whether each APQN is reserved for the host (1) or
>>> + * not (0)
>>> + */
>>> int ap_apqn_in_matrix_owned_by_def_drv(unsigned long *apm,
>>> unsigned long *aqm)
>>> {
>>> int card, queue, rc = 0;
>>> - mutex_lock(&ap_perms_mutex);
>>> -
>>> for (card = 0; !rc && card < AP_DEVICES; card++)
>>> if (test_bit_inv(card, apm) &&
>>> test_bit_inv(card, ap_perms.apm))
>>> @@ -851,8 +868,6 @@ int ap_apqn_in_matrix_owned_by_def_drv(unsigned long *apm,
>>> test_bit_inv(queue, ap_perms.aqm))
>>> rc = 1;
>>> - mutex_unlock(&ap_perms_mutex);
>>> -
>>> return rc;
>>> }
>>> EXPORT_SYMBOL(ap_apqn_in_matrix_owned_by_def_drv);
>>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c b/drivers/s390/crypto/vfio_ap_drv.c
>>> index c258e5f7fdfc..2c3084589347 100644
>>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>>> @@ -107,6 +107,7 @@ static const struct attribute_group vfio_queue_attr_group = {
>>> static struct ap_driver vfio_ap_drv = {
>>> .probe = vfio_ap_mdev_probe_queue,
>>> .remove = vfio_ap_mdev_remove_queue,
>>> + .in_use = vfio_ap_mdev_resource_in_use,
>>> .ids = ap_queue_ids,
>>> };
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>> index 49ed54dc9e05..3ece2cd9f1e7 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -902,6 +902,21 @@ static int vfio_ap_mdev_verify_no_sharing(unsigned long *mdev_apm,
>>> return 0;
>>> }
>>> +/**
>>> + * vfio_ap_mdev_validate_masks - verify that the APQNs assigned to the mdev are
>>> + * not reserved for the default zcrypt driver and
>>> + * are not assigned to another mdev.
>>> + *
>>> + * @matrix_mdev: the mdev to which the APQNs being validated are assigned.
>>> + *
>>> + * Return: One of the following values:
>>> + * o the error returned from the ap_apqn_in_matrix_owned_by_def_drv() function,
>>> + * most likely -EBUSY indicating the ap_perms_mutex lock is already held.
>>> + * o EADDRNOTAVAIL if an APQN assigned to @matrix_mdev is reserved for the
>>> + * zcrypt default driver.
>>> + * o EADDRINUSE if an APQN assigned to @matrix_mdev is assigned to another mdev
>>> + * o A zero indicating validation succeeded.
>>> + */
>>> static int vfio_ap_mdev_validate_masks(struct ap_matrix_mdev *matrix_mdev)
>>> {
>>> if (ap_apqn_in_matrix_owned_by_def_drv(matrix_mdev->matrix.apm,
>>> @@ -951,6 +966,10 @@ static void vfio_ap_mdev_link_adapter(struct ap_matrix_mdev
>>> *matrix_mdev,
>>> * An APQN derived from the cross product of the APID being assigned
>>> * and the APQIs previously assigned is being used by another mediated
>>> * matrix device
>>> + *
>>> + * 5. -EAGAIN
>>> + * A lock required to validate the mdev's AP configuration could not
>>> + * be obtained.
>>> */
>>> static ssize_t assign_adapter_store(struct device *dev,
>>> struct device_attribute *attr,
>>> @@ -961,6 +980,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>>> DECLARE_BITMAP(apm_delta, AP_DEVICES);
>>> struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
>>> + mutex_lock(&ap_perms_mutex);
>>> get_update_locks_for_mdev(matrix_mdev);
>>> ret = kstrtoul(buf, 0, &apid);
>>> @@ -991,6 +1011,7 @@ static ssize_t assign_adapter_store(struct device *dev,
>>> ret = count;
>>> done:
>>> release_update_locks_for_mdev(matrix_mdev);
>>> + mutex_unlock(&ap_perms_mutex);
>>> return ret;
>>> }
>>> @@ -1144,6 +1165,10 @@ static void vfio_ap_mdev_link_domain(struct ap_matrix_mdev
>>> *matrix_mdev,
>>> * An APQN derived from the cross product of the APQI being assigned
>>> * and the APIDs previously assigned is being used by another mediated
>>> * matrix device
>>> + *
>>> + * 5. -EAGAIN
>>> + * The lock required to validate the mdev's AP configuration could not
>>> + * be obtained.
>>> */
>>> static ssize_t assign_domain_store(struct device *dev,
>>> struct device_attribute *attr,
>>> @@ -1154,6 +1179,7 @@ static ssize_t assign_domain_store(struct device *dev,
>>> DECLARE_BITMAP(aqm_delta, AP_DOMAINS);
>>> struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
>>> + mutex_lock(&ap_perms_mutex);
>>> get_update_locks_for_mdev(matrix_mdev);
>>> ret = kstrtoul(buf, 0, &apqi);
>>> @@ -1184,6 +1210,7 @@ static ssize_t assign_domain_store(struct device *dev,
>>> ret = count;
>>> done:
>>> release_update_locks_for_mdev(matrix_mdev);
>>> + mutex_unlock(&ap_perms_mutex);
>>> return ret;
>>> }
>>> @@ -1868,3 +1895,44 @@ void vfio_ap_mdev_remove_queue(struct ap_device *apdev)
>>> kfree(q);
>>> release_update_locks_for_mdev(matrix_mdev);
>>> }
>>> +
>>> +/**
>>> + * vfio_ap_mdev_resource_in_use: check whether any of a set of APQNs is
>>> + * assigned to a mediated device under the control
>>> + * of the vfio_ap device driver.
>>> + *
>>> + * @apm: a bitmap specifying a set of APIDs comprising the APQNs to check.
>>> + * @aqm: a bitmap specifying a set of APQIs comprising the APQNs to check.
>>> + *
>>> + * This function is invoked by the AP bus when changes to the apmask/aqmask
>>> + * attributes will result in giving control of the queue devices specified via
>>> + * @apm and @aqm to the default zcrypt device driver. Prior to calling this
>>> + * function, the AP bus locks the ap_perms_mutex. If this function is called
>>> + * while an adapter or domain is being assigned to a mediated device, the
>>> + * assignment operations will take the matrix_dev->guests_lock and
>>> + * matrix_dev->mdevs_lock then call the ap_apqn_in_matrix_owned_by_def_drv
>>> + * function, which also locks the ap_perms_mutex. This could result in a
>>> + * deadlock.
>>> + *
>>> + * To avoid a deadlock, this function will verify that the
>>> + * matrix_dev->guests_lock and matrix_dev->mdevs_lock are not currently held and
>>> + * will return -EBUSY if the locks can not be obtained.
>>
>> This part of the comment does not seem to match reality. Unless I'm missing
>> something? Did you mean to call mutex_trylock? Or is the comment stale?
>
> The comment was written prior to the changes introduced to avoid the locking
> dependency (i.e., taking the ap_perms_mutex lock by the assignment functions
> which I believe was your suggestion). I shall remove the comment.
With both changes made...
Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
--
-- Jason J. Herne (jjherne@linux.ibm.com)
next prev parent reply other threads:[~2022-06-02 20:21 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-04 22:10 [PATCH v19 00/20] s390/vfio-ap: dynamic configuration support Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 01/20] s390/vfio-ap: use new AP bus interface to search for queue devices Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 02/20] s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c Tony Krowiak
2022-05-24 14:49 ` Jason J. Herne
2022-05-24 17:41 ` Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 03/20] s390/vfio-ap: manage link between queue struct and matrix mdev Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 04/20] s390/vfio-ap: introduce shadow APCB Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 05/20] s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev Tony Krowiak
2022-05-16 16:36 ` Jason J. Herne
2022-05-16 17:13 ` Tony Krowiak
2022-05-16 17:50 ` Jason J. Herne
2022-05-16 18:06 ` Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 06/20] s390/vfio-ap: allow assignment of unavailable AP queues to mdev device Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 07/20] s390/vfio-ap: rename matrix_dev->lock mutex to matrix_dev->mdevs_lock Tony Krowiak
2022-05-17 14:02 ` Jason J. Herne
2022-05-17 18:36 ` Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 08/20] s390/vfio-ap: introduce new mutex to control access to the KVM pointer Tony Krowiak
2022-05-27 12:40 ` Jason J. Herne
2022-04-04 22:10 ` [PATCH v19 09/20] s390/vfio-ap: use proper locking order when setting/clearing " Tony Krowiak
2022-05-27 12:41 ` Jason J. Herne
2022-04-04 22:10 ` [PATCH v19 10/20] s390/vfio-ap: prepare for dynamic update of guest's APCB on assign/unassign Tony Krowiak
2022-05-27 13:18 ` Jason J. Herne
2022-05-31 10:32 ` Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 11/20] s390/vfio-ap: prepare for dynamic update of guest's APCB on queue probe/remove Tony Krowiak
2022-05-27 13:36 ` Jason J. Herne
2022-05-31 10:44 ` Tony Krowiak
2022-06-07 12:05 ` Halil Pasic
2022-06-08 13:31 ` Tony Krowiak
2022-05-27 13:50 ` Jason J. Herne
2022-05-31 11:57 ` Tony Krowiak
2022-05-31 12:02 ` Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 12/20] s390/vfio-ap: allow hot plug/unplug of AP devices when assigned/unassigned Tony Krowiak
2022-06-01 18:54 ` Jason J. Herne
2022-04-04 22:10 ` [PATCH v19 13/20] s390/vfio-ap: hot plug/unplug of AP devices when probed/removed Tony Krowiak
2022-06-01 18:55 ` Jason J. Herne
2022-04-04 22:10 ` [PATCH v19 14/20] s390/vfio-ap: reset queues after adapter/domain unassignment Tony Krowiak
2022-06-02 15:00 ` Jason J. Herne
2022-04-04 22:10 ` [PATCH v19 15/20] s390/vfio-ap: implement in-use callback for vfio_ap driver Tony Krowiak
2022-06-02 18:16 ` Jason J. Herne
2022-06-02 19:19 ` Tony Krowiak
2022-06-02 20:21 ` Jason J. Herne [this message]
2022-04-04 22:10 ` [PATCH v19 16/20] s390/vfio-ap: sysfs attribute to display the guest's matrix Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 17/20] s390/vfio-ap: handle config changed and scan complete notification Tony Krowiak
2022-06-06 17:50 ` Jason J. Herne
2022-04-04 22:10 ` [PATCH v19 18/20] s390/vfio-ap: update docs to include dynamic config support Tony Krowiak
2022-05-31 13:22 ` Jason J. Herne
2022-04-04 22:10 ` [PATCH v19 19/20] s390/Docs: new doc describing lock usage by the vfio_ap device driver Tony Krowiak
2022-05-31 19:23 ` Jason J. Herne
2022-06-02 16:11 ` Tony Krowiak
2022-04-04 22:10 ` [PATCH v19 20/20] MAINTAINERS: pick up all vfio_ap docs for VFIO AP maintainers Tony Krowiak
2022-05-31 18:26 ` Matthew Rosato
2022-06-02 16:19 ` Tony Krowiak
2022-04-29 19:57 ` [PATCH v19 00/20] s390/vfio-ap: dynamic configuration support Tony Krowiak
2022-05-03 17:39 ` Tony Krowiak
2022-05-09 14:34 ` Tony Krowiak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=043e8669-145c-df44-74c4-a3023cb6b159@linux.ibm.com \
--to=jjherne@linux.ibm.com \
--cc=akrowiak@linux.ibm.com \
--cc=alex.williamson@redhat.com \
--cc=borntraeger@de.ibm.com \
--cc=cohuck@redhat.com \
--cc=fiuczy@linux.ibm.com \
--cc=freude@linux.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=kwankhede@nvidia.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=mjrosato@linux.ibm.com \
--cc=pasic@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).