From: Lu Baolu
Subject: Re: [PATCH v13 06/10] iommu: Add a page fault handler
To: Jean-Philippe Brucker, joro@8bytes.org, will@kernel.org
Cc: baolu.lu@linux.intel.com, lorenzo.pieralisi@arm.com, robh+dt@kernel.org, guohanjun@huawei.com, sudeep.holla@arm.com, rjw@rjwysocki.net, lenb@kernel.org, robin.murphy@arm.com, Jonathan.Cameron@huawei.com, eric.auger@redhat.com, iommu@lists.linux-foundation.org, devicetree@vger.kernel.org, linux-acpi@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-accelerators@lists.ozlabs.org, jacob.jun.pan@linux.intel.com, kevin.tian@intel.com, vdumpa@nvidia.com, zhangfei.gao@linaro.org, shameerali.kolothum.thodi@huawei.com, vivek.gautam@arm.com, zhukeqian1@huawei.com, wangzhou1@hisilicon.com, Ashok Raj
Date: Wed, 3 Mar 2021 13:27:34 +0800
Message-ID: <15ff4704-0fbd-243f-8e49-a9523ae63ce6@linux.intel.com>
References: <20210302092644.2553014-1-jean-philippe@linaro.org> <20210302092644.2553014-7-jean-philippe@linaro.org>
In-Reply-To: <20210302092644.2553014-7-jean-philippe@linaro.org>

Hi Jean,

On 3/2/21 5:26 PM, Jean-Philippe Brucker wrote:
> Some systems allow devices to handle I/O Page Faults in the core mm. For
> example systems implementing the PCIe PRI extension or Arm SMMU stall
> model. Infrastructure for reporting these recoverable page faults was
> added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device
> fault report API"). Add a page fault handler for host SVA.
>
> IOMMU driver can now instantiate several fault workqueues and link them
> to IOPF-capable devices. Drivers can choose between a single global
> workqueue, one per IOMMU device, one per low-level fault queue, one per
> domain, etc.
>
> When it receives a fault event, most commonly in an IRQ handler, the
> IOMMU driver reports the fault using iommu_report_device_fault(), which
> calls the registered handler.
> The page fault handler then calls the mm
> fault handler, and reports either success or failure with
> iommu_page_response(). After the handler succeeds, the hardware retries
> the access.
>
> The iopf_param pointer could be embedded into iommu_fault_param. But
> putting iopf_param into the iommu_param structure allows us not to care
> about ordering between calls to iopf_queue_add_device() and
> iommu_register_device_fault_handler().
>
> Reviewed-by: Eric Auger
> Reviewed-by: Jonathan Cameron
> Signed-off-by: Jean-Philippe Brucker

I have tested this framework with the Intel VT-d implementation. It
works as expected. Hence,

Reviewed-by: Lu Baolu
Tested-by: Lu Baolu

One possible future optimization would be to let system administrators
choose between handling PRQs in a workqueue and handling them
synchronously. One study found that most of the software latency of
handling a single page fault comes from the scheduling step. Hence,
synchronous processing would give lower software latency when PRQs are
rare and limited.
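For reference, the producer-side wiring that the commit message describes could look roughly like the sketch below. This is not part of the patch: my_iommu_enable_iopf(), the my_iommu structure and its iopf_queue/dev fields are invented for illustration; only the iopf_*() helpers from this series and the existing iommu_register_device_fault_handler() API are real.

```c
/*
 * Hypothetical sketch: how an IOMMU driver might attach an endpoint to
 * the IOPF infrastructure added by this patch. "struct my_iommu" (with
 * an iopf_queue pointer and a dev member) is a made-up per-IOMMU-device
 * structure.
 */
static int my_iommu_enable_iopf(struct my_iommu *iommu, struct device *dev)
{
        int ret;

        /* One queue per IOMMU device; could also be global or per-domain */
        if (!iommu->iopf_queue) {
                iommu->iopf_queue = iopf_queue_alloc(dev_name(iommu->dev));
                if (!iommu->iopf_queue)
                        return -ENOMEM;
        }

        /* Attach the endpoint to the fault workqueue... */
        ret = iopf_queue_add_device(iommu->iopf_queue, dev);
        if (ret)
                return ret;

        /* ...and route reported faults into iommu_queue_iopf() */
        ret = iommu_register_device_fault_handler(dev, iommu_queue_iopf, dev);
        if (ret)
                iopf_queue_remove_device(iommu->iopf_queue, dev);

        return ret;
}
```

Teardown would then run in reverse: flush pending faults with iopf_queue_flush_dev() before freeing a PASID, unregister the fault handler, call iopf_queue_remove_device(), and finally iopf_queue_free() once no device remains on the queue.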
Best regards,
baolu

> ---
>  drivers/iommu/Makefile        |   1 +
>  drivers/iommu/iommu-sva-lib.h |  53 ++++
>  include/linux/iommu.h         |   2 +
>  drivers/iommu/io-pgfault.c    | 461 ++++++++++++++++++++++++++++++++++
>  4 files changed, 517 insertions(+)
>  create mode 100644 drivers/iommu/io-pgfault.c
>
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 61bd30cd8369..60fafc23dee6 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
>  obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o
> +obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o
> diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
> index b40990aef3fd..031155010ca8 100644
> --- a/drivers/iommu/iommu-sva-lib.h
> +++ b/drivers/iommu/iommu-sva-lib.h
> @@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
>  void iommu_sva_free_pasid(struct mm_struct *mm);
>  struct mm_struct *iommu_sva_find(ioasid_t pasid);
>
> +/* I/O Page fault */
> +struct device;
> +struct iommu_fault;
> +struct iopf_queue;
> +
> +#ifdef CONFIG_IOMMU_SVA_LIB
> +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie);
> +
> +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev);
> +int iopf_queue_remove_device(struct iopf_queue *queue,
> +                             struct device *dev);
> +int iopf_queue_flush_dev(struct device *dev);
> +struct iopf_queue *iopf_queue_alloc(const char *name);
> +void iopf_queue_free(struct iopf_queue *queue);
> +int iopf_queue_discard_partial(struct iopf_queue *queue);
> +
> +#else /* CONFIG_IOMMU_SVA_LIB */
> +static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> +{
> +        return -ENODEV;
> +}
> +
> +static inline int iopf_queue_add_device(struct iopf_queue *queue,
> +                                        struct device *dev)
> +{
> +        return -ENODEV;
> +}
> +
> +static inline int iopf_queue_remove_device(struct iopf_queue *queue,
> +                                           struct device *dev)
> +{
> +        return -ENODEV;
> +}
> +
> +static inline int iopf_queue_flush_dev(struct device *dev)
> +{
> +        return -ENODEV;
> +}
> +
> +static inline struct iopf_queue *iopf_queue_alloc(const char *name)
> +{
> +        return NULL;
> +}
> +
> +static inline void iopf_queue_free(struct iopf_queue *queue)
> +{
> +}
> +
> +static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
> +{
> +        return -ENODEV;
> +}
> +#endif /* CONFIG_IOMMU_SVA_LIB */
>  #endif /* _IOMMU_SVA_LIB_H */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 45c4eb372f56..86d688c4418f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -367,6 +367,7 @@ struct iommu_fault_param {
>   * struct dev_iommu - Collection of per-device IOMMU data
>   *
>   * @fault_param: IOMMU detected device fault reporting data
> + * @iopf_param:  I/O Page Fault queue and data
>   * @fwspec:      IOMMU fwspec data
>   * @iommu_dev:   IOMMU device this device is linked to
>   * @priv:        IOMMU Driver private data
> @@ -377,6 +378,7 @@ struct iommu_fault_param {
>  struct dev_iommu {
>          struct mutex lock;
>          struct iommu_fault_param        *fault_param;
> +        struct iopf_device_param        *iopf_param;
>          struct iommu_fwspec             *fwspec;
>          struct iommu_device             *iommu_dev;
>          void                            *priv;
> diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
> new file mode 100644
> index 000000000000..1df8c1dcae77
> --- /dev/null
> +++ b/drivers/iommu/io-pgfault.c
> @@ -0,0 +1,461 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Handle device page faults
> + *
> + * Copyright (C) 2020 ARM Ltd.
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "iommu-sva-lib.h"
> +
> +/**
> + * struct iopf_queue - IO Page Fault queue
> + * @wq: the fault workqueue
> + * @devices: devices attached to this queue
> + * @lock: protects the device list
> + */
> +struct iopf_queue {
> +        struct workqueue_struct *wq;
> +        struct list_head devices;
> +        struct mutex lock;
> +};
> +
> +/**
> + * struct iopf_device_param - IO Page Fault data attached to a device
> + * @dev: the device that owns this param
> + * @queue: IOPF queue
> + * @queue_list: index into queue->devices
> + * @partial: faults that are part of a Page Request Group for which the last
> + *           request hasn't been submitted yet.
> + */
> +struct iopf_device_param {
> +        struct device *dev;
> +        struct iopf_queue *queue;
> +        struct list_head queue_list;
> +        struct list_head partial;
> +};
> +
> +struct iopf_fault {
> +        struct iommu_fault fault;
> +        struct list_head list;
> +};
> +
> +struct iopf_group {
> +        struct iopf_fault last_fault;
> +        struct list_head faults;
> +        struct work_struct work;
> +        struct device *dev;
> +};
> +
> +static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
> +                               enum iommu_page_response_code status)
> +{
> +        struct iommu_page_response resp = {
> +                .version = IOMMU_PAGE_RESP_VERSION_1,
> +                .pasid = iopf->fault.prm.pasid,
> +                .grpid = iopf->fault.prm.grpid,
> +                .code = status,
> +        };
> +
> +        if ((iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) &&
> +            (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID))
> +                resp.flags = IOMMU_PAGE_RESP_PASID_VALID;
> +
> +        return iommu_page_response(dev, &resp);
> +}
> +
> +static enum iommu_page_response_code
> +iopf_handle_single(struct iopf_fault *iopf)
> +{
> +        vm_fault_t ret;
> +        struct mm_struct *mm;
> +        struct vm_area_struct *vma;
> +        unsigned int access_flags = 0;
> +        unsigned int fault_flags = FAULT_FLAG_REMOTE;
> +        struct iommu_fault_page_request *prm = &iopf->fault.prm;
> +        enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
> +
> +        if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
> +                return status;
> +
> +        mm = iommu_sva_find(prm->pasid);
> +        if (IS_ERR_OR_NULL(mm))
> +                return status;
> +
> +        mmap_read_lock(mm);
> +
> +        vma = find_extend_vma(mm, prm->addr);
> +        if (!vma)
> +                /* Unmapped area */
> +                goto out_put_mm;
> +
> +        if (prm->perm & IOMMU_FAULT_PERM_READ)
> +                access_flags |= VM_READ;
> +
> +        if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
> +                access_flags |= VM_WRITE;
> +                fault_flags |= FAULT_FLAG_WRITE;
> +        }
> +
> +        if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
> +                access_flags |= VM_EXEC;
> +                fault_flags |= FAULT_FLAG_INSTRUCTION;
> +        }
> +
> +        if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
> +                fault_flags |= FAULT_FLAG_USER;
> +
> +        if (access_flags & ~vma->vm_flags)
> +                /* Access fault */
> +                goto out_put_mm;
> +
> +        ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
> +        status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
> +                IOMMU_PAGE_RESP_SUCCESS;
> +
> +out_put_mm:
> +        mmap_read_unlock(mm);
> +        mmput(mm);
> +
> +        return status;
> +}
> +
> +static void iopf_handle_group(struct work_struct *work)
> +{
> +        struct iopf_group *group;
> +        struct iopf_fault *iopf, *next;
> +        enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;
> +
> +        group = container_of(work, struct iopf_group, work);
> +
> +        list_for_each_entry_safe(iopf, next, &group->faults, list) {
> +                /*
> +                 * For the moment, errors are sticky: don't handle subsequent
> +                 * faults in the group if there is an error.
> +                 */
> +                if (status == IOMMU_PAGE_RESP_SUCCESS)
> +                        status = iopf_handle_single(iopf);
> +
> +                if (!(iopf->fault.prm.flags &
> +                      IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
> +                        kfree(iopf);
> +        }
> +
> +        iopf_complete_group(group->dev, &group->last_fault, status);
> +        kfree(group);
> +}
> +
> +/**
> + * iommu_queue_iopf - IO Page Fault handler
> + * @fault: fault event
> + * @cookie: struct device, passed to iommu_register_device_fault_handler.
> + *
> + * Add a fault to the device workqueue, to be handled by mm.
> + *
> + * This module doesn't handle PCI PASID Stop Marker; IOMMU drivers must discard
> + * them before reporting faults. A PASID Stop Marker (LRW = 0b100) doesn't
> + * expect a response. It may be generated when disabling a PASID (issuing a
> + * PASID stop request) by some PCI devices.
> + *
> + * The PASID stop request is issued by the device driver before unbind(). Once
> + * it completes, no page request is generated for this PASID anymore and
> + * outstanding ones have been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1
> + * and 10.4.1.2 - Managing PASID TLP Prefix Usage). Some PCI devices will wait
> + * for all outstanding page requests to come back with a response before
> + * completing the PASID stop request. Others do not wait for page responses, and
> + * instead issue this Stop Marker that tells us when the PASID can be
> + * reallocated.
> + *
> + * It is safe to discard the Stop Marker because it is an optimization.
> + * a. Page requests, which are posted requests, have been flushed to the IOMMU
> + *    when the stop request completes.
> + * b. The IOMMU driver flushes all fault queues on unbind() before freeing the
> + *    PASID.
> + *
> + * So even though the Stop Marker might be issued by the device *after* the stop
> + * request completes, outstanding faults will have been dealt with by the time
> + * the PASID is freed.
> + *
> + * Return: 0 on success and <0 on error.
> + */
> +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> +{
> +        int ret;
> +        struct iopf_group *group;
> +        struct iopf_fault *iopf, *next;
> +        struct iopf_device_param *iopf_param;
> +
> +        struct device *dev = cookie;
> +        struct dev_iommu *param = dev->iommu;
> +
> +        lockdep_assert_held(&param->lock);
> +
> +        if (fault->type != IOMMU_FAULT_PAGE_REQ)
> +                /* Not a recoverable page fault */
> +                return -EOPNOTSUPP;
> +
> +        /*
> +         * As long as we're holding param->lock, the queue can't be unlinked
> +         * from the device and therefore cannot disappear.
> +         */
> +        iopf_param = param->iopf_param;
> +        if (!iopf_param)
> +                return -ENODEV;
> +
> +        if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> +                iopf = kzalloc(sizeof(*iopf), GFP_KERNEL);
> +                if (!iopf)
> +                        return -ENOMEM;
> +
> +                iopf->fault = *fault;
> +
> +                /* Non-last request of a group. Postpone until the last one */
> +                list_add(&iopf->list, &iopf_param->partial);
> +
> +                return 0;
> +        }
> +
> +        group = kzalloc(sizeof(*group), GFP_KERNEL);
> +        if (!group) {
> +                /*
> +                 * The caller will send a response to the hardware. But we do
> +                 * need to clean up before leaving, otherwise partial faults
> +                 * will be stuck.
> +                 */
> +                ret = -ENOMEM;
> +                goto cleanup_partial;
> +        }
> +
> +        group->dev = dev;
> +        group->last_fault.fault = *fault;
> +        INIT_LIST_HEAD(&group->faults);
> +        list_add(&group->last_fault.list, &group->faults);
> +        INIT_WORK(&group->work, iopf_handle_group);
> +
> +        /* See if we have partial faults for this group */
> +        list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
> +                if (iopf->fault.prm.grpid == fault->prm.grpid)
> +                        /* Insert *before* the last fault */
> +                        list_move(&iopf->list, &group->faults);
> +        }
> +
> +        queue_work(iopf_param->queue->wq, &group->work);
> +        return 0;
> +
> +cleanup_partial:
> +        list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
> +                if (iopf->fault.prm.grpid == fault->prm.grpid) {
> +                        list_del(&iopf->list);
> +                        kfree(iopf);
> +                }
> +        }
> +        return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_queue_iopf);
> +
> +/**
> + * iopf_queue_flush_dev - Ensure that all queued faults have been processed
> + * @dev: the endpoint whose faults need to be flushed.
> + *
> + * The IOMMU driver calls this before releasing a PASID, to ensure that all
> + * pending faults for this PASID have been handled, and won't hit the address
> + * space of the next process that uses this PASID. The driver must make sure
> + * that no new fault is added to the queue. In particular it must flush its
> + * low-level queue before calling this function.
> + *
> + * Return: 0 on success and <0 on error.
> + */
> +int iopf_queue_flush_dev(struct device *dev)
> +{
> +        int ret = 0;
> +        struct iopf_device_param *iopf_param;
> +        struct dev_iommu *param = dev->iommu;
> +
> +        if (!param)
> +                return -ENODEV;
> +
> +        mutex_lock(&param->lock);
> +        iopf_param = param->iopf_param;
> +        if (iopf_param)
> +                flush_workqueue(iopf_param->queue->wq);
> +        else
> +                ret = -ENODEV;
> +        mutex_unlock(&param->lock);
> +
> +        return ret;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
> +
> +/**
> + * iopf_queue_discard_partial - Remove all pending partial fault
> + * @queue: the queue whose partial faults need to be discarded
> + *
> + * When the hardware queue overflows, last page faults in a group may have been
> + * lost and the IOMMU driver calls this to discard all partial faults. The
> + * driver shouldn't be adding new faults to this queue concurrently.
> + *
> + * Return: 0 on success and <0 on error.
> + */
> +int iopf_queue_discard_partial(struct iopf_queue *queue)
> +{
> +        struct iopf_fault *iopf, *next;
> +        struct iopf_device_param *iopf_param;
> +
> +        if (!queue)
> +                return -EINVAL;
> +
> +        mutex_lock(&queue->lock);
> +        list_for_each_entry(iopf_param, &queue->devices, queue_list) {
> +                list_for_each_entry_safe(iopf, next, &iopf_param->partial,
> +                                         list) {
> +                        list_del(&iopf->list);
> +                        kfree(iopf);
> +                }
> +        }
> +        mutex_unlock(&queue->lock);
> +        return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_discard_partial);
> +
> +/**
> + * iopf_queue_add_device - Add producer to the fault queue
> + * @queue: IOPF queue
> + * @dev: device to add
> + *
> + * Return: 0 on success and <0 on error.
> + */
> +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
> +{
> +        int ret = -EBUSY;
> +        struct iopf_device_param *iopf_param;
> +        struct dev_iommu *param = dev->iommu;
> +
> +        if (!param)
> +                return -ENODEV;
> +
> +        iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
> +        if (!iopf_param)
> +                return -ENOMEM;
> +
> +        INIT_LIST_HEAD(&iopf_param->partial);
> +        iopf_param->queue = queue;
> +        iopf_param->dev = dev;
> +
> +        mutex_lock(&queue->lock);
> +        mutex_lock(&param->lock);
> +        if (!param->iopf_param) {
> +                list_add(&iopf_param->queue_list, &queue->devices);
> +                param->iopf_param = iopf_param;
> +                ret = 0;
> +        }
> +        mutex_unlock(&param->lock);
> +        mutex_unlock(&queue->lock);
> +
> +        if (ret)
> +                kfree(iopf_param);
> +
> +        return ret;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_add_device);
> +
> +/**
> + * iopf_queue_remove_device - Remove producer from fault queue
> + * @queue: IOPF queue
> + * @dev: device to remove
> + *
> + * Caller makes sure that no more faults are reported for this device.
> + *
> + * Return: 0 on success and <0 on error.
> + */
> +int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
> +{
> +        int ret = -EINVAL;
> +        struct iopf_fault *iopf, *next;
> +        struct iopf_device_param *iopf_param;
> +        struct dev_iommu *param = dev->iommu;
> +
> +        if (!param || !queue)
> +                return -EINVAL;
> +
> +        mutex_lock(&queue->lock);
> +        mutex_lock(&param->lock);
> +        iopf_param = param->iopf_param;
> +        if (iopf_param && iopf_param->queue == queue) {
> +                list_del(&iopf_param->queue_list);
> +                param->iopf_param = NULL;
> +                ret = 0;
> +        }
> +        mutex_unlock(&param->lock);
> +        mutex_unlock(&queue->lock);
> +        if (ret)
> +                return ret;
> +
> +        /* Just in case some faults are still stuck */
> +        list_for_each_entry_safe(iopf, next, &iopf_param->partial, list)
> +                kfree(iopf);
> +
> +        kfree(iopf_param);
> +
> +        return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
> +
> +/**
> + * iopf_queue_alloc - Allocate and initialize a fault queue
> + * @name: a unique string identifying the queue (for workqueue)
> + *
> + * Return: the queue on success and NULL on error.
> + */
> +struct iopf_queue *iopf_queue_alloc(const char *name)
> +{
> +        struct iopf_queue *queue;
> +
> +        queue = kzalloc(sizeof(*queue), GFP_KERNEL);
> +        if (!queue)
> +                return NULL;
> +
> +        /*
> +         * The WQ is unordered because the low-level handler enqueues faults by
> +         * group. PRI requests within a group have to be ordered, but once
> +         * that's dealt with, the high-level function can handle groups out of
> +         * order.
> +         */
> +        queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);
> +        if (!queue->wq) {
> +                kfree(queue);
> +                return NULL;
> +        }
> +
> +        INIT_LIST_HEAD(&queue->devices);
> +        mutex_init(&queue->lock);
> +
> +        return queue;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_alloc);
> +
> +/**
> + * iopf_queue_free - Free IOPF queue
> + * @queue: queue to free
> + *
> + * Counterpart to iopf_queue_alloc(). The driver must not be queuing faults or
> + * adding/removing devices on this queue anymore.
> + */
> +void iopf_queue_free(struct iopf_queue *queue)
> +{
> +        struct iopf_device_param *iopf_param, *next;
> +
> +        if (!queue)
> +                return;
> +
> +        list_for_each_entry_safe(iopf_param, next, &queue->devices, queue_list)
> +                iopf_queue_remove_device(queue, iopf_param->dev);
> +
> +        destroy_workqueue(queue->wq);
> +        kfree(queue);
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_free);
& IOMMU_FAULT_PERM_PRIV)) > + fault_flags |= FAULT_FLAG_USER; > + > + if (access_flags & ~vma->vm_flags) > + /* Access fault */ > + goto out_put_mm; > + > + ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL); > + status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID : > + IOMMU_PAGE_RESP_SUCCESS; > + > +out_put_mm: > + mmap_read_unlock(mm); > + mmput(mm); > + > + return status; > +} > + > +static void iopf_handle_group(struct work_struct *work) > +{ > + struct iopf_group *group; > + struct iopf_fault *iopf, *next; > + enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS; > + > + group = container_of(work, struct iopf_group, work); > + > + list_for_each_entry_safe(iopf, next, &group->faults, list) { > + /* > + * For the moment, errors are sticky: don't handle subsequent > + * faults in the group if there is an error. > + */ > + if (status == IOMMU_PAGE_RESP_SUCCESS) > + status = iopf_handle_single(iopf); > + > + if (!(iopf->fault.prm.flags & > + IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) > + kfree(iopf); > + } > + > + iopf_complete_group(group->dev, &group->last_fault, status); > + kfree(group); > +} > + > +/** > + * iommu_queue_iopf - IO Page Fault handler > + * @fault: fault event > + * @cookie: struct device, passed to iommu_register_device_fault_handler. > + * > + * Add a fault to the device workqueue, to be handled by mm. > + * > + * This module doesn't handle PCI PASID Stop Marker; IOMMU drivers must discard > + * them before reporting faults. A PASID Stop Marker (LRW = 0b100) doesn't > + * expect a response. It may be generated when disabling a PASID (issuing a > + * PASID stop request) by some PCI devices. > + * > + * The PASID stop request is issued by the device driver before unbind(). Once > + * it completes, no page request is generated for this PASID anymore and > + * outstanding ones have been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1 > + * and 10.4.1.2 - Managing PASID TLP Prefix Usage). 
Some PCI devices will wait > + * for all outstanding page requests to come back with a response before > + * completing the PASID stop request. Others do not wait for page responses, and > + * instead issue this Stop Marker that tells us when the PASID can be > + * reallocated. > + * > + * It is safe to discard the Stop Marker because it is an optimization. > + * a. Page requests, which are posted requests, have been flushed to the IOMMU > + * when the stop request completes. > + * b. The IOMMU driver flushes all fault queues on unbind() before freeing the > + * PASID. > + * > + * So even though the Stop Marker might be issued by the device *after* the stop > + * request completes, outstanding faults will have been dealt with by the time > + * the PASID is freed. > + * > + * Return: 0 on success and <0 on error. > + */ > +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie) > +{ > + int ret; > + struct iopf_group *group; > + struct iopf_fault *iopf, *next; > + struct iopf_device_param *iopf_param; > + > + struct device *dev = cookie; > + struct dev_iommu *param = dev->iommu; > + > + lockdep_assert_held(¶m->lock); > + > + if (fault->type != IOMMU_FAULT_PAGE_REQ) > + /* Not a recoverable page fault */ > + return -EOPNOTSUPP; > + > + /* > + * As long as we're holding param->lock, the queue can't be unlinked > + * from the device and therefore cannot disappear. > + */ > + iopf_param = param->iopf_param; > + if (!iopf_param) > + return -ENODEV; > + > + if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) { > + iopf = kzalloc(sizeof(*iopf), GFP_KERNEL); > + if (!iopf) > + return -ENOMEM; > + > + iopf->fault = *fault; > + > + /* Non-last request of a group. Postpone until the last one */ > + list_add(&iopf->list, &iopf_param->partial); > + > + return 0; > + } > + > + group = kzalloc(sizeof(*group), GFP_KERNEL); > + if (!group) { > + /* > + * The caller will send a response to the hardware. 
But we do > + * need to clean up before leaving, otherwise partial faults > + * will be stuck. > + */ > + ret = -ENOMEM; > + goto cleanup_partial; > + } > + > + group->dev = dev; > + group->last_fault.fault = *fault; > + INIT_LIST_HEAD(&group->faults); > + list_add(&group->last_fault.list, &group->faults); > + INIT_WORK(&group->work, iopf_handle_group); > + > + /* See if we have partial faults for this group */ > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) { > + if (iopf->fault.prm.grpid == fault->prm.grpid) > + /* Insert *before* the last fault */ > + list_move(&iopf->list, &group->faults); > + } > + > + queue_work(iopf_param->queue->wq, &group->work); > + return 0; > + > +cleanup_partial: > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) { > + if (iopf->fault.prm.grpid == fault->prm.grpid) { > + list_del(&iopf->list); > + kfree(iopf); > + } > + } > + return ret; > +} > +EXPORT_SYMBOL_GPL(iommu_queue_iopf); > + > +/** > + * iopf_queue_flush_dev - Ensure that all queued faults have been processed > + * @dev: the endpoint whose faults need to be flushed. > + * > + * The IOMMU driver calls this before releasing a PASID, to ensure that all > + * pending faults for this PASID have been handled, and won't hit the address > + * space of the next process that uses this PASID. The driver must make sure > + * that no new fault is added to the queue. In particular it must flush its > + * low-level queue before calling this function. > + * > + * Return: 0 on success and <0 on error. 
> + */ > +int iopf_queue_flush_dev(struct device *dev) > +{ > + int ret = 0; > + struct iopf_device_param *iopf_param; > + struct dev_iommu *param = dev->iommu; > + > + if (!param) > + return -ENODEV; > + > + mutex_lock(¶m->lock); > + iopf_param = param->iopf_param; > + if (iopf_param) > + flush_workqueue(iopf_param->queue->wq); > + else > + ret = -ENODEV; > + mutex_unlock(¶m->lock); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(iopf_queue_flush_dev); > + > +/** > + * iopf_queue_discard_partial - Remove all pending partial fault > + * @queue: the queue whose partial faults need to be discarded > + * > + * When the hardware queue overflows, last page faults in a group may have been > + * lost and the IOMMU driver calls this to discard all partial faults. The > + * driver shouldn't be adding new faults to this queue concurrently. > + * > + * Return: 0 on success and <0 on error. > + */ > +int iopf_queue_discard_partial(struct iopf_queue *queue) > +{ > + struct iopf_fault *iopf, *next; > + struct iopf_device_param *iopf_param; > + > + if (!queue) > + return -EINVAL; > + > + mutex_lock(&queue->lock); > + list_for_each_entry(iopf_param, &queue->devices, queue_list) { > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, > + list) { > + list_del(&iopf->list); > + kfree(iopf); > + } > + } > + mutex_unlock(&queue->lock); > + return 0; > +} > +EXPORT_SYMBOL_GPL(iopf_queue_discard_partial); > + > +/** > + * iopf_queue_add_device - Add producer to the fault queue > + * @queue: IOPF queue > + * @dev: device to add > + * > + * Return: 0 on success and <0 on error. 
> + */ > +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev) > +{ > + int ret = -EBUSY; > + struct iopf_device_param *iopf_param; > + struct dev_iommu *param = dev->iommu; > + > + if (!param) > + return -ENODEV; > + > + iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL); > + if (!iopf_param) > + return -ENOMEM; > + > + INIT_LIST_HEAD(&iopf_param->partial); > + iopf_param->queue = queue; > + iopf_param->dev = dev; > + > + mutex_lock(&queue->lock); > + mutex_lock(¶m->lock); > + if (!param->iopf_param) { > + list_add(&iopf_param->queue_list, &queue->devices); > + param->iopf_param = iopf_param; > + ret = 0; > + } > + mutex_unlock(¶m->lock); > + mutex_unlock(&queue->lock); > + > + if (ret) > + kfree(iopf_param); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(iopf_queue_add_device); > + > +/** > + * iopf_queue_remove_device - Remove producer from fault queue > + * @queue: IOPF queue > + * @dev: device to remove > + * > + * Caller makes sure that no more faults are reported for this device. > + * > + * Return: 0 on success and <0 on error. 
> + */ > +int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev) > +{ > + int ret = -EINVAL; > + struct iopf_fault *iopf, *next; > + struct iopf_device_param *iopf_param; > + struct dev_iommu *param = dev->iommu; > + > + if (!param || !queue) > + return -EINVAL; > + > + mutex_lock(&queue->lock); > + mutex_lock(¶m->lock); > + iopf_param = param->iopf_param; > + if (iopf_param && iopf_param->queue == queue) { > + list_del(&iopf_param->queue_list); > + param->iopf_param = NULL; > + ret = 0; > + } > + mutex_unlock(¶m->lock); > + mutex_unlock(&queue->lock); > + if (ret) > + return ret; > + > + /* Just in case some faults are still stuck */ > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) > + kfree(iopf); > + > + kfree(iopf_param); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(iopf_queue_remove_device); > + > +/** > + * iopf_queue_alloc - Allocate and initialize a fault queue > + * @name: a unique string identifying the queue (for workqueue) > + * > + * Return: the queue on success and NULL on error. > + */ > +struct iopf_queue *iopf_queue_alloc(const char *name) > +{ > + struct iopf_queue *queue; > + > + queue = kzalloc(sizeof(*queue), GFP_KERNEL); > + if (!queue) > + return NULL; > + > + /* > + * The WQ is unordered because the low-level handler enqueues faults by > + * group. PRI requests within a group have to be ordered, but once > + * that's dealt with, the high-level function can handle groups out of > + * order. > + */ > + queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name); > + if (!queue->wq) { > + kfree(queue); > + return NULL; > + } > + > + INIT_LIST_HEAD(&queue->devices); > + mutex_init(&queue->lock); > + > + return queue; > +} > +EXPORT_SYMBOL_GPL(iopf_queue_alloc); > + > +/** > + * iopf_queue_free - Free IOPF queue > + * @queue: queue to free > + * > + * Counterpart to iopf_queue_alloc(). The driver must not be queuing faults or > + * adding/removing devices on this queue anymore. 
> + */ > +void iopf_queue_free(struct iopf_queue *queue) > +{ > + struct iopf_device_param *iopf_param, *next; > + > + if (!queue) > + return; > + > + list_for_each_entry_safe(iopf_param, next, &queue->devices, queue_list) > + iopf_queue_remove_device(queue, iopf_param->dev); > + > + destroy_workqueue(queue->wq); > + kfree(queue); > +} > +EXPORT_SYMBOL_GPL(iopf_queue_free); > _______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
> The page fault handler then calls the mm fault handler, and reports
> either success or failure with iommu_page_response(). After the handler
> succeeds, the hardware retries the access.
>
> The iopf_param pointer could be embedded into iommu_fault_param. But
> putting iopf_param into the iommu_param structure allows us not to care
> about ordering between calls to iopf_queue_add_device() and
> iommu_register_device_fault_handler().
>
> Reviewed-by: Eric Auger
> Reviewed-by: Jonathan Cameron
> Signed-off-by: Jean-Philippe Brucker

I have tested this framework with the Intel VT-d implementation. It works as expected.
Hence, Reviewed-by: Lu Baolu Tested-by: Lu Baolu One possible future optimization is that we could allow the system administrators to choose between handle PRQs in a workqueue or handle them synchronously. One research discovered that most of the software latency of handling a single page fault exists in the schedule part. Hence, synchronous processing will get shorter software latency if PRQs are rare and limited. Best regards, baolu > --- > drivers/iommu/Makefile | 1 + > drivers/iommu/iommu-sva-lib.h | 53 ++++ > include/linux/iommu.h | 2 + > drivers/iommu/io-pgfault.c | 461 ++++++++++++++++++++++++++++++++++ > 4 files changed, 517 insertions(+) > create mode 100644 drivers/iommu/io-pgfault.c > > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index 61bd30cd8369..60fafc23dee6 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -28,3 +28,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o > obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o > obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o > obj-$(CONFIG_IOMMU_SVA_LIB) += iommu-sva-lib.o > +obj-$(CONFIG_IOMMU_SVA_LIB) += io-pgfault.o > diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h > index b40990aef3fd..031155010ca8 100644 > --- a/drivers/iommu/iommu-sva-lib.h > +++ b/drivers/iommu/iommu-sva-lib.h > @@ -12,4 +12,57 @@ int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max); > void iommu_sva_free_pasid(struct mm_struct *mm); > struct mm_struct *iommu_sva_find(ioasid_t pasid); > > +/* I/O Page fault */ > +struct device; > +struct iommu_fault; > +struct iopf_queue; > + > +#ifdef CONFIG_IOMMU_SVA_LIB > +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie); > + > +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev); > +int iopf_queue_remove_device(struct iopf_queue *queue, > + struct device *dev); > +int iopf_queue_flush_dev(struct device *dev); > +struct iopf_queue *iopf_queue_alloc(const char *name); > +void 
iopf_queue_free(struct iopf_queue *queue); > +int iopf_queue_discard_partial(struct iopf_queue *queue); > + > +#else /* CONFIG_IOMMU_SVA_LIB */ > +static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie) > +{ > + return -ENODEV; > +} > + > +static inline int iopf_queue_add_device(struct iopf_queue *queue, > + struct device *dev) > +{ > + return -ENODEV; > +} > + > +static inline int iopf_queue_remove_device(struct iopf_queue *queue, > + struct device *dev) > +{ > + return -ENODEV; > +} > + > +static inline int iopf_queue_flush_dev(struct device *dev) > +{ > + return -ENODEV; > +} > + > +static inline struct iopf_queue *iopf_queue_alloc(const char *name) > +{ > + return NULL; > +} > + > +static inline void iopf_queue_free(struct iopf_queue *queue) > +{ > +} > + > +static inline int iopf_queue_discard_partial(struct iopf_queue *queue) > +{ > + return -ENODEV; > +} > +#endif /* CONFIG_IOMMU_SVA_LIB */ > #endif /* _IOMMU_SVA_LIB_H */ > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 45c4eb372f56..86d688c4418f 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -367,6 +367,7 @@ struct iommu_fault_param { > * struct dev_iommu - Collection of per-device IOMMU data > * > * @fault_param: IOMMU detected device fault reporting data > + * @iopf_param: I/O Page Fault queue and data > * @fwspec: IOMMU fwspec data > * @iommu_dev: IOMMU device this device is linked to > * @priv: IOMMU Driver private data > @@ -377,6 +378,7 @@ struct iommu_fault_param { > struct dev_iommu { > struct mutex lock; > struct iommu_fault_param *fault_param; > + struct iopf_device_param *iopf_param; > struct iommu_fwspec *fwspec; > struct iommu_device *iommu_dev; > void *priv; > diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c > new file mode 100644 > index 000000000000..1df8c1dcae77 > --- /dev/null > +++ b/drivers/iommu/io-pgfault.c > @@ -0,0 +1,461 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Handle device page faults 
> + * > + * Copyright (C) 2020 ARM Ltd. > + */ > + > +#include > +#include > +#include > +#include > +#include > + > +#include "iommu-sva-lib.h" > + > +/** > + * struct iopf_queue - IO Page Fault queue > + * @wq: the fault workqueue > + * @devices: devices attached to this queue > + * @lock: protects the device list > + */ > +struct iopf_queue { > + struct workqueue_struct *wq; > + struct list_head devices; > + struct mutex lock; > +}; > + > +/** > + * struct iopf_device_param - IO Page Fault data attached to a device > + * @dev: the device that owns this param > + * @queue: IOPF queue > + * @queue_list: index into queue->devices > + * @partial: faults that are part of a Page Request Group for which the last > + * request hasn't been submitted yet. > + */ > +struct iopf_device_param { > + struct device *dev; > + struct iopf_queue *queue; > + struct list_head queue_list; > + struct list_head partial; > +}; > + > +struct iopf_fault { > + struct iommu_fault fault; > + struct list_head list; > +}; > + > +struct iopf_group { > + struct iopf_fault last_fault; > + struct list_head faults; > + struct work_struct work; > + struct device *dev; > +}; > + > +static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf, > + enum iommu_page_response_code status) > +{ > + struct iommu_page_response resp = { > + .version = IOMMU_PAGE_RESP_VERSION_1, > + .pasid = iopf->fault.prm.pasid, > + .grpid = iopf->fault.prm.grpid, > + .code = status, > + }; > + > + if ((iopf->fault.prm.flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID) && > + (iopf->fault.prm.flags & IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID)) > + resp.flags = IOMMU_PAGE_RESP_PASID_VALID; > + > + return iommu_page_response(dev, &resp); > +} > + > +static enum iommu_page_response_code > +iopf_handle_single(struct iopf_fault *iopf) > +{ > + vm_fault_t ret; > + struct mm_struct *mm; > + struct vm_area_struct *vma; > + unsigned int access_flags = 0; > + unsigned int fault_flags = FAULT_FLAG_REMOTE; > + struct 
iommu_fault_page_request *prm = &iopf->fault.prm; > + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID; > + > + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)) > + return status; > + > + mm = iommu_sva_find(prm->pasid); > + if (IS_ERR_OR_NULL(mm)) > + return status; > + > + mmap_read_lock(mm); > + > + vma = find_extend_vma(mm, prm->addr); > + if (!vma) > + /* Unmapped area */ > + goto out_put_mm; > + > + if (prm->perm & IOMMU_FAULT_PERM_READ) > + access_flags |= VM_READ; > + > + if (prm->perm & IOMMU_FAULT_PERM_WRITE) { > + access_flags |= VM_WRITE; > + fault_flags |= FAULT_FLAG_WRITE; > + } > + > + if (prm->perm & IOMMU_FAULT_PERM_EXEC) { > + access_flags |= VM_EXEC; > + fault_flags |= FAULT_FLAG_INSTRUCTION; > + } > + > + if (!(prm->perm & IOMMU_FAULT_PERM_PRIV)) > + fault_flags |= FAULT_FLAG_USER; > + > + if (access_flags & ~vma->vm_flags) > + /* Access fault */ > + goto out_put_mm; > + > + ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL); > + status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID : > + IOMMU_PAGE_RESP_SUCCESS; > + > +out_put_mm: > + mmap_read_unlock(mm); > + mmput(mm); > + > + return status; > +} > + > +static void iopf_handle_group(struct work_struct *work) > +{ > + struct iopf_group *group; > + struct iopf_fault *iopf, *next; > + enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS; > + > + group = container_of(work, struct iopf_group, work); > + > + list_for_each_entry_safe(iopf, next, &group->faults, list) { > + /* > + * For the moment, errors are sticky: don't handle subsequent > + * faults in the group if there is an error. 
> + */ > + if (status == IOMMU_PAGE_RESP_SUCCESS) > + status = iopf_handle_single(iopf); > + > + if (!(iopf->fault.prm.flags & > + IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) > + kfree(iopf); > + } > + > + iopf_complete_group(group->dev, &group->last_fault, status); > + kfree(group); > +} > + > +/** > + * iommu_queue_iopf - IO Page Fault handler > + * @fault: fault event > + * @cookie: struct device, passed to iommu_register_device_fault_handler. > + * > + * Add a fault to the device workqueue, to be handled by mm. > + * > + * This module doesn't handle PCI PASID Stop Marker; IOMMU drivers must discard > + * them before reporting faults. A PASID Stop Marker (LRW = 0b100) doesn't > + * expect a response. It may be generated when disabling a PASID (issuing a > + * PASID stop request) by some PCI devices. > + * > + * The PASID stop request is issued by the device driver before unbind(). Once > + * it completes, no page request is generated for this PASID anymore and > + * outstanding ones have been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1 > + * and 10.4.1.2 - Managing PASID TLP Prefix Usage). Some PCI devices will wait > + * for all outstanding page requests to come back with a response before > + * completing the PASID stop request. Others do not wait for page responses, and > + * instead issue this Stop Marker that tells us when the PASID can be > + * reallocated. > + * > + * It is safe to discard the Stop Marker because it is an optimization. > + * a. Page requests, which are posted requests, have been flushed to the IOMMU > + * when the stop request completes. > + * b. The IOMMU driver flushes all fault queues on unbind() before freeing the > + * PASID. > + * > + * So even though the Stop Marker might be issued by the device *after* the stop > + * request completes, outstanding faults will have been dealt with by the time > + * the PASID is freed. > + * > + * Return: 0 on success and <0 on error. 
> + */ > +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie) > +{ > + int ret; > + struct iopf_group *group; > + struct iopf_fault *iopf, *next; > + struct iopf_device_param *iopf_param; > + > + struct device *dev = cookie; > + struct dev_iommu *param = dev->iommu; > + > + lockdep_assert_held(¶m->lock); > + > + if (fault->type != IOMMU_FAULT_PAGE_REQ) > + /* Not a recoverable page fault */ > + return -EOPNOTSUPP; > + > + /* > + * As long as we're holding param->lock, the queue can't be unlinked > + * from the device and therefore cannot disappear. > + */ > + iopf_param = param->iopf_param; > + if (!iopf_param) > + return -ENODEV; > + > + if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) { > + iopf = kzalloc(sizeof(*iopf), GFP_KERNEL); > + if (!iopf) > + return -ENOMEM; > + > + iopf->fault = *fault; > + > + /* Non-last request of a group. Postpone until the last one */ > + list_add(&iopf->list, &iopf_param->partial); > + > + return 0; > + } > + > + group = kzalloc(sizeof(*group), GFP_KERNEL); > + if (!group) { > + /* > + * The caller will send a response to the hardware. But we do > + * need to clean up before leaving, otherwise partial faults > + * will be stuck. 
> + */ > + ret = -ENOMEM; > + goto cleanup_partial; > + } > + > + group->dev = dev; > + group->last_fault.fault = *fault; > + INIT_LIST_HEAD(&group->faults); > + list_add(&group->last_fault.list, &group->faults); > + INIT_WORK(&group->work, iopf_handle_group); > + > + /* See if we have partial faults for this group */ > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) { > + if (iopf->fault.prm.grpid == fault->prm.grpid) > + /* Insert *before* the last fault */ > + list_move(&iopf->list, &group->faults); > + } > + > + queue_work(iopf_param->queue->wq, &group->work); > + return 0; > + > +cleanup_partial: > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) { > + if (iopf->fault.prm.grpid == fault->prm.grpid) { > + list_del(&iopf->list); > + kfree(iopf); > + } > + } > + return ret; > +} > +EXPORT_SYMBOL_GPL(iommu_queue_iopf); > + > +/** > + * iopf_queue_flush_dev - Ensure that all queued faults have been processed > + * @dev: the endpoint whose faults need to be flushed. > + * > + * The IOMMU driver calls this before releasing a PASID, to ensure that all > + * pending faults for this PASID have been handled, and won't hit the address > + * space of the next process that uses this PASID. The driver must make sure > + * that no new fault is added to the queue. In particular it must flush its > + * low-level queue before calling this function. > + * > + * Return: 0 on success and <0 on error. 
> + */
> +int iopf_queue_flush_dev(struct device *dev)
> +{
> +	int ret = 0;
> +	struct iopf_device_param *iopf_param;
> +	struct dev_iommu *param = dev->iommu;
> +
> +	if (!param)
> +		return -ENODEV;
> +
> +	mutex_lock(&param->lock);
> +	iopf_param = param->iopf_param;
> +	if (iopf_param)
> +		flush_workqueue(iopf_param->queue->wq);
> +	else
> +		ret = -ENODEV;
> +	mutex_unlock(&param->lock);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_flush_dev);
> +
> +/**
> + * iopf_queue_discard_partial - Remove all pending partial faults
> + * @queue: the queue whose partial faults need to be discarded
> + *
> + * When the hardware queue overflows, last page faults in a group may have been
> + * lost and the IOMMU driver calls this to discard all partial faults. The
> + * driver shouldn't be adding new faults to this queue concurrently.
> + *
> + * Return: 0 on success and <0 on error.
> + */
> +int iopf_queue_discard_partial(struct iopf_queue *queue)
> +{
> +	struct iopf_fault *iopf, *next;
> +	struct iopf_device_param *iopf_param;
> +
> +	if (!queue)
> +		return -EINVAL;
> +
> +	mutex_lock(&queue->lock);
> +	list_for_each_entry(iopf_param, &queue->devices, queue_list) {
> +		list_for_each_entry_safe(iopf, next, &iopf_param->partial,
> +					 list) {
> +			list_del(&iopf->list);
> +			kfree(iopf);
> +		}
> +	}
> +	mutex_unlock(&queue->lock);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_discard_partial);
> +
> +/**
> + * iopf_queue_add_device - Add producer to the fault queue
> + * @queue: IOPF queue
> + * @dev: device to add
> + *
> + * Return: 0 on success and <0 on error.
> + */
> +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev)
> +{
> +	int ret = -EBUSY;
> +	struct iopf_device_param *iopf_param;
> +	struct dev_iommu *param = dev->iommu;
> +
> +	if (!param)
> +		return -ENODEV;
> +
> +	iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL);
> +	if (!iopf_param)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&iopf_param->partial);
> +	iopf_param->queue = queue;
> +	iopf_param->dev = dev;
> +
> +	mutex_lock(&queue->lock);
> +	mutex_lock(&param->lock);
> +	if (!param->iopf_param) {
> +		list_add(&iopf_param->queue_list, &queue->devices);
> +		param->iopf_param = iopf_param;
> +		ret = 0;
> +	}
> +	mutex_unlock(&param->lock);
> +	mutex_unlock(&queue->lock);
> +
> +	if (ret)
> +		kfree(iopf_param);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_add_device);
> +
> +/**
> + * iopf_queue_remove_device - Remove producer from fault queue
> + * @queue: IOPF queue
> + * @dev: device to remove
> + *
> + * Caller makes sure that no more faults are reported for this device.
> + *
> + * Return: 0 on success and <0 on error.
> + */
> +int iopf_queue_remove_device(struct iopf_queue *queue, struct device *dev)
> +{
> +	int ret = -EINVAL;
> +	struct iopf_fault *iopf, *next;
> +	struct iopf_device_param *iopf_param;
> +	struct dev_iommu *param = dev->iommu;
> +
> +	if (!param || !queue)
> +		return -EINVAL;
> +
> +	mutex_lock(&queue->lock);
> +	mutex_lock(&param->lock);
> +	iopf_param = param->iopf_param;
> +	if (iopf_param && iopf_param->queue == queue) {
> +		list_del(&iopf_param->queue_list);
> +		param->iopf_param = NULL;
> +		ret = 0;
> +	}
> +	mutex_unlock(&param->lock);
> +	mutex_unlock(&queue->lock);
> +	if (ret)
> +		return ret;
> +
> +	/* Just in case some faults are still stuck */
> +	list_for_each_entry_safe(iopf, next, &iopf_param->partial, list)
> +		kfree(iopf);
> +
> +	kfree(iopf_param);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_remove_device);
> +
> +/**
> + * iopf_queue_alloc - Allocate and initialize a fault queue
> + * @name: a unique string identifying the queue (for workqueue)
> + *
> + * Return: the queue on success and NULL on error.
> + */
> +struct iopf_queue *iopf_queue_alloc(const char *name)
> +{
> +	struct iopf_queue *queue;
> +
> +	queue = kzalloc(sizeof(*queue), GFP_KERNEL);
> +	if (!queue)
> +		return NULL;
> +
> +	/*
> +	 * The WQ is unordered because the low-level handler enqueues faults by
> +	 * group. PRI requests within a group have to be ordered, but once
> +	 * that's dealt with, the high-level function can handle groups out of
> +	 * order.
> +	 */
> +	queue->wq = alloc_workqueue("iopf_queue/%s", WQ_UNBOUND, 0, name);
> +	if (!queue->wq) {
> +		kfree(queue);
> +		return NULL;
> +	}
> +
> +	INIT_LIST_HEAD(&queue->devices);
> +	mutex_init(&queue->lock);
> +
> +	return queue;
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_alloc);
> +
> +/**
> + * iopf_queue_free - Free IOPF queue
> + * @queue: queue to free
> + *
> + * Counterpart to iopf_queue_alloc(). The driver must not be queuing faults or
> + * adding/removing devices on this queue anymore.
> + */
> +void iopf_queue_free(struct iopf_queue *queue)
> +{
> +	struct iopf_device_param *iopf_param, *next;
> +
> +	if (!queue)
> +		return;
> +
> +	list_for_each_entry_safe(iopf_param, next, &queue->devices, queue_list)
> +		iopf_queue_remove_device(queue, iopf_param->dev);
> +
> +	destroy_workqueue(queue->wq);
> +	kfree(queue);
> +}
> +EXPORT_SYMBOL_GPL(iopf_queue_free);