From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C3EAC433DF for ; Wed, 20 May 2020 06:42:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 582A1205CB for ; Wed, 20 May 2020 06:42:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 582A1205CB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.intel.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0723C900003; Wed, 20 May 2020 02:42:34 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id F3C9C900002; Wed, 20 May 2020 02:42:33 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E2C36900003; Wed, 20 May 2020 02:42:33 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0036.hostedemail.com [216.40.44.36]) by kanga.kvack.org (Postfix) with ESMTP id CB6BB900002 for ; Wed, 20 May 2020 02:42:33 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 98FA42C98 for ; Wed, 20 May 2020 06:42:33 +0000 (UTC) X-FDA: 76836153786.09.jewel41_8aa2ff0be4303 X-HE-Tag: jewel41_8aa2ff0be4303 X-Filterd-Recvd-Size: 12137 Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Wed, 20 May 2020 06:42:32 +0000 (UTC) IronPort-SDR: 0MWae0oZEfd7trjoDzK9yF85CxIiCFdFz/ejMI/KEvCHViEm0GqxrPZ38XgHWPTyEojmKqkoTS 8J680IC0F4/Q== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 May 2020 23:42:30 -0700 IronPort-SDR: 5wJiWe6pVWB7hyYqiXH2AXwf1QAOQQe/lHI6qJbxXfWjp/3BknyEKxiMDcLSTdLPsWEPoOCHP+ EBdQFFjpDghA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,413,1583222400"; d="scan'208";a="282582095" Received: from blu2-mobl3.ccr.corp.intel.com (HELO [10.255.28.165]) ([10.255.28.165]) by orsmga002.jf.intel.com with ESMTP; 19 May 2020 23:42:22 -0700 Cc: baolu.lu@linux.intel.com, joro@8bytes.org, catalin.marinas@arm.com, will@kernel.org, robin.murphy@arm.com, kevin.tian@intel.com, Jonathan.Cameron@huawei.com, jacob.jun.pan@linux.intel.com, christian.koenig@amd.com, felix.kuehling@amd.com, zhangfei.gao@linaro.org, jgg@ziepe.ca, xuzaibo@huawei.com, fenghua.yu@intel.com, hch@infradead.org Subject: Re: [PATCH v7 04/24] iommu: Add a page fault handler To: Jean-Philippe Brucker , iommu@lists.linux-foundation.org, devicetree@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org, linux-mm@kvack.org References: <20200519175502.2504091-1-jean-philippe@linaro.org> <20200519175502.2504091-5-jean-philippe@linaro.org> From: Lu Baolu Message-ID: Date: Wed, 20 May 2020 14:42:21 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.8.0 MIME-Version: 1.0 In-Reply-To: <20200519175502.2504091-5-jean-philippe@linaro.org> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hi Jean, On 2020/5/20 1:54, Jean-Philippe Brucker wrote: > Some systems allow devices to handle I/O Page Faults in the core mm. For > example systems implementing the PCIe PRI extension or Arm SMMU stall > model. Infrastructure for reporting these recoverable page faults was > added to the IOMMU core by commit 0c830e6b3282 ("iommu: Introduce device > fault report API"). Add a page fault handler for host SVA. > > IOMMU driver can now instantiate several fault workqueues and link them > to IOPF-capable devices. Drivers can choose between a single global > workqueue, one per IOMMU device, one per low-level fault queue, one per > domain, etc. > > When it receives a fault event, supposedly in an IRQ handler, the IOMMU > driver reports the fault using iommu_report_device_fault(), which calls > the registered handler. The page fault handler then calls the mm fault > handler, and reports either success or failure with iommu_page_response(). > When the handler succeeded, the IOMMU retries the access. > > The iopf_param pointer could be embedded into iommu_fault_param. But > putting iopf_param into the iommu_param structure allows us not to care > about ordering between calls to iopf_queue_add_device() and > iommu_register_device_fault_handler(). > > Signed-off-by: Jean-Philippe Brucker > --- > v6->v7: Fix leak in iopf_queue_discard_partial() > --- > drivers/iommu/Kconfig | 4 + > drivers/iommu/Makefile | 1 + > include/linux/iommu.h | 51 +++++ > drivers/iommu/io-pgfault.c | 459 +++++++++++++++++++++++++++++++++++++ > 4 files changed, 515 insertions(+) > create mode 100644 drivers/iommu/io-pgfault.c > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig > index d9fa5b410015..15e9dc4e503c 100644 > --- a/drivers/iommu/Kconfig > +++ b/drivers/iommu/Kconfig > @@ -107,6 +107,10 @@ config IOMMU_SVA > bool > select IOASID > > +config IOMMU_PAGE_FAULT > + bool > + select IOMMU_SVA > + > config FSL_PAMU > bool "Freescale IOMMU support" > depends on PCI > diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile > index 40c800dd4e3e..bf5cb4ee8409 100644 > --- a/drivers/iommu/Makefile > +++ b/drivers/iommu/Makefile > @@ -4,6 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o > obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o > obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o > obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o > +obj-$(CONFIG_IOMMU_PAGE_FAULT) += io-pgfault.o > obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o > obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o [SNIP] > + > +static enum iommu_page_response_code > +iopf_handle_single(struct iopf_fault *iopf) > +{ > + vm_fault_t ret; > + struct mm_struct *mm; > + struct vm_area_struct *vma; > + unsigned int access_flags = 0; > + unsigned int fault_flags = FAULT_FLAG_REMOTE; > + struct iommu_fault_page_request *prm = &iopf->fault.prm; > + enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID; > + > + if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID)) > + return status; > + > + mm = iommu_sva_find(prm->pasid); > + if (IS_ERR_OR_NULL(mm)) > + return status; > + > + down_read(&mm->mmap_sem); > + > + vma = find_extend_vma(mm, prm->addr); > + if (!vma) > + /* Unmapped area */ > + goto out_put_mm; > + > + if (prm->perm & IOMMU_FAULT_PERM_READ) > + access_flags |= VM_READ; > + > + if (prm->perm & IOMMU_FAULT_PERM_WRITE) { > + access_flags |= VM_WRITE; > + fault_flags |= FAULT_FLAG_WRITE; > + } > + > + if (prm->perm & IOMMU_FAULT_PERM_EXEC) { > + access_flags |= VM_EXEC; > + fault_flags |= FAULT_FLAG_INSTRUCTION; > + } > + > + if (!(prm->perm & IOMMU_FAULT_PERM_PRIV)) > + fault_flags |= FAULT_FLAG_USER; > + > + if (access_flags & ~vma->vm_flags) > + /* Access fault */ > + goto out_put_mm; > + > + ret = handle_mm_fault(vma, prm->addr, fault_flags); > + status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID : Do you mind telling why it's IOMMU_PAGE_RESP_INVALID but not IOMMU_PAGE_RESP_FAILURE? > + IOMMU_PAGE_RESP_SUCCESS; > + > +out_put_mm: > + up_read(&mm->mmap_sem); > + mmput(mm); > + > + return status; > +} > + > +static void iopf_handle_group(struct work_struct *work) > +{ > + struct iopf_group *group; > + struct iopf_fault *iopf, *next; > + enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS; > + > + group = container_of(work, struct iopf_group, work); > + > + list_for_each_entry_safe(iopf, next, &group->faults, list) { > + /* > + * For the moment, errors are sticky: don't handle subsequent > + * faults in the group if there is an error. > + */ > + if (status == IOMMU_PAGE_RESP_SUCCESS) > + status = iopf_handle_single(iopf); > + > + if (!(iopf->fault.prm.flags & > + IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) > + kfree(iopf); > + } > + > + iopf_complete_group(group->dev, &group->last_fault, status); > + kfree(group); > +} > + > +/** > + * iommu_queue_iopf - IO Page Fault handler > + * @evt: fault event @fault? > + * @cookie: struct device, passed to iommu_register_device_fault_handler. > + * > + * Add a fault to the device workqueue, to be handled by mm. > + * > + * This module doesn't handle PCI PASID Stop Marker; IOMMU drivers must discard > + * them before reporting faults. A PASID Stop Marker (LRW = 0b100) doesn't > + * expect a response. It may be generated when disabling a PASID (issuing a > + * PASID stop request) by some PCI devices. > + * > + * The PASID stop request is issued by the device driver before unbind(). Once > + * it completes, no page request is generated for this PASID anymore and > + * outstanding ones have been pushed to the IOMMU (as per PCIe 4.0r1.0 - 6.20.1 > + * and 10.4.1.2 - Managing PASID TLP Prefix Usage). Some PCI devices will wait > + * for all outstanding page requests to come back with a response before > + * completing the PASID stop request. Others do not wait for page responses, and > + * instead issue this Stop Marker that tells us when the PASID can be > + * reallocated. > + * > + * It is safe to discard the Stop Marker because it is an optimization. > + * a. Page requests, which are posted requests, have been flushed to the IOMMU > + * when the stop request completes. > + * b. We flush all fault queues on unbind() before freeing the PASID. > + * > + * So even though the Stop Marker might be issued by the device *after* the stop > + * request completes, outstanding faults will have been dealt with by the time > + * we free the PASID. > + * > + * Return: 0 on success and <0 on error. > + */ > +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie) > +{ > + int ret; > + struct iopf_group *group; > + struct iopf_fault *iopf, *next; > + struct iopf_device_param *iopf_param; > + > + struct device *dev = cookie; > + struct dev_iommu *param = dev->iommu; > + > + lockdep_assert_held(¶m->lock); > + > + if (fault->type != IOMMU_FAULT_PAGE_REQ) > + /* Not a recoverable page fault */ > + return -EOPNOTSUPP; > + > + /* > + * As long as we're holding param->lock, the queue can't be unlinked > + * from the device and therefore cannot disappear. > + */ > + iopf_param = param->iopf_param; > + if (!iopf_param) > + return -ENODEV; > + > + if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) { > + iopf = kzalloc(sizeof(*iopf), GFP_KERNEL); > + if (!iopf) > + return -ENOMEM; > + > + iopf->fault = *fault; > + > + /* Non-last request of a group. Postpone until the last one */ > + list_add(&iopf->list, &iopf_param->partial); > + > + return 0; > + } > + > + group = kzalloc(sizeof(*group), GFP_KERNEL); > + if (!group) { > + /* > + * The caller will send a response to the hardware. But we do > + * need to clean up before leaving, otherwise partial faults > + * will be stuck. > + */ > + ret = -ENOMEM; > + goto cleanup_partial; > + } > + > + group->dev = dev; > + group->last_fault.fault = *fault; > + INIT_LIST_HEAD(&group->faults); > + list_add(&group->last_fault.list, &group->faults); > + INIT_WORK(&group->work, iopf_handle_group); > + > + /* See if we have partial faults for this group */ > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) { > + if (iopf->fault.prm.grpid == fault->prm.grpid) > + /* Insert *before* the last fault */ > + list_move(&iopf->list, &group->faults); > + } > + > + queue_work(iopf_param->queue->wq, &group->work); > + return 0; > + > +cleanup_partial: > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) { > + if (iopf->fault.prm.grpid == fault->prm.grpid) { > + list_del(&iopf->list); > + kfree(iopf); > + } > + } > + return ret; > +} > +EXPORT_SYMBOL_GPL(iommu_queue_iopf); [SNIP] > + > + > +/** > + * iopf_queue_add_device - Add producer to the fault queue > + * @queue: IOPF queue > + * @dev: device to add > + * > + * Return: 0 on success and <0 on error. > + */ > +int iopf_queue_add_device(struct iopf_queue *queue, struct device *dev) > +{ > + int ret = -EBUSY; > + struct iopf_device_param *iopf_param; > + struct dev_iommu *param = dev->iommu; > + > + if (!param) > + return -ENODEV; > + > + iopf_param = kzalloc(sizeof(*iopf_param), GFP_KERNEL); > + if (!iopf_param) > + return -ENOMEM; > + > + INIT_LIST_HEAD(&iopf_param->partial); > + iopf_param->queue = queue; iopf_param->dev = dev; Two lines? > + > + mutex_lock(&queue->lock); > + mutex_lock(¶m->lock); > + if (!param->iopf_param) { > + list_add(&iopf_param->queue_list, &queue->devices); > + param->iopf_param = iopf_param; > + ret = 0; > + } > + mutex_unlock(¶m->lock); > + mutex_unlock(&queue->lock); > + > + if (ret) > + kfree(iopf_param); > + > + return ret; > +} > +EXPORT_SYMBOL_GPL(iopf_queue_add_device); > + [SNIP] Best regards, baolu