From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753911AbdLGMx3 (ORCPT ); Thu, 7 Dec 2017 07:53:29 -0500
Received: from foss.arm.com ([217.140.101.70]:50502 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753848AbdLGMx0 (ORCPT ); Thu, 7 Dec 2017 07:53:26 -0500
Subject: Re: [PATCH v3 15/16] iommu: introduce page response function
To: Jacob Pan
Cc: "iommu@lists.linux-foundation.org", LKML, Joerg Roedel, David Woodhouse, Greg Kroah-Hartman, Rafael Wysocki, Alex Williamson, Lan Tianyu, Jean Delvare, Will Deacon, "Kumar, Sanjay K"
References: <1510944914-54430-1-git-send-email-jacob.jun.pan@linux.intel.com> <1510944914-54430-16-git-send-email-jacob.jun.pan@linux.intel.com> <93661c1c-2d3b-295f-0b9d-52e50ea9e1d0@arm.com> <20171204133715.50c45136@jacob-builder> <20171206112521.1edf8e9b@jacob-builder>
From: Jean-Philippe Brucker
Message-ID: <39fcbbd2-2e6a-f05a-8cb4-8e3ad4ead369@arm.com>
Date: Thu, 7 Dec 2017 12:56:55 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
MIME-Version: 1.0
In-Reply-To: <20171206112521.1edf8e9b@jacob-builder>
Content-Type: text/plain; charset=windows-1252
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/12/17 19:25, Jacob Pan wrote:
[...]
>> For SMMUv3, the stall buffer may be shared between devices on some
>> implementations, in which case the guest could prevent other
>> devices to stall by letting the buffer fill up.
>> -> We might have to keep track of stalls in the host driver and set
>> a credit or timeout to each stall, if it comes to that.
>> -> In addition, send a terminate-all-stalls command when changing
>> the device's domain.
>>
> We have the same situation in VT-d with shared queue which in turn may
> affect other guests. Letting host driver maintain record of pending page
> request seems the best way to go. VT-d has a way to drain PRQ per PASID
> and RID combination. I guess this is the same as your
> "terminate-all-stalls" but with finer control? Or
> "terminate-all-stalls" only applies to a given device.

That command terminates all stalls for a given device (for all PASIDs).
It's a bit awkward to implement but should be enough to ensure that we
don't leak any outstanding faults to the next VM.

> Seems we can implement a generic timeout/credit mechanism in IOMMU
> driver with model specific action to drain/terminate. The timeout value
> can also be model specific.

Sounds good. Timeout seems a bit complicated to implement (and how do we
guess what timeout would work?), so maybe it's simpler to enforce a quota
of outstanding faults per VM, for example half of the shared queue size
(the number can be chosen by the IOMMU driver). If a VM has that many
outstanding faults, then any new fault is immediately terminated by the
host. A bit rough but it might be enough to mitigate the problem
initially, and we can always tweak it later (for instance disable faulting
if a guest doesn't ever reply).

Seems like VFIO should enforce this quota, since the IOMMU layer doesn't
know which device is assigned to which VM. If it's the IOMMU that enforces
quotas per device and a VM has 15 devices assigned, then the guest can
still DoS the IOMMU.

[...]
>>> + * @type: group or stream response
>>
>> The page request doesn't provide this information
>>
> this is vt-d specific. it is in the vt-d page request descriptor and
> response descriptors are different depending on the type.
> Since we intend the generic data to be super set of models, I add this
> field.

But don't you need to add the stream type to enum iommu_fault_type, in
patch 8? Otherwise the guest can't know what type to set in the response.

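As a rough illustration of the per-VM quota described above, the accounting could look something like the sketch below. This is not part of the patch series: struct vm_fault_quota and the vm_fault_quota_*() helpers are hypothetical names, locking is left out, and the "half of the shared queue size" limit is only the example value mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VM accounting; not an existing kernel structure. */
struct vm_fault_quota {
	size_t limit;       /* max outstanding faults allowed for this VM */
	size_t outstanding; /* faults forwarded to the guest, not yet answered */
};

/* Set the quota to half of the shared queue size (example policy only). */
static void vm_fault_quota_init(struct vm_fault_quota *q,
				size_t shared_queue_size)
{
	assert(shared_queue_size >= 2);
	q->limit = shared_queue_size / 2;
	q->outstanding = 0;
}

/*
 * Called when a new fault arrives for any device assigned to this VM.
 * Returns true if the fault may be forwarded to the guest, false if the
 * host should terminate it immediately because the VM is at its quota.
 */
static bool vm_fault_quota_try_charge(struct vm_fault_quota *q)
{
	if (q->outstanding >= q->limit)
		return false;
	q->outstanding++;
	return true;
}

/* Called when the guest sends a page response for a forwarded fault. */
static void vm_fault_quota_uncharge(struct vm_fault_quota *q)
{
	if (q->outstanding > 0)
		q->outstanding--;
}
```

The counter is kept per VM rather than per device, so that a guest with many assigned devices still shares a single budget.
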
Thanks,
Jean