From: "Xu, Quan"
Subject: Re: [PATCH v3 0/2] VT-d flush issue
Date: Tue, 22 Dec 2015 08:10:42 +0000
Message-ID: <945CA011AD5F084CBEA3E851C0AB28894B7FCEAF@SHSMSX101.ccr.corp.intel.com>
In-Reply-To: <5679114D02000078000C219D@prv-mh.provo.novell.com>
To: Jan Beulich, "Wu, Feng"
Cc: "Tian, Kevin", keir@xen.org, george.dunlap@eu.citrix.com, andrew.cooper3@citrix.com, tim@xen.org, xen-devel@lists.xen.org, "Nakajima, Jun"
List-Id: xen-devel@lists.xenproject.org

On 22.12.2015 at 4:01pm, wrote:
> >>> On 22.12.15 at 08:40, wrote:
> > Maybe there is still some misunderstanding about your expectation.
> > Let me summarize it here.
> >
> > After Quan's patch set, there are two types of error code:
> >
> > - -EOPNOTSUPP
> > Right now we only support the software way to synchronize the
> > invalidation, so if someone calls queue_invalidate_wait() and passes
> > 0 for 'sw', -EOPNOTSUPP is returned (though this cannot happen in
> > the real world, since queue_invalidate_wait() is called in only one
> > place and 1 is passed for 'sw').
> > So I am not sure what we should do for this return value. If we
> > really get it, it means the flush was not actually executed, so the
> > IOMMU state is incorrect and the data is inconsistent. What do you
> > think we should do in this case?
>
> Since seeing this error would be a software bug, BUG() or ASSERT() are
> fine to handle this specific case, if need be.
>
> > - -ETIMEDOUT
> > For this case, Quan has elaborated a lot. IIUIC, the main gap between
> > you and Quan is that you think the error code should be propagated to
> > the upper caller, while in Quan's implementation he deals with this
> > error in invalidate_timeout() and device_tlb_invalidate_timeout(),
> > hence there is no need to propagate it to the upper caller. Since the
> > handling policy will crash the domain, we don't think a propagated
> > error code is needed. Even if we propagate it, the upper caller still
> > doesn't need to do anything with it.
>
> "Handling" an error by e.g. domain_crash() doesn't mean you don't need
> to also modify (or at the very least inspect) callers: they may
> continue doing things _assuming_ success. Of course you don't need to
> domain_crash() at all layers. But errors from lower layers should, at
> least in most ordinary cases, lead to higher layers bailing instead of
> continuing.

For a Device-TLB flush error, I think we need to propagate the error
code. For an IEC/iotlb/context flush error, if panic is acceptable, we
can ignore the propagated error code. BTW, it is very challenging and
tricky to handle all of the errors, and some errors are unrecoverable.
As mentioned, it would be like rewriting the Xen hypervisor.

For -EOPNOTSUPP, we can print a warning message. Once the interrupt
method is supported, we can return 0 in queue_invalidate_wait().

Feng, thanks for your update!

-Quan