From: Elliott Mitchell <ehem@m5p.com>
To: Jan Beulich
Cc: xen-devel@lists.xenproject.org, Andrew Cooper,
    Roger Pau Monné, Wei Liu, Kelly Choi
Subject: Re: Serious AMD-Vi(?) issue
Date: Mon, 25 Mar 2024 14:43:44 -0700

On Mon, Mar 25, 2024 at 08:55:56AM +0100, Jan Beulich wrote:
> On 22.03.2024 20:22, Elliott Mitchell wrote:
> > On Fri, Mar 22, 2024 at 04:41:45PM +0000, Kelly Choi wrote:
> >>
> >> I can see you've recently engaged with our community with some issues
> >> you'd like help with.
> >> We love the fact you are participating in our project, however, our
> >> developers aren't able to help if you do not provide the specific
> >> details.
> >
> > Please point to specific details which have been omitted. Fairly little
> > data has been provided as fairly little data is available. The primary
> > observation is large numbers of:
> >
> > (XEN) AMD-Vi: IO_PAGE_FAULT: DDDD:bb:dd.f d0 addr ffffff???????000 flags 0x8 I
> >
> > lines in Xen's ring buffer.
>
> Yet this is (part of) the problem: By providing only the messages that
> appear relevant to you, you imply that you know that no other message is
> in any way relevant. That's judgement you'd better leave to people
> actually trying to investigate. Unless of course you were proposing an
> actual code change, with suitable justification.

Honestly, I forgot about the very small number of messages from the SATA
subsystem. The bigger question at the time was whether the mitigation
actions currently in place were effective. As such, monitoring `xl dmesg`
took priority over looking at SATA messages, which had failed to reliably
indicate status. I *thought* I would be able to retrieve those messages
later via other, slower means, but a different and possibly overlapping
issue has since shown up. Unfortunately this means they are no longer
retrievable. :-(
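For anyone wanting to watch for these faults, nothing elaborate is
needed. A minimal sketch, run as root in dom0 (the 60-second interval and
the awk field position are illustrative assumptions based on the log
format quoted above, not a record of what I actually ran):

    #!/bin/sh
    # Poll Xen's ring buffer and summarize AMD-Vi IO_PAGE_FAULT lines
    # per device; field 4 is the BDF in the format quoted above.
    while sleep 60; do
        date
        xl dmesg | grep 'AMD-Vi: IO_PAGE_FAULT' \
            | awk '{print $4}' | sort | uniq -c
    done

(`xl dmesg -c` would additionally clear the ring buffer after each read,
at the cost of discarding everything else in it.)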
> In fact when running into trouble, the usual course of action would be to
> increase verbosity in both hypervisor and kernel, just to make sure no
> potentially relevant message is missed.

More/better information might have been obtained had I been engaged
earlier.

> > The most overt sign was telling the Linux kernel to scan for
> > inconsistencies and the kernel finding some. The domain didn't otherwise
> > appear to notice trouble.
> >
> > This is from memory, it would take some time to discover whether any
> > messages were missed. Present mitigation action is inhibiting the
> > messages, but the trouble is certainly still lurking.
>
> Iirc you were considering whether any of this might be a timing issue. Yet
> beyond voicing that suspicion, you didn't provide any technical details as
> to why you think so. Such technical details would include taking into
> account how IOMMU mappings and associated IOMMU TLB flushing are carried
> out. Right now, to me at least, your speculation in this regard fails
> basic sanity checking. Therefore the scenario that you're thinking of
> would need better describing, imo.

True. Mostly I'm analyzing the known information and considering what the
patterns suggest.

Presently I'm aware of two reports (Imre Szőllősi's and mine). Both
involve machines with AMD processors. That could mean people with AMD
processors are less trusting of flash storage, or it could point to an
AMD-only IOMMU issue. Ideally someone would test and confirm there is no
issue with Linux software RAID1 on flash on an Intel machine.

Both reports feature two flash storage devices run through Linux MD
RAID1. The MD RAID1 subsystem could be abusing the DMA interface in some
fashion. While Imre Szőllősi reported the problem not occurring with a
single device, the report does not explicitly state whether that was a
degraded single-device RAID1 or plain non-RAID storage. I'm unaware of
any testing with three devices in RAID1. (The mdadm invocations for
these variants are sketched at the end of this message.)

Both reports feature Samsung SATA flash devices. My case also includes a
Crucial NVMe device. My case also features a Crucial SATA flash device
for which the problem did NOT occur. So the question becomes: why did
the problem not occur for this Crucial SATA device?

According to the specifications, the Crucial SATA device is roughly on
par with the Samsung SATA devices in terms of read/write speeds. The
NVMe device's specifications are massively better. What comes to mind is
that the Crucial SATA device might have higher latency before executing
commands. The specifications don't mention command execution latency, so
it isn't possible to know whether this is the issue.

Yes, latency/timing is speculation. It does seem a good fit for the
pattern, though.

This could be a Linux MD RAID1 bug or a Xen bug. Unfortunately data loss
is a very serious class of bug, so I'm highly reluctant to let go of the
mitigations without hope of progress.
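For reference, mdadm can create all three of the variants mentioned
above directly. A rough sketch with placeholder device names (these
commands destroy any data on the named partitions):

    # Degraded RAID1: two members declared, one absent.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX1 missing
    # True single-member RAID1; mdadm insists on --force for one device.
    mdadm --create /dev/md1 --level=1 --raid-devices=1 --force /dev/sdY1
    # Three-device RAID1, the case I'm unaware of any testing with.
    mdadm --create /dev/md2 --level=1 --raid-devices=3 \
        /dev/sdW1 /dev/sdX1 /dev/sdY1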
-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445