From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756120AbeASQXE (ORCPT ); Fri, 19 Jan 2018 11:23:04 -0500 Received: from mail-qt0-f180.google.com ([209.85.216.180]:43820 "EHLO mail-qt0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755734AbeASQWx (ORCPT ); Fri, 19 Jan 2018 11:22:53 -0500 X-Google-Smtp-Source: ACJfBouzpGQTHrgfHa/RieKbts2RVhnqYblstUCI9F+1YSwJ9nFz16ex6JLlGkUOu91B+GkHnBGj5WtKSvq7zAAFr2M= MIME-Version: 1.0 In-Reply-To: <20180119133648.s5nbm4gvby6c33av@f1.synalogic.ca> References: <20180118065054.29844-1-bpoirier@suse.com> <20180119085952.u63kius4ud34lleq@f1.synalogic.ca> <20180119133648.s5nbm4gvby6c33av@f1.synalogic.ca> From: Alexander Duyck Date: Fri, 19 Jan 2018 08:22:52 -0800 Message-ID: Subject: Re: [Intel-wired-lan] [RFC PATCH] e1000e: Remove Other from EIAC. To: Benjamin Poirier Cc: Shrikrishna Khare , Jeff Kirsher , Netdev , intel-wired-lan , linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 19, 2018 at 5:36 AM, Benjamin Poirier wrote: > On 2018/01/19 17:59, Benjamin Poirier wrote: >> On 2018/01/18 07:51, Alexander Duyck wrote: >> > On Wed, Jan 17, 2018 at 10:50 PM, Benjamin Poirier wrote: >> > > It was reported that emulated e1000e devices in vmware esxi 6.5 Build >> > > 7526125 do not link up after commit 4aea7a5c5e94 ("e1000e: Avoid receiver >> > > overrun interrupt bursts", v4.15-rc1). Some tracing shows that after >> > > e1000e_trigger_lsc() is called, ICR reads out as 0x0 in e1000_msix_other() >> > > on emulated e1000e devices. In comparison, on real e1000e 82574 hardware, >> > > icr=0x80000004 (_INT_ASSERTED | _OTHER) in the same situation. >> > >> > Isn't 0x80000004 (_INT_ASSERTED | _LSC)? The assumption I based my >> >> Yes. The numeric value is correct. I made a mistake when writing down >> the flag names. >> >> > patch on was that the VMware code was sending _OTHER instead of _LSC >> > to trigger LSC events. As such in my version of the workaround I just >> >> It's not so deterministic, sadly. In my tests, upon entry into >> e1000_msix_other() after e1000e_trigger_lsc(), with no workaround patch >> I've seen: >> icr=0x0 >> icr=0x3d >> Reserved RXDMT0 Reserved LSC TXDW >> icr=0x46 >> RXO LSC TXQE >> and someone else reported: >> icr=0x3c >> Reserved RXDMT0 Reserved LSC >> >> > went through and did the testing if the _RXO bit was set, otherwise I >> > assumed that whatever event was received must have been meant to >> > trigger an _LSC type event since that worked in the past. > ^... > > Re-reading that part, my thoughts are that it worked in the past, before > 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1), > because (presumably) Other was not set in EIAC. It worked after > 16ecba59bc33 but before 4aea7a5c5e94 ("e1000e: Avoid receiver overrun > interrupt bursts", v4.15-rc1) because e1000_msix_other() didn't read the > value of icr. If it had, it would've found a bogus value, which is > what's happening after 4aea7a5c5e94. I wonder if we're not just getting > some uninitialized value from the emulation code... which makes me kind > of uneasy about your approach of trying to make sense of the value. > Maybe Shrikrishna can clarify where the icr value is coming from when > Other is set in EIAC. For now I still say we let my current patch go as-is in order to address the immediate issue. We can follow-up and do a more refined version of things once we get the final word from VMware on how this actually works. If nothing else the current patch appears to resolve the currently reported issue and is already submitted. I'm of the mind that we need to cut down on the code thrash. This driver is supposed to have been in a "maintenance" mode for the last year or so as there aren't being any new parts added is my understanding. As-is I don't see any reason why 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt", v4.5-rc1) was submitted or accepted in the first place. I don't see any notes about it fixing any bug or addressing any issue and it seems like that is the start of all the issues we have been having recently with RXO triggering more interrupts, various link issues, and this most recent VMware issue. - Alex