From: David Laight <David.Laight@ACULAB.COM>
To: "'Jonas Dreßler'" <verdre@v0yd.nl>,
	"Amitkumar Karwar" <amitkarwar@gmail.com>,
	"Ganapathi Bhat" <ganapathi017@gmail.com>,
	"Xinming Hu" <huxinming820@gmail.com>,
	"Kalle Valo" <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Jakub Kicinski" <kuba@kernel.org>
Cc: "Tsuchiya Yuto" <kitakar@gmail.com>,
	"linux-wireless@vger.kernel.org" <linux-wireless@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"Maximilian Luz" <luzmaximilian@gmail.com>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Pali Rohár" <pali@kernel.org>,
	"Heiner Kallweit" <hkallweit1@gmail.com>,
	"Johannes Berg" <johannes@sipsolutions.net>,
	"Brian Norris" <briannorris@chromium.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: RE: [PATCH v2 1/2] mwifiex: Use non-posted PCI write when setting TX ring write pointer
Date: Wed, 22 Sep 2021 14:03:25 +0000
Message-ID: <8f65f41a807c46d496bf1b45816077e4@AcuMS.aculab.com>
In-Reply-To: <20210914114813.15404-2-verdre@v0yd.nl>

From: Jonas Dreßler
> Sent: 14 September 2021 12:48
> 
> On the 88W8897 card it's very important the TX ring write pointer is
> updated correctly to its new value before setting the TX ready
> interrupt, otherwise the firmware appears to crash (probably because
> it's trying to DMA-read from the wrong place). The issue is present in
> the latest firmware version 15.68.19.p21 of the pcie+usb card.
> 
> Since PCI uses "posted writes" when writing to a register, it's not
> guaranteed that a write will happen immediately. That means the pointer
> might be outdated when setting the TX ready interrupt, leading to
> firmware crashes especially when ASPM L1 and L1 substates are enabled
> (because of the higher link latency, the write will probably take
> longer).
> 
> So fix those firmware crashes by always using a non-posted write for
> this specific register write. We do that by simply reading back the
> register after writing it, just as a few other PCI drivers do.
> 
> This fixes a bug where during rx/tx traffic and with ASPM L1 substates
> enabled (the enabled substates are platform dependent), the firmware
> crashes and eventually a command timeout appears in the logs.

I think you need to change your terminology.
PCIe does have some non-posted write transactions - but I can't
remember when they are used.

What you need to say is that you are flushing the PCIe posted
writes in order to avoid a timing 'issue' setting the TX ring
write pointer.

Quite where the bug is, and why the read-back actually fixes
it is another matter.

A typical ethernet transmit needs three things written
in the correct order (as seen by the hardware):

1) The transmit frame data.
2) The descriptor ring entry referring to the frame.
3) The 'prod' of the MAC engine to process the frame.

You seem to also have:
2.5) Write the TX ring write pointer to the MAC engine.

The updates of (1) and (2) are normally handled by DMA coherent
memory or cache flushes done by using the DMA APIs.

If the writes for (2.5) and (3) are both writing to the
PCIe card (which seems likely) then the PCIe spec will
guarantee that they happen in the correct order.

This means that the PCIe readback of the (2.5) write doesn't
have any effect on the order of the bus cycles seen by the card.
So flushing the PCIe write isn't what fixes your problem.
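
To make the ordering argument concrete, here is a toy user-space model
(not driver code) of a link with a single posted-write FIFO. Everything
in it is hypothetical: the toy_* names, the register indices and the
FIFO depth are made up for illustration. It only assumes what is said
above: writes to the same device are delivered in order, and a read
cannot pass earlier posted writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH   8
#define NUM_REGS     4
#define MAX_ARRIVALS 16

/* Hypothetical register indices, not taken from the mwifiex driver. */
enum { REG_TX_WRPTR = 0, REG_TX_DOORBELL = 1 };

struct posted_write { unsigned reg; uint32_t val; };

struct toy_link {
    struct posted_write pending[FIFO_DEPTH]; /* writes still in flight */
    size_t npending;
    uint32_t dev_regs[NUM_REGS];             /* what the device has seen */
    unsigned arrival_order[MAX_ARRIVALS];    /* registers, in arrival order */
    size_t narrivals;
};

/* Queue a write; the CPU carries on without waiting for delivery. */
static void toy_writel(struct toy_link *l, unsigned reg, uint32_t val)
{
    assert(reg < NUM_REGS && l->npending < FIFO_DEPTH);
    l->pending[l->npending++] = (struct posted_write){ reg, val };
}

/* Deliver every pending write to the device, oldest first. */
static void toy_drain(struct toy_link *l)
{
    for (size_t i = 0; i < l->npending; i++) {
        assert(l->narrivals < MAX_ARRIVALS);
        l->dev_regs[l->pending[i].reg] = l->pending[i].val;
        l->arrival_order[l->narrivals++] = l->pending[i].reg;
    }
    l->npending = 0;
}

/* A read may not pass earlier posted writes, so it drains them first. */
static uint32_t toy_readl(struct toy_link *l, unsigned reg)
{
    assert(reg < NUM_REGS);
    toy_drain(l);
    return l->dev_regs[reg];
}
```

Writing REG_TX_WRPTR and then REG_TX_DOORBELL leaves the same
arrival_order whether or not a toy_readl() sits between the two writes.
In this model the read-back changes when the first write lands, not the
order in which the card sees the two writes.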

The readback between (2.5) and (3) does have two effects:
a) it adds a short delay between the two writes.
b) it (probably) forces the first write to be flushed through
   any posted-write buffers on the card itself.

It may well be that the card has separate posted write buffers
for different parts of the hardware.
In that case the write (3) might get actioned before the write (2.5).
OTOH you'd expect that to only cause packet transmit to be delayed.
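
A toy model of that guess, purely as a sketch: the card is assumed
(hypothetically) to buffer the write-pointer register internally while
the 'prod' register acts at once. The struct and card_* names are
invented here and say nothing about how the 88W8897 is really built.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_card {
    uint32_t wrptr;                  /* value the ring logic currently uses */
    uint32_t wrptr_pending;          /* value parked in the internal buffer */
    bool     wrptr_buffered;
    uint32_t wrptr_seen_at_doorbell; /* snapshot taken when prodded */
    bool     doorbell_fired;
};

/* Write (2.5): assumed to sit in an on-card buffer for a while. */
static void card_write_wrptr(struct toy_card *c, uint32_t v)
{
    c->wrptr_pending = v;
    c->wrptr_buffered = true;
}

/* A read of the register forces the buffered value through first. */
static uint32_t card_read_wrptr(struct toy_card *c)
{
    if (c->wrptr_buffered) {
        c->wrptr = c->wrptr_pending;
        c->wrptr_buffered = false;
    }
    return c->wrptr;
}

/* Write (3): assumed to be unbuffered, so it is actioned immediately. */
static void card_ring_doorbell(struct toy_card *c)
{
    c->wrptr_seen_at_doorbell = c->wrptr;
    c->doorbell_fired = true;
}
```

Without the read-back the doorbell samples the old write pointer; with
card_read_wrptr() in between it samples the new one. With a stale
pointer the MAC would simply find nothing new to send, which is why a
delay rather than a crash is the expected outcome.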

If the write (2.5) ends up being non-atomic (i.e. a 64-bit write
converted to multiple 8-bit writes internally) then you'll hit
problems if the MAC engine looks at the register while it is
being changed just after transmitting the previous packet
(i.e. when the tx starts before write (3) because the tx logic
is already active).
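
A small sketch of what such a torn update looks like. The low-byte-first
order and the helper name torn_value() are assumptions for illustration
only; nothing here is known about the card's internals.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Value a reader would see if it sampled a 64-bit register after
 * 'bytes_done' of the eight internal byte-wide writes had landed,
 * assuming (hypothetically) that the low byte is written first.
 */
static uint64_t torn_value(uint64_t old, uint64_t next, unsigned bytes_done)
{
    uint64_t reg = old;

    assert(bytes_done <= 8);
    for (unsigned i = 0; i < bytes_done; i++) {
        uint64_t mask = (uint64_t)0xff << (8 * i);

        reg = (reg & ~mask) | (next & mask);
    }
    return reg;
}
```

Moving the pointer from 0x00ff to 0x0100, torn_value(0x00ff, 0x0100, 1)
gives 0x0000, which is neither the old nor the new value. A MAC engine
that samples the register at that moment would start from a bogus
position.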

The other horrid possibility is that you have a truly broken
PCIe slave that corrupts its posted-write buffer when a second
write arrives.
If that is actually true then you may need to also add locks
to ensure that multiple threads cannot do writes at the same time.
Or do all (and I mean all) accesses from a single thread/context.

The latter problem reminds me of a PCI card that got terribly
confused if it saw a read request from a second CPU while generating
'cycle rerun' responses to an earlier read request.

Most code that flushes posted writes only needs to do so for
writes that drop level-sensitive interrupt requests.
Failure to flush those can lead to unexpected interrupts.
That problem goes back to VMEbus SunOS (amongst others).
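
The interrupt case can be sketched the same way. This is a toy model
with invented names (toy_irq_dev, post_ack() and so on), assuming a
device that holds its level-sensitive line asserted while a status
register is non-zero. In a real Linux driver the flush would typically
be a readl() from the same device after the writel().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_irq_dev {
    uint32_t status;     /* non-zero keeps the interrupt line asserted */
    bool ack_in_flight;  /* posted "clear status" write not yet delivered */
};

static bool irq_line_asserted(const struct toy_irq_dev *d)
{
    return d->status != 0;
}

/* The acknowledge is a posted write: queued, not yet seen by the device. */
static void post_ack(struct toy_irq_dev *d)
{
    d->ack_in_flight = true;
}

/* A read cannot pass the posted ack, so the ack lands before it returns. */
static uint32_t read_status(struct toy_irq_dev *d)
{
    if (d->ack_in_flight) {
        d->status = 0;
        d->ack_in_flight = false;
    }
    return d->status;
}

/* Returns true if the line is still asserted when the handler returns. */
static bool handler_leaves_line_asserted(struct toy_irq_dev *d, bool flush)
{
    post_ack(d);
    if (flush)
        (void)read_status(d);
    return irq_line_asserted(d);
}
```

With flush == false the handler returns while the line is still
asserted, so the interrupt controller would deliver the same interrupt
again. With flush == true the line has dropped before the handler
returns.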

	David

