linux-rdma.vger.kernel.org archive mirror
From: Liran Alon <liran.alon@oracle.com>
To: Liran Alon <liran.alon@oracle.com>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>, "Will Deacon" <will@kernel.org>,
	saeedm@mellanox.com, leon@kernel.org, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, eli@mellanox.com,
	tariqt@mellanox.com, danielm@mellanox.com,
	"Håkon Bugge" <haakon.bugge@oracle.com>
Subject: Re: [PATCH] net: mlx5: Use writeX() to ring doorbell and remove reduntant wmb()
Date: Fri, 3 Jan 2020 20:38:51 +0200
Message-ID: <1C8D5596-F9AD-4E9F-B462-D63DCEEFFE54@oracle.com>
In-Reply-To: <F7C45792-2F17-42AE-88A2-F744EEAD68A5@oracle.com>



> On 3 Jan 2020, at 18:31, Liran Alon <liran.alon@oracle.com> wrote:
> 
> 
> 
>> On 3 Jan 2020, at 15:37, Jason Gunthorpe <jgg@ziepe.ca> wrote:
>> 
>> On Fri, Jan 03, 2020 at 12:21:06AM +0200, Liran Alon wrote:
>> 
>>>> AFAIK WC is largely unspecified by the memory model. Is wmb() even
>>>> formally specified to interact with WC?
>>> 
>>> As I said, I haven’t seen such semantics defined in kernel
>>> documentation such as memory-barriers.txt.  However, in practice, it
>>> does flush WC buffers. At least for x86 and ARM which I’m familiar
>>> enough with.  I think it’s reasonable to assume that wmb() should
>>> flush WC buffers while dma_wmb()/smp_wmb() doesn’t necessarily have
>>> to do this.
>> 
>> It is because WC is rarely used and largely undefined for the kernel
>> :(
> 
> Yep.
> 
>> 
>>>>>>> Therefore, change mlx5_write64() to use writeX() and remove wmb() from
>>>>>>> its callers.
>>>>>> 
>>>>>> Yes, wmb(); writel(); is always redundant
>>>>> 
>>>>> Well, unfortunately not…
>>>>> See: https://marc.info/?l=linux-netdev&m=157798859215697&w=2
>>>>> (See my suggestion to add flush_wc_writeX())
>>>> 
>>>> Well, the last time wmb & writel came up Linus was pretty clear that
>>>> writel is supposed to remain in program order and have the barriers
>>>> needed to do that.
>>> 
>>> Right. But that doesn’t take into account that WC writes are
>>> considered completed when they are still posted in CPU WC buffers.
>> 
>>> As I understand it, the semantics of writeX() are that it guarantees
>>> all prior writes have been completed.  It means that all prior stores
>>> have executed and that the store buffer is flushed. But it doesn’t
>>> mean that the WC buffers are flushed as well.
>> 
>> The semantic for writel is that prior program order stores will be
>> observable by DMA from the device receiving the writel. This is
>> required for UC and NC stores today. WC is undefined, I think.
>> 
>> This is why ARM has the additional barrier in writel.
> 
> Yep.
> 
>> 
>> It would logically make sense if WC followed the same rule, however,
>> adding a barrier to writel to make WC ordered would not be popular, so
>> I think we are left with using special accessors for WC and placing
>> the barrier there..
> 
> Right.
> 
>> 
>>>> IMHO you should start there before going around and adding/removing wmbs
>>>> related to WC. Update memory-barriers.txt and related with the model
>>>> ordering for WC access and get agreement.
>>> 
>>> I disagree here. It’s more important to fix a real bug (e.g. not
>>> flushing WC buffers on x86 AMD).  Then, we can later formalise this
>>> and refactor code as necessary, which will also optimise it.
>>> Bug fix can be merged before we finish all these discussions and get
>>> agreement.
>> 
>> Is it a real bug that people actually hit? It wasn't clear from the
>> commit message. If so, sure, it should be fixed and the commit message
>> clarified. (but I'd put the wmb near the WC writes..)
> 
> I found this bug during code review. I’m not aware whether AWS saw this bug happening in production.
> But according to AMD SDM and Optimization Guide SDM, this is a bug.
> 
> I think it doesn’t happen in practice because the write of the Tx descriptor + first 128 bytes of the packet
> effectively fills the relevant WC buffers, and when a WC buffer is fully written to, the CPU *should*
> (not *must*) flush the WC buffer to memory.

Actually, after re-reading the AMD Optimization Guide SDM, I see it is guaranteed that:
“Write-combining is closed if all 64 bytes of the write buffer are valid”.
And this is indeed always the case for AWS ENA LLQ, because, as can be seen in
ena_com_config_llq_info(), desc_list_entry_size is either 128, 192 or 256, i.e. always
a multiple of 64 bytes. So this explains why this wasn’t an issue in production.
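
To make the arithmetic concrete, here is a rough user-space sketch of the
"closed when all 64 bytes are valid" rule. It is only a toy model, not code
from the ENA driver or the kernel: wc_buf, wc_write(), wc_leaves_partial() and
WC_BUF_SIZE are hypothetical names, and it assumes a single WC buffer that is
written sequentially starting at a 64-byte aligned address. Real CPUs have
several WC buffers and other events that can close them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WC_BUF_SIZE 64u /* assumed WC buffer size in bytes, per the quoted rule */

/* Toy model of one write-combining buffer: one valid bit per byte. */
struct wc_buf {
	uint64_t valid;   /* bit i set => byte i of the buffer has been written */
	unsigned flushes; /* how many times the buffer closed because it filled */
};

/*
 * Model a sequential store of len bytes that starts at offset 0 of a
 * 64-byte aligned region (an assumption of this sketch).
 */
void wc_write(struct wc_buf *b, size_t len)
{
	assert(b != NULL);

	for (size_t off = 0; off < len; off++) {
		b->valid |= 1ull << (off % WC_BUF_SIZE);
		if (b->valid == UINT64_MAX) {
			/* All 64 bytes valid: write-combining is closed. */
			b->valid = 0;
			b->flushes++;
		}
	}
}

/* True if a write of len bytes leaves a partially filled buffer behind. */
bool wc_leaves_partial(size_t len)
{
	struct wc_buf b = { 0, 0 };

	wc_write(&b, len);
	return b.valid != 0;
}
```

In this model, wc_leaves_partial() is false for each of the three
desc_list_entry_size values above (128, 192, 256), while a length that is not
a multiple of 64, such as 96, leaves a partially filled buffer that would
still need an explicit flush.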

-Liran



