netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	Sander Eikelenboom <linux@eikelenboom.it>,
	Realtek linux nic maintainers <nic_swsd@realtek.com>,
	Eric Dumazet <edumazet@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>
Subject: Re: Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27!
Date: Sat, 9 Feb 2019 10:02:56 +0100	[thread overview]
Message-ID: <aa7eb814-ebf7-ffa5-ce61-317b9bbf2394@gmail.com> (raw)
In-Reply-To: <e6423f51-42cc-e49f-bb48-85ea922f01b6@gmail.com>

On 09.02.2019 00:09, Eric Dumazet wrote:
> 
> 
> On 02/08/2019 01:50 PM, Heiner Kallweit wrote:
>> On 08.02.2019 22:45, Sander Eikelenboom wrote:
>>> On 08/02/2019 22:22, Heiner Kallweit wrote:
>>>> On 08.02.2019 21:55, Sander Eikelenboom wrote:
>>>>> On 08/02/2019 19:52, Heiner Kallweit wrote:
>>>>>> On 08.02.2019 19:29, Sander Eikelenboom wrote:
>>>>>>> L.S.,
>>>>>>>
>>>>>>> While testing a linux 5.0-rc5 kernel (with some patches on top but they don't seem related) under Xen i the nasty splat below, 
>>>>>>> that I haven encountered with Linux 4.20.x.
>>>>>>>
>>>>>>> Unfortunately I haven't got a clear reproducer for this and bisecting could be nasty due to another (networking related) kernel bug.
>>>>>>>
>>>>>>> If you need more info, want me to run a debug patch etc., please feel free to ask.
>>>>>>>
>>>>>> Thanks for the report. However I see no change in the r8169 driver between
>>>>>> 4.20 and 5.0 with regard to BQL code. Having said that the root cause could
>>>>>> be somewhere else. Therefore I'm afraid a bisect will be needed.
>>>>>
>>>>> Hmm i did some diging and i think:
>>>>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>>>>> 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
>>>>> 620344c43edfa020bbadfd81a144ebe5181fc94f net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
>>>>>
>>>> You're right. Thought this was added in 4.20 already.
>>>> The BQL code pattern I copied from the mlx4 driver and so far I haven't heard about
>>>> this issue from any user of physical hw. And due to the fact that a lot of mainboards
>>>> have onboard Realtek network I have quite a few testers out there.
>>>> Does the issue occur under specific circumstances like very high load?
>>>
>>> Yep, the box is already quite contented with the Xen VM's and if I remember correctly it occurred while kernel compiling
>>> on the host.
>>>
>>>> If indeed the xmit_more patch causes the issue, I think we have to involve Eric Dumazet
>>>> as author of the underlying changes.
>>>
>>> It could also be the barriers weren't that unneeded as assumed.
>>
>> The barriers were removed after adding xmit_more handling. Therefore it would be good to
>> test also with only 
>> bd7153bd83b806bfcc2e79b7a6f43aa653d06ef3 r8169: remove unneeded mmiowb barriers
>> removed.
>>
>>> Since we are almost at RC6 i took the liberty to CC Eric now.
>>>
>> Sure, thanks.
>>
>>> BTW am i correct these patches are merely optimizations ?
>>
>> Yes
>>
>>> If so and concluding they revert cleanly, perhaps it should be considered at this point in the RC's
>>> to revert them for 5.0 and try again for 5.1 ?
>>>
>> Before removing both it would be good to test with only the barrier-removal removed.
>>
> 
> Commit 2e6eedb4813e34d8d84ac0eb3afb668966f3f356 r8169: make use of xmit_more and __netdev_sent_queue
> looks buggy to me, since the skb might have been freed already on another cpu when you call
> 
> You could try :
> 
> diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
> index 3624e67aef72c92ed6e908e2c99ac2d381210126..f907d484165d9fd775e81bf2bfb9aa4ddedb1c93 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -6070,6 +6070,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>         dma_addr_t mapping;
>         u32 opts[2], len;
>         bool stop_queue;
> +       bool door_bell;
>         int frags;
>  
>         if (unlikely(!rtl_tx_slots_avail(tp, skb_shinfo(skb)->nr_frags))) {
> @@ -6116,6 +6117,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>         /* Force memory writes to complete before releasing descriptor */
>         dma_wmb();
>  
> +       door_bell = __netdev_sent_queue(dev, skb->len, skb->xmit_more);
> +
>         txd->opts1 = rtl8169_get_txd_opts1(opts[0], len, entry);
>  
>         /* Force all memory writes to complete before notifying device */
> @@ -6127,7 +6130,7 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
>         if (unlikely(stop_queue))
>                 netif_stop_queue(dev);
>  
> -       if (__netdev_sent_queue(dev, skb->len, skb->xmit_more)) {
> +       if (door_bell) {
>                 RTL_W8(tp, TxPoll, NPQ);
>                 mmiowb();
>         }
> 
Thanks a lot for checking and for the proposed fix.
Sander, can you try with this patch on top of 5.0-rc5 w/o removing two two commits?

> 
> .
> 
Heiner

  reply	other threads:[~2019-02-09  9:10 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-08 18:29 Linux 5.0 regression: rtl8169 / kernel BUG at lib/dynamic_queue_limits.c:27! Sander Eikelenboom
2019-02-08 18:52 ` Heiner Kallweit
2019-02-08 20:55   ` Sander Eikelenboom
2019-02-08 21:22     ` Heiner Kallweit
2019-02-08 21:45       ` Sander Eikelenboom
2019-02-08 21:50         ` Heiner Kallweit
2019-02-08 23:09           ` Eric Dumazet
2019-02-09  9:02             ` Heiner Kallweit [this message]
2019-02-09  9:34               ` Sander Eikelenboom
2019-02-09  9:59                 ` Heiner Kallweit
2019-02-09 10:07                   ` Sander Eikelenboom
2019-02-09 11:50                     ` Heiner Kallweit
2019-02-10  9:16                       ` Sander Eikelenboom
2019-02-10  9:32                         ` Heiner Kallweit
2019-02-10 11:44                         ` Heiner Kallweit
2019-02-10 13:05                           ` Sander Eikelenboom
2019-02-10 13:57                             ` Heiner Kallweit
2019-02-10 15:50                           ` Sander Eikelenboom
2019-02-08 23:34           ` Sander Eikelenboom
2019-02-09  9:10             ` Heiner Kallweit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa7eb814-ebf7-ffa5-ce61-317b9bbf2394@gmail.com \
    --to=hkallweit1@gmail.com \
    --cc=edumazet@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@eikelenboom.it \
    --cc=netdev@vger.kernel.org \
    --cc=nic_swsd@realtek.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).