netdev.vger.kernel.org archive mirror
* RE: Re: [PATCH] ena: Speed up initialization 90x by reducing poll delays
@ 2020-03-11 13:24 Jubran, Samih
  2020-03-13 12:28 ` Josh Triplett
  0 siblings, 1 reply; 4+ messages in thread
From: Jubran, Samih @ 2020-03-11 13:24 UTC (permalink / raw)
  To: Machulsky, Zorik, Josh Triplett
  Cc: Belgazal, Netanel, Kiyanovski, Arthur, Tzalik, Guy, Bshara,
	Saeed, netdev, linux-kernel

Hi Josh,

Thanks for taking the time to write this patch. While testing it I hit a bug. I haven't pinpointed the root cause yet, but it looks to me like a race in the netlink infrastructure.

Here is the bug scenario:
1. Create a c5.24xlarge instance in AWS in the N. Virginia region using the default Amazon Linux 2 AMI
2. Apply your patch on top of net-next v5.2 and install the kernel (currently I'm able to boot net-next v5.2 only; higher versions of net-next suffer from errors during boot)
3. Run "rmmod ena && insmod ena.ko" twice

Result:
The interface is not in the up state

Expected result:
The interface should be in the up state

What I know so far:
* ena_probe() seems to finish with no errors whatsoever
* adding prints / delays to ena_probe() makes the bug vanish or become less likely, depending on how much delay I add
* ena_up() is not called at all when the bug occurs, so it's something to do with netlink not invoking dev_open()
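
For anyone trying to reproduce this, the "up" state can also be checked programmatically after each insmod. The following is only a minimal userspace sketch, assuming Linux with glibc; the function name interface_is_up() is hypothetical and the interface name passed in (for example "eth0") is an assumption about the test machine. It reads the interface flags with the SIOCGIFFLAGS ioctl and reports whether IFF_UP is set.

```c
#define _DEFAULT_SOURCE /* expose struct ifreq under strict -std=c11 */
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the ena driver or the patch.
 * Returns 1 if the interface has IFF_UP set, 0 if it does not,
 * and -1 on error (for example, no such interface).
 */
int interface_is_up(const char *ifname)
{
    struct ifreq ifr;
    size_t len;
    int fd, ret;

    assert(ifname != NULL);

    len = strlen(ifname);
    if (len >= IFNAMSIZ)
        return -1; /* name does not fit in ifr_name */

    /* Any socket works as a handle for interface ioctls. */
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    memcpy(ifr.ifr_name, ifname, len + 1); /* copies the terminating NUL */

    ret = ioctl(fd, SIOCGIFFLAGS, &ifr);
    close(fd);
    if (ret < 0)
        return -1;

    return (ifr.ifr_flags & IFF_UP) ? 1 : 0;
}
```

Calling interface_is_up("eth0") after each "rmmod ena && insmod ena.ko" cycle would return 0 when the bug occurs and 1 otherwise, assuming the ENA interface is named eth0 on the instance.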

Did you face such issues? Do you have any idea what might be causing this?

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org <linux-kernel-
> owner@vger.kernel.org> On Behalf Of Machulsky, Zorik
> <zorik@amazon.com>
> Sent: Tuesday, March 3, 2020 2:54 AM
> To: Josh Triplett <josh@joshtriplett.org>
> Cc: Belgazal, Netanel <netanel@amazon.com>; Kiyanovski, Arthur
> <akiyano@amazon.com>; Tzalik, Guy <gtzalik@amazon.com>; Bshara, Saeed
> <saeedb@amazon.com>; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: Re: [PATCH] ena: Speed up initialization 90x by reducing poll delays
> 
> 
> 
> On 3/2/20, 4:40 PM, "Josh Triplett" <josh@joshtriplett.org> wrote:
> 
> 
>     On Mon, Mar 02, 2020 at 11:16:32PM +0000, Machulsky, Zorik wrote:
>     >
>     > On 2/28/20, 4:29 PM, "Josh Triplett" <josh@joshtriplett.org> wrote:
>     >
>     >     Before initializing completion queue interrupts, the ena driver uses
>     >     polling to wait for responses on the admin command queue. The ena driver
>     >     waits 5ms between polls, but the hardware has generally finished long
>     >     before that. Reduce the poll time to 10us.
>     >
>     >     On a c5.12xlarge, this improves ena initialization time from 173.6ms to
>     >     1.920ms, an improvement of more than 90x. This improves server boot time
>     >     and time to network bringup.
>     >
>     > Thanks Josh,
>     > We agree that polling rate should be increased, but prefer not to do it aggressively and blindly.
>     > For example linear backoff approach might be a better choice. Please let us re-work a little this
>     > patch and bring it to review. Thanks!
> 
>     That's fine, as long as it has the same net improvement on boot time.
> 
>     I'd appreciate the opportunity to test any alternate approach you might
>     have.
> 
>     (Also, as long as you're working on this, you might wish to make a
>     similar change to the EFA driver, and to the FreeBSD drivers.)
> 
> Absolutely! Already forwarded this to the owners of these drivers.  Thanks!
> 
>     >     Before:
>     >     [    0.531722] calling  ena_init+0x0/0x63 @ 1
>     >     [    0.531722] ena: Elastic Network Adapter (ENA) v2.1.0K
>     >     [    0.531751] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.1.0K
>     >     [    0.531946] PCI Interrupt Link [LNKD] enabled at IRQ 11
>     >     [    0.547425] ena: ena device version: 0.10
>     >     [    0.547427] ena: ena controller version: 0.0.1 implementation version 1
>     >     [    0.709497] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 06:c4:22:0e:dc:da, Placement policy: Low Latency
>     >     [    0.709508] initcall ena_init+0x0/0x63 returned 0 after 173616 usecs
>     >
>     >     After:
>     >     [    0.526965] calling  ena_init+0x0/0x63 @ 1
>     >     [    0.526966] ena: Elastic Network Adapter (ENA) v2.1.0K
>     >     [    0.527056] ena 0000:00:05.0: Elastic Network Adapter (ENA) v2.1.0K
>     >     [    0.527196] PCI Interrupt Link [LNKD] enabled at IRQ 11
>     >     [    0.527211] ena: ena device version: 0.10
>     >     [    0.527212] ena: ena controller version: 0.0.1 implementation version 1
>     >     [    0.528925] ena 0000:00:05.0: Elastic Network Adapter (ENA) found at mem febf4000, mac addr 06:c4:22:0e:dc:da, Placement policy: Low Latency
>     >     [    0.528934] initcall ena_init+0x0/0x63 returned 0 after 1920 usecs
> 
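
The "linear backoff" idea mentioned in the quoted discussion could look roughly like the sketch below. This is a userspace simulation under stated assumptions, not code from the ena driver or from any reworked patch: every name here (linear_backoff_us, poll_with_linear_backoff, struct fake_device, and the step/cap parameters) is hypothetical. The delay before each poll grows by a fixed step and is capped, so a fast device is noticed after a short first wait while a slow one is not polled aggressively forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical, for illustration only. */

/* Delay before the next poll: (attempt + 1) * step_us, capped at max_us. */
uint32_t linear_backoff_us(uint32_t attempt, uint32_t step_us, uint32_t max_us)
{
    uint64_t delay;

    assert(step_us > 0 && max_us >= step_us);
    delay = ((uint64_t)attempt + 1) * step_us;
    return delay > max_us ? max_us : (uint32_t)delay;
}

typedef bool (*poll_done_fn)(void *ctx);
typedef void (*sleep_us_fn)(uint32_t usec);

/*
 * Poll until done() reports completion or the sleep budget is used up.
 * Returns the total microseconds slept, or -1 on timeout.
 */
int64_t poll_with_linear_backoff(poll_done_fn done, void *ctx,
                                 sleep_us_fn sleep_us, uint32_t step_us,
                                 uint32_t max_us, uint64_t timeout_us)
{
    uint64_t waited = 0;
    uint32_t attempt;

    for (attempt = 0; ; attempt++) {
        uint32_t delay;

        if (done(ctx))
            return (int64_t)waited;
        if (waited >= timeout_us)
            return -1;

        delay = linear_backoff_us(attempt, step_us, max_us);
        sleep_us(delay);
        waited += delay;
    }
}

/* Simulated device that reports completion on the Nth poll. */
struct fake_device {
    uint32_t polls_needed;
    uint32_t polls_seen;
};

bool fake_device_done(void *ctx)
{
    struct fake_device *dev = ctx;

    dev->polls_seen++;
    return dev->polls_seen >= dev->polls_needed;
}

/* Stand-in for a real sleep so the simulation finishes instantly. */
void fake_sleep_us(uint32_t usec)
{
    (void)usec;
}
```

With a 10us step and a 5000us cap (the two poll intervals discussed in the thread, used here only as example values), a device that answers on its third poll costs 10 + 20 = 30us of sleeping in this simulation, while one that never answers is polled at most every 5000us once the cap is reached.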



end of thread, other threads:[~2020-04-12 20:27 UTC | newest]

Thread overview: 4+ messages
2020-03-11 13:24 Re: [PATCH] ena: Speed up initialization 90x by reducing poll delays Jubran, Samih
2020-03-13 12:28 ` Josh Triplett
2020-04-12  9:37   ` Jubran, Samih
2020-04-12 20:27     ` Josh Triplett
