* netfront/netback multiqueue exhausting grants
@ 2016-01-20 12:23 Ian Campbell
  2016-01-20 14:40 ` Boris Ostrovsky
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Ian Campbell @ 2016-01-20 12:23 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel

There have been a few reports recently[0] which relate to a failure of
netfront to allocate sufficient grant refs for all the queues:

[    0.533589] xen_netfront: can't alloc rx grant refs
[    0.533612] net eth0: only created 31 queues

Which can be worked around by increasing the number of grants on the
hypervisor command line or by limiting the number of queues permitted by
either back or front using a module param (which was broken but is now
fixed on both sides, but I'm not sure it has been backported everywhere
such that it is a reliable thing to always tell users as a workaround).
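Concretely, the workarounds look something like this (a sketch from memory;
the exact option names should be checked against the Xen and kernel versions
actually in use):

```shell
# 1) More grants: raise the grant table limit on the Xen (hypervisor)
#    command line, e.g.:
#      gnttab_max_frames=64

# 2) Fewer queues: cap them via the module parameter on the backend...
modprobe xen-netback max_queues=4

# ...or on the frontend, e.g. via the guest kernel command line:
#      xen_netfront.max_queues=4
```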

Is there any plan to do anything about the default/out of the box
experience? Either limiting the number of queues or making both ends cope
more gracefully with failure to create some queues (or both) might be
sufficient?

I think the crash after the above in the first link at [0] is fixed? I
think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
to real created" which was in 4.3.

Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
    http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
    some before the xmas break too, IIRC

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfront/netback multiqueue exhausting grants
  2016-01-20 12:23 netfront/netback multiqueue exhausting grants Ian Campbell
@ 2016-01-20 14:40 ` Boris Ostrovsky
  2016-01-20 14:52   ` Ian Campbell
  2016-01-20 16:18 ` annie li
  2016-01-21 10:56 ` David Vrabel
  2 siblings, 1 reply; 21+ messages in thread
From: Boris Ostrovsky @ 2016-01-20 14:40 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: Wei Liu, David Vrabel

On 01/20/2016 07:23 AM, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
>
> [    0.533589] xen_netfront: can't alloc rx grant refs
> [    0.533612] net eth0: only created 31 queues
>
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).
>
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?
>
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

I think ca88ea1247df is the solution --- it will limit the number of 
queues.

And apparently it's not in stable trees. At least not in 4.1.15, which 
is what the first reporter is running:

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/log/drivers/net/xen-netfront.c?id=refs/tags/v4.1.15

-boris


>
> Ian.
>
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
>      http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
>      some before the xmas break too, IIRC


* Re: netfront/netback multiqueue exhausting grants
  2016-01-20 14:40 ` Boris Ostrovsky
@ 2016-01-20 14:52   ` Ian Campbell
  2016-01-20 15:02     ` David Vrabel
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2016-01-20 14:52 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel; +Cc: Wei Liu, David Vrabel

On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > There have been a few reports recently[0] which relate to a failure of
> > netfront to allocate sufficient grant refs for all the queues:
> > 
> > [    0.533589] xen_netfront: can't alloc rx grant refs
> > [    0.533612] net eth0: only created 31 queues
> > 
> > Which can be worked around by increasing the number of grants on the
> > hypervisor command line or by limiting the number of queues permitted
> > by
> > either back or front using a module param (which was broken but is now
> > fixed on both sides, but I'm not sure it has been backported everywhere
> > such that it is a reliable thing to always tell users as a workaround).
> > 
> > Is there any plan to do anything about the default/out of the box
> > experience? Either limiting the number of queues or making both ends
> > cope
> > more gracefully with failure to create some queues (or both) might be
> > sufficient?
> > 
> > I think the crash after the above in the first link at [0] is fixed? I
> > think that was the purpose of ca88ea1247df "xen-netfront: update
> > num_queues
> > to real created" which was in 4.3.
> 
> I think ca88ea1247df is the solution --- it will limit the number of 
> queues.

That's in 4.4, which the first link at [0] claimed to have tested. I can
see this fixing the crash, but does it really fix the "actually works with
less queues than it tried to get" issue?

In any case having exhausted the grant entries creating queues there aren't
any left to shuffle actual data around, are there? (or are those
preallocated too?)

Ian.



* Re: netfront/netback multiqueue exhausting grants
  2016-01-20 14:52   ` Ian Campbell
@ 2016-01-20 15:02     ` David Vrabel
  2016-01-20 15:10       ` Boris Ostrovsky
  0 siblings, 1 reply; 21+ messages in thread
From: David Vrabel @ 2016-01-20 15:02 UTC (permalink / raw)
  To: Ian Campbell, Boris Ostrovsky, xen-devel; +Cc: Wei Liu, David Vrabel

On 20/01/16 14:52, Ian Campbell wrote:
> On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
>> On 01/20/2016 07:23 AM, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [    0.533589] xen_netfront: can't alloc rx grant refs
>>> [    0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted
>>> by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends
>>> cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>> num_queues
>>> to real created" which was in 4.3.
>>
>> I think ca88ea1247df is the solution --- it will limit the number of 
>> queues.
> 
> That's in 4.4, which the first link at [0] claimed to have tested. I can
> see this fixing the crash, but does it really fix the "actually works with
> less queues than it tried to get" issue?
> 
> In any case having exhausted the grant entries creating queues there aren't
> any left to shuffle actual data around, are there? (or are those
> preallocated too?)

All grant refs for Tx and Rx are preallocated (this is the allocation
that is failing above).
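
To put rough numbers on it (the figures below are assumed from the usual
defaults, not taken from the reports): a back-of-the-envelope sketch of why
a 32-vcpu guest fails at exactly the 32nd queue.

```python
# Assumed figures: grant table v1 entries are 8 bytes (512 per 4 KiB
# frame), the hypervisor default is gnttab_max_frames=32, and netfront
# preallocates one grant ref per Tx and per Rx ring slot (256 each).
ENTRIES_PER_FRAME = 4096 // 8                        # 512
DEFAULT_FRAMES = 32                                  # gnttab_max_frames
TOTAL_GRANTS = ENTRIES_PER_FRAME * DEFAULT_FRAMES    # 16384
GRANTS_PER_QUEUE = 256 + 256                         # Tx + Rx refs

full_queues = TOTAL_GRANTS // GRANTS_PER_QUEUE
print(full_queues)  # -> 32, but only if nothing else held any grants;
                    # the rings themselves, blkfront, etc. already do,
                    # so the 32nd queue's rx allocation fails.
```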

David


* Re: netfront/netback multiqueue exhausting grants
  2016-01-20 15:02     ` David Vrabel
@ 2016-01-20 15:10       ` Boris Ostrovsky
  2016-01-20 15:16         ` Ian Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Boris Ostrovsky @ 2016-01-20 15:10 UTC (permalink / raw)
  To: David Vrabel, Ian Campbell, xen-devel; +Cc: Wei Liu

On 01/20/2016 10:02 AM, David Vrabel wrote:
> On 20/01/16 14:52, Ian Campbell wrote:
>> On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
>>> On 01/20/2016 07:23 AM, Ian Campbell wrote:
>>>> There have been a few reports recently[0] which relate to a failure of
>>>> netfront to allocate sufficient grant refs for all the queues:
>>>>
>>>> [    0.533589] xen_netfront: can't alloc rx grant refs
>>>> [    0.533612] net eth0: only created 31 queues
>>>>
>>>> Which can be worked around by increasing the number of grants on the
>>>> hypervisor command line or by limiting the number of queues permitted
>>>> by
>>>> either back or front using a module param (which was broken but is now
>>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>>> such that it is a reliable thing to always tell users as a workaround).
>>>>
>>>> Is there any plan to do anything about the default/out of the box
>>>> experience? Either limiting the number of queues or making both ends
>>>> cope
>>>> more gracefully with failure to create some queues (or both) might be
>>>> sufficient?
>>>>
>>>> I think the crash after the above in the first link at [0] is fixed? I
>>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>>> num_queues
>>>> to real created" which was in 4.3.
>>> I think ca88ea1247df is the solution --- it will limit the number of
>>> queues.
>> That's in 4.4, which the first link at [0] claimed to have tested. I can
>> see this fixing the crash, but does it really fix the "actually works with
>> less queues than it tried to get" issue?

That's what I thought it does too. I didn't notice that 4.4 was tested 
as well, so maybe not.

-boris

>>
>> In any case having exhausted the grant entries creating queues there aren't
>> any left to shuffle actual data around, are there? (or are those
>> preallocated too?)
> All grant refs for Tx and Rx are preallocated (this is the allocation
> that is failing above).
>
> David


* Re: netfront/netback multiqueue exhausting grants
  2016-01-20 15:10       ` Boris Ostrovsky
@ 2016-01-20 15:16         ` Ian Campbell
  2016-01-21 10:12           ` Ian Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2016-01-20 15:16 UTC (permalink / raw)
  To: Boris Ostrovsky, David Vrabel, xen-devel; +Cc: Wei Liu

On Wed, 2016-01-20 at 10:10 -0500, Boris Ostrovsky wrote:
> On 01/20/2016 10:02 AM, David Vrabel wrote:
> > On 20/01/16 14:52, Ian Campbell wrote:
> > > On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> > > > On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > > > > There have been a few reports recently[0] which relate to a
> > > > > failure of
> > > > > netfront to allocate sufficient grant refs for all the queues:
> > > > > 
> > > > > [    0.533589] xen_netfront: can't alloc rx grant refs
> > > > > [    0.533612] net eth0: only created 31 queues
> > > > > 
> > > > > Which can be worked around by increasing the number of grants on
> > > > > the
> > > > > hypervisor command line or by limiting the number of queues
> > > > > permitted
> > > > > by
> > > > > either back or front using a module param (which was broken but
> > > > > is now
> > > > > fixed on both sides, but I'm not sure it has been backported
> > > > > everywhere
> > > > > such that it is a reliable thing to always tell users as a
> > > > > workaround).
> > > > > 
> > > > > Is there any plan to do anything about the default/out of the box
> > > > > experience? Either limiting the number of queues or making both
> > > > > ends
> > > > > cope
> > > > > more gracefully with failure to create some queues (or both)
> > > > > might be
> > > > > sufficient?
> > > > > 
> > > > > I think the crash after the above in the first link at [0] is
> > > > > fixed? I
> > > > > think that was the purpose of ca88ea1247df "xen-netfront: update
> > > > > num_queues
> > > > > to real created" which was in 4.3.
> > > > I think ca88ea1247df is the solution --- it will limit the number
> > > > of
> > > > queues.
> > > That's in 4.4, which the first link at [0] claimed to have tested. I
> > > can
> > > see this fixing the crash, but does it really fix the "actually works
> > > with
> > > less queues than it tried to get" issue?
> 
> That's what I thought it does too. I didn't notice that 4.4 was tested 
> as well, so maybe not.

I've asked the reporter to send logs for the 4.4 case to xen-devel.

Ian.



* Re: netfront/netback multiqueue exhausting grants
  2016-01-20 12:23 netfront/netback multiqueue exhausting grants Ian Campbell
  2016-01-20 14:40 ` Boris Ostrovsky
@ 2016-01-20 16:18 ` annie li
  2016-01-21 10:56 ` David Vrabel
  2 siblings, 0 replies; 21+ messages in thread
From: annie li @ 2016-01-20 16:18 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel


On 2016/1/20 7:23, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
>
> [    0.533589] xen_netfront: can't alloc rx grant refs
> [    0.533612] net eth0: only created 31 queues
>
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).
Following are the patches that fix the module param; they have existed since v4.3.
xen-netfront: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=32a844056fd43dda647e1c3c6b9983bdfa04d17d
xen-netback: respect user provided max_queues
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4c82ac3c37363e8c4ded6a5fe1ec5fa756b34df3
>
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?
We ran into a similar issue recently, and I wonder whether it would be 
better to suggest that users set the netback module parameter, with a 
default value of 8? See this link,
http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing

Probably more tests are needed to determine the default number for the 
best experience.

>
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.
Correct.

Thanks
Annie
>
> Ian.
>
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00100.html
>      http://lists.xen.org/archives/html/xen-users/2016-01/msg00072.html
>      some before the xmas break too, IIRC
>


* Re: netfront/netback multiqueue exhausting grants
  2016-01-20 15:16         ` Ian Campbell
@ 2016-01-21 10:12           ` Ian Campbell
  2016-01-21 10:25             ` Wei Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2016-01-21 10:12 UTC (permalink / raw)
  To: Boris Ostrovsky, David Vrabel, xen-devel; +Cc: Wei Liu

On Wed, 2016-01-20 at 15:16 +0000, Ian Campbell wrote:
> On Wed, 2016-01-20 at 10:10 -0500, Boris Ostrovsky wrote:
> > On 01/20/2016 10:02 AM, David Vrabel wrote:
> > > On 20/01/16 14:52, Ian Campbell wrote:
> > > > On Wed, 2016-01-20 at 09:40 -0500, Boris Ostrovsky wrote:
> > > > > On 01/20/2016 07:23 AM, Ian Campbell wrote:
> > > > > > There have been a few reports recently[0] which relate to a
> > > > > > failure of
> > > > > > netfront to allocate sufficient grant refs for all the queues:
> > > > > > 
> > > > > > [    0.533589] xen_netfront: can't alloc rx grant refs
> > > > > > [    0.533612] net eth0: only created 31 queues
> > > > > > 
> > > > > > Which can be worked around by increasing the number of grants
> > > > > > on
> > > > > > the
> > > > > > hypervisor command line or by limiting the number of queues
> > > > > > permitted
> > > > > > by
> > > > > > either back or front using a module param (which was broken but
> > > > > > is now
> > > > > > fixed on both sides, but I'm not sure it has been backported
> > > > > > everywhere
> > > > > > such that it is a reliable thing to always tell users as a
> > > > > > workaround).
> > > > > > 
> > > > > > Is there any plan to do anything about the default/out of the
> > > > > > box
> > > > > > experience? Either limiting the number of queues or making both
> > > > > > ends
> > > > > > cope
> > > > > > more gracefully with failure to create some queues (or both)
> > > > > > might be
> > > > > > sufficient?
> > > > > > 
> > > > > > I think the crash after the above in the first link at [0] is
> > > > > > fixed? I
> > > > > > think that was the purpose of ca88ea1247df "xen-netfront:
> > > > > > update
> > > > > > num_queues
> > > > > > to real created" which was in 4.3.
> > > > > I think ca88ea1247df is the solution --- it will limit the number
> > > > > of
> > > > > queues.
> > > > That's in 4.4, which the first link at [0] claimed to have tested.
> > > > I
> > > > can
> > > > see this fixing the crash, but does it really fix the "actually
> > > > works
> > > > with
> > > > less queues than it tried to get" issue?
> > 
> > That's what I thought it does too. I didn't notice that 4.4 was tested 
> > as well, so maybe not.
> 
> I've asked the reporter to send logs for the 4.4 case to xen-devel.

User confirmed[0] that 4.4 is actually OK.

Did someone request stable backports yet, or shall I do so?

Ian.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html



* Re: netfront/netback multiqueue exhausting grants
  2016-01-21 10:12           ` Ian Campbell
@ 2016-01-21 10:25             ` Wei Liu
  2016-01-21 10:37               ` Ian Campbell
  0 siblings, 1 reply; 21+ messages in thread
From: Wei Liu @ 2016-01-21 10:25 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel, xen-devel

On Thu, Jan 21, 2016 at 10:12:27AM +0000, Ian Campbell wrote:
[...]
> > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> 
> User confirmed[0] that 4.4 is actually OK.
> 
> Did someone request stable backports yet, or shall I do so?
> 

I vaguely remember we requested backports for the relevant patches a long
time ago, but I admit I have lost track. So it wouldn't hurt if you did it
again.

Wei.

> Ian.
> 
> [0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00110.html


* Re: netfront/netback multiqueue exhausting grants
  2016-01-21 10:25             ` Wei Liu
@ 2016-01-21 10:37               ` Ian Campbell
  2016-01-21 10:52                 ` Wei Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Ian Campbell @ 2016-01-21 10:37 UTC (permalink / raw)
  To: Wei Liu; +Cc: Boris Ostrovsky, David Vrabel, xen-devel

On Thu, 2016-01-21 at 10:25 +0000, Wei Liu wrote:
> On Thu, Jan 21, 2016 at 10:12:27AM +0000, Ian Campbell wrote:
> [...]
> > > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> > 
> > User confirmed[0] that 4.4 is actually OK.
> > 
> > Did someone request stable backports yet, or shall I do so?
> > 
> 
> I vaguely remember we requested backports for the relevant patches a long
> time ago, but I admit I have lost track. So it wouldn't hurt if you did it
> again.

So I think we'd be looking for:

32a8440 xen-netfront: respect user provided max_queues
4c82ac3 xen-netback: respect user provided max_queues
ca88ea1 xen-netfront: update num_queues to real created

which certainly resolves things such that the workarounds work, and I think
will also fix the default case such that it works with up to 32 vcpus
(although it will consume all the grants and only get 31/32 queues).

Does that sound correct?

As Annie said, we may still want to consider what a sensible default max
queues would be.

Ian.


* Re: netfront/netback multiqueue exhausting grants
  2016-01-21 10:37               ` Ian Campbell
@ 2016-01-21 10:52                 ` Wei Liu
  0 siblings, 0 replies; 21+ messages in thread
From: Wei Liu @ 2016-01-21 10:52 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel, xen-devel

On Thu, Jan 21, 2016 at 10:37:51AM +0000, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:25 +0000, Wei Liu wrote:
> > On Thu, Jan 21, 2016 at 10:12:27AM +0000, Ian Campbell wrote:
> > [...]
> > > > I've asked the reporter to send logs for the 4.4 case to xen-devel.
> > > 
> > > User confirmed[0] that 4.4 is actually OK.
> > > 
> > > Did someone request stable backports yet, or shall I do so?
> > > 
> > 
> > I vaguely remember we requested backport for relevant patches long time
> > ago, but I admit I have lost track. So it wouldn't hurt if you do it
> > again.
> 
> So I think we'd be looking for:
> 
> 32a8440 xen-netfront: respect user provided max_queues
> 4c82ac3 xen-netback: respect user provided max_queues
> ca88ea1 xen-netfront: update num_queues to real created
> 
> which certainly resolves things such that the workarounds work, and I think
> will also fix the default case such that it works with up to 32 vcpus
> (although it will consume all the grants and only get 31/32 queues).
> 
> Does that sound correct?
> 

Yes, it does.

> As Annie said, we may still want to consider what a sensible default max
> queues would be.
> 

Maybe we should set a cap of 8 or 16 by default.

Wei.

> Ian.


* Re: netfront/netback multiqueue exhausting grants
  2016-01-20 12:23 netfront/netback multiqueue exhausting grants Ian Campbell
  2016-01-20 14:40 ` Boris Ostrovsky
  2016-01-20 16:18 ` annie li
@ 2016-01-21 10:56 ` David Vrabel
  2016-01-21 12:19   ` Ian Campbell
  2 siblings, 1 reply; 21+ messages in thread
From: David Vrabel @ 2016-01-21 10:56 UTC (permalink / raw)
  To: Ian Campbell, xen-devel; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel

On 20/01/16 12:23, Ian Campbell wrote:
> There have been a few reports recently[0] which relate to a failure of
> netfront to allocate sufficient grant refs for all the queues:
> 
> [    0.533589] xen_netfront: can't alloc rx grant refs
> [    0.533612] net eth0: only created 31 queues
> 
> Which can be worked around by increasing the number of grants on the
> hypervisor command line or by limiting the number of queues permitted by
> either back or front using a module param (which was broken but is now
> fixed on both sides, but I'm not sure it has been backported everywhere
> such that it is a reliable thing to always tell users as a workaround).
> 
> Is there any plan to do anything about the default/out of the box
> experience? Either limiting the number of queues or making both ends cope
> more gracefully with failure to create some queues (or both) might be
> sufficient?
> 
> I think the crash after the above in the first link at [0] is fixed? I
> think that was the purpose of ca88ea1247df "xen-netfront: update num_queues
> to real created" which was in 4.3.

I think the correct solution is to increase the default maximum grant
table size.

Although, unless you're using the not-yet-applied per-cpu rwlock patches,
multiqueue is terrible on many (multisocket) systems, and the number of
queues should be limited in netback to 4 or even just 2.

David


* Re: netfront/netback multiqueue exhausting grants
  2016-01-21 10:56 ` David Vrabel
@ 2016-01-21 12:19   ` Ian Campbell
  2016-01-21 14:17     ` David Vrabel
  2016-01-22  3:36     ` Bob Liu
  0 siblings, 2 replies; 21+ messages in thread
From: Ian Campbell @ 2016-01-21 12:19 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Wei Liu

On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
> On 20/01/16 12:23, Ian Campbell wrote:
> > There have been a few reports recently[0] which relate to a failure of
> > netfront to allocate sufficient grant refs for all the queues:
> > 
> > [    0.533589] xen_netfront: can't alloc rx grant refs
> > [    0.533612] net eth0: only created 31 queues
> > 
> > Which can be worked around by increasing the number of grants on the
> > hypervisor command line or by limiting the number of queues permitted
> > by
> > either back or front using a module param (which was broken but is now
> > fixed on both sides, but I'm not sure it has been backported everywhere
> > such that it is a reliable thing to always tell users as a workaround).
> > 
> > Is there any plan to do anything about the default/out of the box
> > experience? Either limiting the number of queues or making both ends
> > cope
> > more gracefully with failure to create some queues (or both) might be
> > sufficient?
> > 
> > I think the crash after the above in the first link at [0] is fixed? I
> > think that was the purpose of ca88ea1247df "xen-netfront: update
> > num_queues
> > to real created" which was in 4.3.
> 
> I think the correct solution is to increase the default maximum grant
> table size.

That could well make sense, but then there will just be another higher
limit, so we should perhaps do both.

i.e. factoring in:
 * performance, i.e. the ability for N queues to saturate whatever sort of
   link contemporary Linux can saturate these days, plus some headroom (or
   whatever other ceiling seems sensible), and
 * grant table resource consumption, i.e. (sensible max number of blks * nr
   gnts per blk + sensible max number of vifs * nr gnts per vif + other
   devs' needs) < per-guest grant limit, to pick both the default gnttab
   size and the default max queues.

(or s/sensible/supportable/g etc).
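
As a sketch of that budget (the per-device grant counts below are
illustrative assumptions, not measured figures):

```python
ENTRIES_PER_FRAME = 512      # grant table v1: 4 KiB frame / 8-byte entry
GRANTS_PER_VIF_QUEUE = 512   # 256 Tx + 256 Rx preallocated refs per queue
GRANTS_PER_BLK = 352         # e.g. 32 requests * 11 segments (assumed)

def frames_needed(nr_vifs, queues_per_vif, nr_blks, headroom=256):
    """Grant table frames needed for a guest with this device mix."""
    total = (nr_vifs * queues_per_vif * GRANTS_PER_VIF_QUEUE
             + nr_blks * GRANTS_PER_BLK
             + headroom)                       # other devs' needs
    return -(-total // ENTRIES_PER_FRAME)      # ceiling division

# e.g. 4 vifs with 8 queues each, plus 4 disks:
print(frames_needed(nr_vifs=4, queues_per_vif=8, nr_blks=4))  # -> 36
```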

> Although, unless you're using the not-yet-applied per-cpu rwlock patches
> multiqueue is terrible on many (multisocket) systems and the number of
> queue should be limited in netback to 4 or even just 2.

Presumably the guest can't tell, so it can't do this.

I think when you say "terrible" you don't mean "worse than without mq" but
rather "not realising the expected gains from a larger number of queues",
right?

Ian.



* Re: netfront/netback multiqueue exhausting grants
  2016-01-21 12:19   ` Ian Campbell
@ 2016-01-21 14:17     ` David Vrabel
  2016-01-21 15:11       ` annie li
  2016-01-22  3:36     ` Bob Liu
  1 sibling, 1 reply; 21+ messages in thread
From: David Vrabel @ 2016-01-21 14:17 UTC (permalink / raw)
  To: Ian Campbell, David Vrabel, xen-devel; +Cc: Boris Ostrovsky, Wei Liu

On 21/01/16 12:19, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
>> On 20/01/16 12:23, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [    0.533589] xen_netfront: can't alloc rx grant refs
>>> [    0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted
>>> by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends
>>> cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>> num_queues
>>> to real created" which was in 4.3.
>>
>> I think the correct solution is to increase the default maximum grant
>> table size.
> 
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
> 
> i.e. factoring in:
>  * performance i.e. ability for N queues to saturate whatever sort of link
>    contemporary Linux can saturate these days, plus some headroom, or
>    whatever other ceiling seems sensible)
>  * grant table resource consumption i.e. (sensible max number of blks * nr
>    gnts per blk + sensible max number of vifs * nr gnts per vif + other
>    devs needs) < per guest grant limit) to pick both the default gnttab
>    size and the default max queuers.

Yes.

>> Although, unless you're using the not-yet-applied per-cpu rwlock patches
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queues should be limited in netback to 4 or even just 2.
> 
> Presumably the guest can't tell, so it can't do this.
> 
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger number of queues",
> right?

Malcolm did the analysis, but if I remember correctly, 8 queues performed
about the same as 1 queue and 16 were worse than 1 queue.

David


* Re: netfront/netback multiqueue exhausting grants
  2016-01-21 14:17     ` David Vrabel
@ 2016-01-21 15:11       ` annie li
  0 siblings, 0 replies; 21+ messages in thread
From: annie li @ 2016-01-21 15:11 UTC (permalink / raw)
  To: David Vrabel, Ian Campbell, xen-devel; +Cc: Boris Ostrovsky, Wei Liu


On 2016/1/21 9:17, David Vrabel wrote:
> On 21/01/16 12:19, Ian Campbell wrote:
>> On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
>>> On 20/01/16 12:23, Ian Campbell wrote:
>>>> There have been a few reports recently[0] which relate to a failure of
>>>> netfront to allocate sufficient grant refs for all the queues:
>>>>
>>>> [    0.533589] xen_netfront: can't alloc rx grant refs
>>>> [    0.533612] net eth0: only created 31 queues
>>>>
>>>> Which can be worked around by increasing the number of grants on the
>>>> hypervisor command line or by limiting the number of queues permitted
>>>> by
>>>> either back or front using a module param (which was broken but is now
>>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>>> such that it is a reliable thing to always tell users as a workaround).
>>>>
>>>> Is there any plan to do anything about the default/out of the box
>>>> experience? Either limiting the number of queues or making both ends
>>>> cope
>>>> more gracefully with failure to create some queues (or both) might be
>>>> sufficient?
>>>>
>>>> I think the crash after the above in the first link at [0] is fixed? I
>>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>>> num_queues
>>>> to real created" which was in 4.3.
>>> I think the correct solution is to increase the default maximum grant
>>> table size.
>> That could well make sense, but then there will just be another higher
>> limit, so we should perhaps do both.
>>
>> i.e. factoring in:
>>   * performance, i.e. the ability for N queues to saturate whatever sort of
>>     link contemporary Linux can saturate these days, plus some headroom (or
>>     whatever other ceiling seems sensible)
>>   * grant table resource consumption, i.e. (sensible max number of blks * nr
>>     gnts per blk + sensible max number of vifs * nr gnts per vif + other
>>     devs' needs) < per-guest grant limit, to pick both the default gnttab
>>     size and the default max queues.
> Yes.

Would it waste lots of resources in the case where a guest vif has many
queues but no network load? Here is an example of the grant refs consumed
by a vif: with a 20-vcpu dom0 and a 20-vcpu domU,
one vif would consume 20*256*2=10240 grant refs.
If the maximum grant table size is set to 64 pages (the default in Xen
is 32 pages now?), then only 3 vifs can be supported in the guest, and
that is before blk is taken into account, including its multi-page ring
feature.
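Annie's arithmetic can be sanity-checked with a small sketch; the 256-entry
rings, the 20-queue count and the 512 grant entries per 4KB table page (v1
entries) are assumptions taken from this example, not authoritative defaults:

```python
# Back-of-the-envelope check of the grant-ref budget described above.
# All constants are illustrative assumptions from the example in this
# thread, not fixed Xen values.

GREFS_PER_PAGE = 512          # one 4KB page of 8-byte v1 grant entries

def vif_grefs(num_queues, ring_entries=256):
    """Grant refs one vif consumes: one TX and one RX ring per queue."""
    return num_queues * ring_entries * 2

per_vif = vif_grefs(20)                        # 20 * 256 * 2 = 10240

grant_table_pages = 64
budget = grant_table_pages * GREFS_PER_PAGE    # 32768 grant refs total

print(per_vif, budget // per_vif)              # 10240 refs/vif, 3 vifs fit
```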

Thanks
Annie
>
>>> Although, unless you're using the not-yet-applied per-cpu rwlock patches
>>> multiqueue is terrible on many (multisocket) systems and the number of
>>> queues should be limited in netback to 4 or even just 2.
>> Presumably the guest can't tell, so it can't do this.
>>
>> I think when you say "terrible" you don't mean "worse than without mq" but
>> rather "not realising the expected gains from a larger number of queues",
>> right?
> Malcolm did the analysis but if I remember correctly, 8 queues performed
> about the same as 1 queue and 16 were worse than 1 queue.
>
> David
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfront/netback multiqueue exhausting grants
  2016-01-21 12:19   ` Ian Campbell
  2016-01-21 14:17     ` David Vrabel
@ 2016-01-22  3:36     ` Bob Liu
  2016-01-22  7:53       ` Jan Beulich
  1 sibling, 1 reply; 21+ messages in thread
From: Bob Liu @ 2016-01-22  3:36 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Boris Ostrovsky, Wei Liu, David Vrabel, xen-devel


On 01/21/2016 08:19 PM, Ian Campbell wrote:
> On Thu, 2016-01-21 at 10:56 +0000, David Vrabel wrote:
>> On 20/01/16 12:23, Ian Campbell wrote:
>>> There have been a few reports recently[0] which relate to a failure of
>>> netfront to allocate sufficient grant refs for all the queues:
>>>
>>> [    0.533589] xen_netfront: can't alloc rx grant refs
>>> [    0.533612] net eth0: only created 31 queues
>>>
>>> Which can be worked around by increasing the number of grants on the
>>> hypervisor command line or by limiting the number of queues permitted
>>> by
>>> either back or front using a module param (which was broken but is now
>>> fixed on both sides, but I'm not sure it has been backported everywhere
>>> such that it is a reliable thing to always tell users as a workaround).
>>>
>>> Is there any plan to do anything about the default/out of the box
>>> experience? Either limiting the number of queues or making both ends
>>> cope
>>> more gracefully with failure to create some queues (or both) might be
>>> sufficient?
>>>
>>> I think the crash after the above in the first link at [0] is fixed? I
>>> think that was the purpose of ca88ea1247df "xen-netfront: update
>>> num_queues
>>> to real created" which was in 4.3.
>>
>> I think the correct solution is to increase the default maximum grant
>> table size.
> 
> That could well make sense, but then there will just be another higher
> limit, so we should perhaps do both.
> 
> i.e. factoring in:
>  * performance, i.e. the ability for N queues to saturate whatever sort of
>    link contemporary Linux can saturate these days, plus some headroom (or
>    whatever other ceiling seems sensible)
>  * grant table resource consumption, i.e. (sensible max number of blks * nr
>    gnts per blk + sensible max number of vifs * nr gnts per vif + other
>    devs' needs) < per-guest grant limit, to pick both the default gnttab
>    size and the default max queues.
> 

Agree.
By the way, do you think it's possible to make the grant table support a bigger page size, e.g. 64K?
One grant ref per 64KB instead of per 4KB should be able to reduce grant entry consumption significantly.
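The size of the saving Bob is suggesting can be sketched as follows; the
amount of granted memory used here is an illustrative assumption, chosen to
match the per-vif figure from earlier in the thread:

```python
# Granting the same amount of guest memory with one grant ref per 64KB
# instead of one per 4KB needs 16x fewer grant entries. The 40MB figure
# is an illustrative assumption, not a measured workload.

granted_bytes = 40 * 1024 * 1024          # 40MB of granted buffers
refs_4k  = granted_bytes // (4 * 1024)    # 10240 refs at 4KB granularity
refs_64k = granted_bytes // (64 * 1024)   # 640 refs at 64KB granularity
print(refs_4k // refs_64k)                # 16x reduction
```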

Bob

> (or s/sensible/supportable/g etc).
> 
>> Although, unless you're using the not-yet-applied per-cpu rwlock patches
>> multiqueue is terrible on many (multisocket) systems and the number of
>> queues should be limited in netback to 4 or even just 2.
> 
> Presumably the guest can't tell, so it can't do this.
> 
> I think when you say "terrible" you don't mean "worse than without mq" but
> rather "not realising the expected gains from a larger number of queues",
> right?
> 
> Ian.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfront/netback multiqueue exhausting grants
  2016-01-22  3:36     ` Bob Liu
@ 2016-01-22  7:53       ` Jan Beulich
  2016-01-22 10:40         ` Bob Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2016-01-22  7:53 UTC (permalink / raw)
  To: Bob Liu; +Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell

>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
> By the way, do you think it's possible to make the grant table support a
> bigger page size, e.g. 64K?
> One grant ref per 64KB instead of per 4KB should be able to reduce grant
> entry consumption significantly.

How would that work with an underlying page size of 4k, and pages
potentially being non-contiguous in machine address space? Besides
that the grant table hypercall interface isn't prepared to support
64k page size, due to its use of uint16_t for the length of copy ops.
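The uint16_t constraint Jan points out can be made concrete with a trivial
check (the field width is as described in his message; the rest is plain
arithmetic):

```python
# Why a 64K grant granularity collides with the copy-op interface:
# per Jan's message, the grant-copy hypercall carries its length in a
# uint16_t, so a single copy op can never describe a full 64KB frame.

UINT16_MAX = 2**16 - 1   # 65535

page_4k  = 4 * 1024      # 4096  <= 65535: fits in the length field
page_64k = 64 * 1024     # 65536 >  65535: does not fit

print(page_4k <= UINT16_MAX, page_64k <= UINT16_MAX)   # True False
```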

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfront/netback multiqueue exhausting grants
  2016-01-22  7:53       ` Jan Beulich
@ 2016-01-22 10:40         ` Bob Liu
  2016-01-22 11:02           ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Bob Liu @ 2016-01-22 10:40 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell



On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
>> By the way, do you think it's possible to make the grant table support a
>> bigger page size, e.g. 64K?
>> One grant ref per 64KB instead of per 4KB should be able to reduce grant
>> entry consumption significantly.
> 
> How would that work with an underlying page size of 4k, and pages
> potentially being non-contiguous in machine address space? Besides
> that the grant table hypercall interface isn't prepared to support
> 64k page size, due to its use of uint16_t for the length of copy ops.
> 

Right, and I mean whether we should consider addressing all the places you mentioned.
With multi-queue xen-block and xen-network, we have received more reports of grants being exhausted.

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfront/netback multiqueue exhausting grants
  2016-01-22 10:40         ` Bob Liu
@ 2016-01-22 11:02           ` Jan Beulich
  2016-01-23  0:29             ` Bob Liu
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2016-01-22 11:02 UTC (permalink / raw)
  To: Bob Liu; +Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell

>>> On 22.01.16 at 11:40, <bob.liu@oracle.com> wrote:
> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
>>> By the way, do you think it's possible to make the grant table support a
>>> bigger page size, e.g. 64K?
>>> One grant ref per 64KB instead of per 4KB should be able to reduce grant
>>> entry consumption significantly.
>> 
>> How would that work with an underlying page size of 4k, and pages
>> potentially being non-contiguous in machine address space? Besides
>> that the grant table hypercall interface isn't prepared to support
>> 64k page size, due to its use of uint16_t for the length of copy ops.
> 
> Right, and I mean whether we should consider addressing all the places you
> mentioned.

Just from an abstract perspective: How would you envision to avoid
machine address discontiguity? Or would you want to limit such an
improvement to only HVM/PVH/HVMlite guests?

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfront/netback multiqueue exhausting grants
  2016-01-22 11:02           ` Jan Beulich
@ 2016-01-23  0:29             ` Bob Liu
  2016-01-25  9:53               ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Bob Liu @ 2016-01-23  0:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell


On 01/22/2016 07:02 PM, Jan Beulich wrote:
>>>> On 22.01.16 at 11:40, <bob.liu@oracle.com> wrote:
>> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
>>>> By the way, do you think it's possible to make the grant table support a
>>>> bigger page size, e.g. 64K?
>>>> One grant ref per 64KB instead of per 4KB should be able to reduce grant
>>>> entry consumption significantly.
>>>
>>> How would that work with an underlying page size of 4k, and pages
>>> potentially being non-contiguous in machine address space? Besides
>>> that the grant table hypercall interface isn't prepared to support
>>> 64k page size, due to its use of uint16_t for the length of copy ops.
>>
>> Right, and I mean whether we should consider addressing all the places you
>> mentioned.
> 
> Just from an abstract perspective: How would you envision to avoid
> machine address discontiguity? Or would you want to limit such an

E.g. reserve a page pool of contiguous 64KB pages, or make grant-map support huge pages (2MB)?
To be honest, I haven't thought much about the details.

Do you think that's unlikely to be implementable?
If yes, we have to limit the queue numbers, VM numbers and vdisk/vif numbers in a proper way
to make sure guests won't end up in a grant-exhausted state.

> improvement to only HVM/PVH/HVMlite guests?
> 
> Jan
> 

-- 
Regards,
-Bob

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: netfront/netback multiqueue exhausting grants
  2016-01-23  0:29             ` Bob Liu
@ 2016-01-25  9:53               ` Jan Beulich
  0 siblings, 0 replies; 21+ messages in thread
From: Jan Beulich @ 2016-01-25  9:53 UTC (permalink / raw)
  To: Bob Liu; +Cc: Boris Ostrovsky, xen-devel, Wei Liu, David Vrabel, Ian Campbell

>>> On 23.01.16 at 01:29, <bob.liu@oracle.com> wrote:

> On 01/22/2016 07:02 PM, Jan Beulich wrote:
>>>>> On 22.01.16 at 11:40, <bob.liu@oracle.com> wrote:
>>> On 01/22/2016 03:53 PM, Jan Beulich wrote:
>>>>>>> On 22.01.16 at 04:36, <bob.liu@oracle.com> wrote:
>>>>> By the way, do you think it's possible to make the grant table support a
>>>>> bigger page size, e.g. 64K?
>>>>> One grant ref per 64KB instead of per 4KB should be able to reduce grant
>>>>> entry consumption significantly.
>>>>
>>>> How would that work with an underlying page size of 4k, and pages
>>>> potentially being non-contiguous in machine address space? Besides
>>>> that the grant table hypercall interface isn't prepared to support
>>>> 64k page size, due to its use of uint16_t for the length of copy ops.
>>>
>>> Right, and I mean whether we should consider addressing all the places you
>>> mentioned.
>> 
>> Just from an abstract perspective: How would you envision to avoid
>> machine address discontiguity? Or would you want to limit such an
> 
> E.g. reserve a page pool of contiguous 64KB pages, or make grant-map support
> huge pages (2MB)?
> To be honest, I haven't thought much about the details.
> 
> Do you think that's unlikely to be implementable?

Contiguous memory (of whatever granularity above 4k) is quite
difficult to _guarantee_ in PV guests, so yes, unless you or
someone else comes up with a fantastic new idea on how to achieve
this, I indeed see it as pretty unlikely to come true.

Jan

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-01-25  9:53 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-20 12:23 netfront/netback multiqueue exhausting grants Ian Campbell
2016-01-20 14:40 ` Boris Ostrovsky
2016-01-20 14:52   ` Ian Campbell
2016-01-20 15:02     ` David Vrabel
2016-01-20 15:10       ` Boris Ostrovsky
2016-01-20 15:16         ` Ian Campbell
2016-01-21 10:12           ` Ian Campbell
2016-01-21 10:25             ` Wei Liu
2016-01-21 10:37               ` Ian Campbell
2016-01-21 10:52                 ` Wei Liu
2016-01-20 16:18 ` annie li
2016-01-21 10:56 ` David Vrabel
2016-01-21 12:19   ` Ian Campbell
2016-01-21 14:17     ` David Vrabel
2016-01-21 15:11       ` annie li
2016-01-22  3:36     ` Bob Liu
2016-01-22  7:53       ` Jan Beulich
2016-01-22 10:40         ` Bob Liu
2016-01-22 11:02           ` Jan Beulich
2016-01-23  0:29             ` Bob Liu
2016-01-25  9:53               ` Jan Beulich
