All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] multiqueue changes
@ 2009-10-08  7:18 Eric Dumazet
  2009-10-08  9:03 ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Dumazet @ 2009-10-08  7:18 UTC (permalink / raw)
  To: David S. Miller, Jarek Poplawski, Patrick McHardy; +Cc: Linux Netdev List

Say I have such non multiqueue device :

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)

Driver bnx2

This drivers does an alloc_etherdev_mq(sizeof(*bp), TX_MAX_RINGS),
regardless of real capabilities of the NIC.

Then, a bit later it does :

bp->dev->real_num_tx_queues = bp->num_tx_rings;  

(one in my non multiqueue case)

Now I have :

# tc -s -d qdisc show dev eth0
qdisc mq 0: root
 Sent 117693091 bytes 1542188 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
# tc -s -d class show dev eth0
class mq :1 root
 Sent 113935509 bytes 1492598 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :2 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :3 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :4 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :5 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :6 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :7 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
class mq :8 root
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0

While in previous kernels I had :

# tc -s -d qdisc show dev eth0
qdisc pfifo_fast 0: root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 26292963818 bytes 347139141 pkts (dropped 0, overlimits 0 requeues 0)
# tc -s -d class show dev eth0


So I lost the default pfifo_fast classification.

Just wondering if it could hurt some people.

Anyway, should we change bnx2/tg3 drivers so that single queue devices
have same default qdisc/class than before ?

eg :

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 08cddb6..7cac205 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -6152,6 +6152,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
 
 	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
 	bp->dev->real_num_tx_queues = bp->num_tx_rings;
+	bp->dev->num_tx_queues = bp->dev->real_num_tx_queues;
 
 	bp->num_rx_rings = bp->irq_nvecs;
 }

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-08  7:18 [RFC] multiqueue changes Eric Dumazet
@ 2009-10-08  9:03 ` Jarek Poplawski
  2009-10-08 12:00   ` Jarek Poplawski
  2009-10-09  8:51   ` [RFC] multiqueue changes Jarek Poplawski
  0 siblings, 2 replies; 47+ messages in thread
From: Jarek Poplawski @ 2009-10-08  9:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Patrick McHardy, Linux Netdev List

On Thu, Oct 08, 2009 at 09:18:45AM +0200, Eric Dumazet wrote:
> Say I have such non multiqueue device :
> 
> 03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708S Gigabit Ethernet (rev 12)
> 
> Driver bnx2
> 
> This drivers does an alloc_etherdev_mq(sizeof(*bp), TX_MAX_RINGS),
> regardless of real capabilities of the NIC.
> 
> Then, a bit later it does :
> 
> bp->dev->real_num_tx_queues = bp->num_tx_rings;  
> 
> (one in my non multiqueue case)
> 
> Now I have :
> 
> # tc -s -d qdisc show dev eth0
> qdisc mq 0: root
>  Sent 117693091 bytes 1542188 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> # tc -s -d class show dev eth0
> class mq :1 root
>  Sent 113935509 bytes 1492598 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :8 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> 
> While in previous kernels I had :
> 
> # tc -s -d qdisc show dev eth0
> qdisc pfifo_fast 0: root bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
>  Sent 26292963818 bytes 347139141 pkts (dropped 0, overlimits 0 requeues 0)
> # tc -s -d class show dev eth0
> 
> 
> So I lost the default pfifo_fast classification.

IMHO it (pfifo_fast qdiscs under mq root) could/should get back.

> 
> Just wondering if it could hurt some people.
> 
> Anyway, should we change bnx2/tg3 drivers so that single queue devices
> have same default qdisc/class than before ?
> 
> eg :
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 08cddb6..7cac205 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -6152,6 +6152,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
>  
>  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
>  	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> +	bp->dev->num_tx_queues = bp->dev->real_num_tx_queues;
>  
>  	bp->num_rx_rings = bp->irq_nvecs;
>  }

It doesn't look consistent to me wrt. the comment in netdevice.h on
num_tx_queues. But it seems we should rather use more often
real_num_tx_queue in schedulers code like dumps and maybe more
(like e.g. sch_multiq does).

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-08  9:03 ` Jarek Poplawski
@ 2009-10-08 12:00   ` Jarek Poplawski
  2009-10-08 12:13     ` Eric Dumazet
  2009-10-28 17:27     ` Patrick McHardy
  2009-10-09  8:51   ` [RFC] multiqueue changes Jarek Poplawski
  1 sibling, 2 replies; 47+ messages in thread
From: Jarek Poplawski @ 2009-10-08 12:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Patrick McHardy, Linux Netdev List

On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
...
> num_tx_queues. But it seems we should rather use more often
> real_num_tx_queue in schedulers code like dumps and maybe more
> (like e.g. sch_multiq does).

...i.e. probably everywhere between dev_activate and dev_deactivate
all qdisc operations could use real_num_tx_queues (including a test
like: netif_is_real_multique), unless I miss something.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-08 12:00   ` Jarek Poplawski
@ 2009-10-08 12:13     ` Eric Dumazet
  2009-10-08 12:53       ` Jarek Poplawski
  2009-10-09  7:58       ` David Miller
  2009-10-28 17:27     ` Patrick McHardy
  1 sibling, 2 replies; 47+ messages in thread
From: Eric Dumazet @ 2009-10-08 12:13 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David S. Miller, Patrick McHardy, Linux Netdev List

Jarek Poplawski a écrit :
> On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> ...
>> num_tx_queues. But it seems we should rather use more often
>> real_num_tx_queue in schedulers code like dumps and maybe more
>> (like e.g. sch_multiq does).
> 
> ...i.e. probably everywhere between dev_activate and dev_deactivate
> all qdisc operations could use real_num_tx_queues (including a test
> like: netif_is_real_multique), unless I miss something.

I am not sure David intent was being able to dynamically adjust real_num_tx_queue
between 1 and num_tx_queue.

For low/moderate traffic, its better to use one queue, to lower IRQ activations,
and let some cpus sleep longer.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-08 12:13     ` Eric Dumazet
@ 2009-10-08 12:53       ` Jarek Poplawski
  2009-10-09  7:58       ` David Miller
  1 sibling, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2009-10-08 12:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Patrick McHardy, Linux Netdev List

On Thu, Oct 08, 2009 at 02:13:22PM +0200, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> > ...
> >> num_tx_queues. But it seems we should rather use more often
> >> real_num_tx_queue in schedulers code like dumps and maybe more
> >> (like e.g. sch_multiq does).
> > 
> > ...i.e. probably everywhere between dev_activate and dev_deactivate
> > all qdisc operations could use real_num_tx_queues (including a test
> > like: netif_is_real_multique), unless I miss something.
> 
> I am not sure David intent was being able to dynamically adjust real_num_tx_queue
> between 1 and num_tx_queue.

If so, then it looks like some drivers could misuse this intent.

Thanks for the explanation,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-08 12:13     ` Eric Dumazet
  2009-10-08 12:53       ` Jarek Poplawski
@ 2009-10-09  7:58       ` David Miller
  1 sibling, 0 replies; 47+ messages in thread
From: David Miller @ 2009-10-09  7:58 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jarkao2, kaber, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 08 Oct 2009 14:13:22 +0200

> I am not sure David intent was being able to dynamically adjust
> real_num_tx_queue between 1 and num_tx_queue.

The idea was to allow semi-dynamic adjustment.

Meaning that you could change the number of TX queues, but only in
some quiescent state, such as when the device is down or frozen while
up in some way.

dev->num_tx_queues tracks how many total were allocated.  This is
necessary so that even if the real_num_tx_queues is modified while the
device is up, we can still see the correct statistics by gathering
from queues that are now disabled but were enabled beforehand.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-08  9:03 ` Jarek Poplawski
  2009-10-08 12:00   ` Jarek Poplawski
@ 2009-10-09  8:51   ` Jarek Poplawski
  2009-10-09  9:40     ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-10-09  8:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Patrick McHardy, Linux Netdev List

On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> On Thu, Oct 08, 2009 at 09:18:45AM +0200, Eric Dumazet wrote:
...
> > Just wondering if it could hurt some people.
> > 
> > Anyway, should we change bnx2/tg3 drivers so that single queue devices
> > have same default qdisc/class than before ?
> > 
> > eg :
> > 
> > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > index 08cddb6..7cac205 100644
> > --- a/drivers/net/bnx2.c
> > +++ b/drivers/net/bnx2.c
> > @@ -6152,6 +6152,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
> >  
> >  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
> >  	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> > +	bp->dev->num_tx_queues = bp->dev->real_num_tx_queues;
> >  
> >  	bp->num_rx_rings = bp->irq_nvecs;
> >  }
> 
> It doesn't look consistent to me wrt. the comment in netdevice.h on
> num_tx_queues. But it seems we should rather use more often
> real_num_tx_queue in schedulers code like dumps and maybe more
> (like e.g. sch_multiq does).

So, according to my current understanding, we should probably let
drivers to reset the tx_queues allocation with a new num_tx_queues
in a place like this (ndo_open).

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-09  8:51   ` [RFC] multiqueue changes Jarek Poplawski
@ 2009-10-09  9:40     ` Jarek Poplawski
  0 siblings, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2009-10-09  9:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, Patrick McHardy, Linux Netdev List

On Fri, Oct 09, 2009 at 08:51:07AM +0000, Jarek Poplawski wrote:
> On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> > On Thu, Oct 08, 2009 at 09:18:45AM +0200, Eric Dumazet wrote:
> ...
> > > Just wondering if it could hurt some people.
> > > 
> > > Anyway, should we change bnx2/tg3 drivers so that single queue devices
> > > have same default qdisc/class than before ?
> > > 
> > > eg :
> > > 
> > > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > > index 08cddb6..7cac205 100644
> > > --- a/drivers/net/bnx2.c
> > > +++ b/drivers/net/bnx2.c
> > > @@ -6152,6 +6152,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
> > >  
> > >  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
> > >  	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> > > +	bp->dev->num_tx_queues = bp->dev->real_num_tx_queues;
> > >  
> > >  	bp->num_rx_rings = bp->irq_nvecs;
> > >  }
> > 
> > It doesn't look consistent to me wrt. the comment in netdevice.h on
> > num_tx_queues. But it seems we should rather use more often
> > real_num_tx_queue in schedulers code like dumps and maybe more
> > (like e.g. sch_multiq does).
> 
> So, according to my current understanding, we should probably let
> drivers to reset the tx_queues allocation with a new num_tx_queues
> in a place like this (ndo_open).

Actually, it should be only for the first ndo_open.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-08 12:00   ` Jarek Poplawski
  2009-10-08 12:13     ` Eric Dumazet
@ 2009-10-28 17:27     ` Patrick McHardy
  2009-10-28 21:23       ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: Patrick McHardy @ 2009-10-28 17:27 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List

Jarek Poplawski wrote:
> On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> ...
>> num_tx_queues. But it seems we should rather use more often
>> real_num_tx_queue in schedulers code like dumps and maybe more
>> (like e.g. sch_multiq does).
> 
> ...i.e. probably everywhere between dev_activate and dev_deactivate
> all qdisc operations could use real_num_tx_queues (including a test
> like: netif_is_real_multique), unless I miss something.

We don't seem to be supporting changing real_num_tx_queues for
registered devices currently (at least I couldn't find it).
So I guess it depends on how this would be implemented.

Simply changing the dev->real_num_tx_queues value while the
device is down would require qdisc operations to operate on
all possible queues since the amount of queues in use could
be changed after the qdisc is created/configured, but before
the device is set up. This approach has more complications
like switching between mq and non-mq root qdiscs, taking care
of non-default root qdisc (cloning them to the new queues), etc.

A simpler alternative would be to destroy the existing root
qdisc on any change to real_num_tx_queues and have dev_activate()
set it up from scratch. In this case, we could (as you suggested)
use real_num_tx_queues, which should fix the problem Eric reported.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-28 17:27     ` Patrick McHardy
@ 2009-10-28 21:23       ` Jarek Poplawski
  2009-10-29 16:37         ` Patrick McHardy
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-10-28 21:23 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List

On Wed, Oct 28, 2009 at 06:27:10PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> > On Thu, Oct 08, 2009 at 09:03:44AM +0000, Jarek Poplawski wrote:
> > ...
> >> num_tx_queues. But it seems we should rather use more often
> >> real_num_tx_queue in schedulers code like dumps and maybe more
> >> (like e.g. sch_multiq does).
> > 
> > ...i.e. probably everywhere between dev_activate and dev_deactivate
> > all qdisc operations could use real_num_tx_queues (including a test
> > like: netif_is_real_multique), unless I miss something.
> 
> We don't seem to be supporting changing real_num_tx_queues for
> registered devices currently (at least I couldn't find it).
> So I guess it depends on how this would be implemented.
> 
> Simply changing the dev->real_num_tx_queues value while the
> device is down would require qdisc operations to operate on
> all possible queues since the amount of queues in use could
> be changed after the qdisc is created/configured, but before
> the device is set up. This approach has more complications
> like switching between mq and non-mq root qdiscs, taking care
> of non-default root qdisc (cloning them to the new queues), etc.
> 
> A simpler alternative would be to destroy the existing root
> qdisc on any change to real_num_tx_queues and have dev_activate()
> set it up from scratch. In this case, we could (as you suggested)
> use real_num_tx_queues, which should fix the problem Eric reported.

Actually, I changed my mind after Eric's and especially David's
explanations. Probably there will be needed some changes in handling
the real_num_tx_queues, but there is no reason to misuse them for
masking a totally useless num_tx_queues value, like in this case. So,
IMHO, its mainly about the driver(s) (and maybe a bit of API change)
here.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-28 21:23       ` Jarek Poplawski
@ 2009-10-29 16:37         ` Patrick McHardy
  2009-10-29 21:15           ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Patrick McHardy @ 2009-10-29 16:37 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List

Jarek Poplawski wrote:
> On Wed, Oct 28, 2009 at 06:27:10PM +0100, Patrick McHardy wrote:
>> We don't seem to be supporting changing real_num_tx_queues for
>> registered devices currently (at least I couldn't find it).
>> So I guess it depends on how this would be implemented.
>>
>> Simply changing the dev->real_num_tx_queues value while the
>> device is down would require qdisc operations to operate on
>> all possible queues since the amount of queues in use could
>> be changed after the qdisc is created/configured, but before
>> the device is set up. This approach has more complications
>> like switching between mq and non-mq root qdiscs, taking care
>> of non-default root qdisc (cloning them to the new queues), etc.
>>
>> A simpler alternative would be to destroy the existing root
>> qdisc on any change to real_num_tx_queues and have dev_activate()
>> set it up from scratch. In this case, we could (as you suggested)
>> use real_num_tx_queues, which should fix the problem Eric reported.
> 
> Actually, I changed my mind after Eric's and especially David's
> explanations. Probably there will be needed some changes in handling
> the real_num_tx_queues, but there is no reason to misuse them for
> masking a totally useless num_tx_queues value, like in this case. So,
> IMHO, its mainly about the driver(s) (and maybe a bit of API change)
> here.

Well, we do need both values for supporting changes to the actually
used numbers of TX queues. If I understood Dave's explanation correctly,
this is also what's intended. It also doesn't seem unreasonable
what bnx2 is doing.

But getting back to the problem Eric reported - so you're suggesting
that bnx2.c should also adjust num_tx_queues in case the hardware
doesn't support multiqueue? That seems reasonable as well.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-29 16:37         ` Patrick McHardy
@ 2009-10-29 21:15           ` Jarek Poplawski
  2009-10-29 22:12             ` Patrick McHardy
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-10-29 21:15 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List

On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
...
> Well, we do need both values for supporting changes to the actually
> used numbers of TX queues. If I understood Dave's explanation correctly,
> this is also what's intended. It also doesn't seem unreasonable
> what bnx2 is doing.

Exactly. With a growing number of cores, both available and powered
off, these values will be soon treated more carefully than now.

> But getting back to the problem Eric reported - so you're suggesting
> that bnx2.c should also adjust num_tx_queues in case the hardware
> doesn't support multiqueue? That seems reasonable as well.

Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
too soon for some drivers. It seems they should be able to do it
separately later during the .probe. There is a question if .ndo_open
should be considered too.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-29 21:15           ` Jarek Poplawski
@ 2009-10-29 22:12             ` Patrick McHardy
  2009-10-30 10:00               ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Patrick McHardy @ 2009-10-29 22:12 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List

Jarek Poplawski wrote:
> On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
> ...
>> Well, we do need both values for supporting changes to the actually
>> used numbers of TX queues. If I understood Dave's explanation correctly,
>> this is also what's intended. It also doesn't seem unreasonable
>> what bnx2 is doing.
> 
> Exactly. With a growing number of cores, both available and powered
> off, these values will be soon treated more carefully than now.
> 
>> But getting back to the problem Eric reported - so you're suggesting
>> that bnx2.c should also adjust num_tx_queues in case the hardware
>> doesn't support multiqueue? That seems reasonable as well.
> 
> Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
> too soon for some drivers. It seems they should be able to do it
> separately later during the .probe.

The value passed into alloc_netdev_mq() is just used for allocation
purposes, from what I can tell there's no downside in reducing it
before the dev_activate() call.

> There is a question if .ndo_open should be considered too.

I currently can't see any purpose in decreasing num_tx_queues after
registration instead of real_num_tx_queues. But it depends on how
exactly this will be implemented and how it interacts with qdiscs
(hence me previous mail, where I tried to point out possible
inconsistencies from using real_num_tx_queues).

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-29 22:12             ` Patrick McHardy
@ 2009-10-30 10:00               ` Jarek Poplawski
  2009-10-31 17:25                 ` Michael Chan
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-10-30 10:00 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Eric Dumazet, David S. Miller, Linux Netdev List

On Thu, Oct 29, 2009 at 11:12:39PM +0100, Patrick McHardy wrote:
> Jarek Poplawski wrote:
> > On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
> > ...
> >> Well, we do need both values for supporting changes to the actually
> >> used numbers of TX queues. If I understood Dave's explanation correctly,
> >> this is also what's intended. It also doesn't seem unreasonable
> >> what bnx2 is doing.
> > 
> > Exactly. With a growing number of cores, both available and powered
> > off, these values will be soon treated more carefully than now.
> > 
> >> But getting back to the problem Eric reported - so you're suggesting
> >> that bnx2.c should also adjust num_tx_queues in case the hardware
> >> doesn't support multiqueue? That seems reasonable as well.
> > 
> > Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
> > too soon for some drivers. It seems they should be able to do it
> > separately later during the .probe.
> 
> The value passed into alloc_netdev_mq() is just used for allocation
> purposes, from what I can tell there's no downside in reducing it
> before the dev_activate() call.

Right, but IMHO this reducing (or reallocation) should be done with
some API. Simple overwriting of num_tx_queues proposed by Eric, even
if no downside now, seems to be asking for problems in the future.

> > There is a question if .ndo_open should be considered too.
> 
> I currently can't see any purpose in decreasing num_tx_queues after
> registration instead of real_num_tx_queues.

I agree, but since Eric's example shows some drivers do it (almost)
like this, I'd prefer authors/maintainers answer this question.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-30 10:00               ` Jarek Poplawski
@ 2009-10-31 17:25                 ` Michael Chan
  2009-11-01 13:20                   ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Chan @ 2009-10-31 17:25 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Patrick McHardy, Eric Dumazet, David S. Miller, Linux Netdev List


On Fri, 2009-10-30 at 03:00 -0700, Jarek Poplawski wrote:
> On Thu, Oct 29, 2009 at 11:12:39PM +0100, Patrick McHardy wrote:
> > Jarek Poplawski wrote:
> > > On Thu, Oct 29, 2009 at 05:37:23PM +0100, Patrick McHardy wrote:
> > > ...
> > >> Well, we do need both values for supporting changes to the actually
> > >> used numbers of TX queues. If I understood Dave's explanation correctly,
> > >> this is also what's intended. It also doesn't seem unreasonable
> > >> what bnx2 is doing.
> > > 
> > > Exactly. With a growing number of cores, both available and powered
> > > off, these values will be soon treated more carefully than now.
> > > 
> > >> But getting back to the problem Eric reported - so you're suggesting
> > >> that bnx2.c should also adjust num_tx_queues in case the hardware
> > >> doesn't support multiqueue? That seems reasonable as well.
> > > 
> > > Currently, declaring num_tx_queues with alloc_netdev_mq() looks like
> > > too soon for some drivers. It seems they should be able to do it
> > > separately later during the .probe.
> > 
> > The value passed into alloc_netdev_mq() is just used for allocation
> > purposes, from what I can tell there's no downside in reducing it
> > before the dev_activate() call.
> 
> Right, but IMHO this reducing (or reallocation) should be done with
> some API. Simple overwriting of num_tx_queues proposed by Eric, even
> if no downside now, seems to be asking for problems in the future.
> 
> > > There is a question if .ndo_open should be considered too.
> > 
> > I currently can't see any purpose in decreasing num_tx_queues after
> > registration instead of real_num_tx_queues.
> 
> I agree, but since Eric's example shows some drivers do it (almost)
> like this, I'd prefer authors/maintainers answer this question.
> 

We need the netdev and the private structure at the beginning of
->probe() to store values as we probe the device.  Later on in
->probe(), we'll know whether the device supports MSI-X and multiqueue.
If we successfully enable MSI-X in ->ndo_open(), tx multiqueue will be
used.

So if we can separate the allocation of the netdev and the private
structure from the tx queues, we can allocate and set the exact number
of num_tx_queues in ->ndo_open().



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-10-31 17:25                 ` Michael Chan
@ 2009-11-01 13:20                   ` Jarek Poplawski
  2009-11-02 11:35                     ` David Miller
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-11-01 13:20 UTC (permalink / raw)
  To: Michael Chan
  Cc: Patrick McHardy, Eric Dumazet, David S. Miller, Linux Netdev List

On Sat, Oct 31, 2009 at 10:25:52AM -0700, Michael Chan wrote:
> 
> On Fri, 2009-10-30 at 03:00 -0700, Jarek Poplawski wrote:
> > On Thu, Oct 29, 2009 at 11:12:39PM +0100, Patrick McHardy wrote:
...
> > > I currently can't see any purpose in decreasing num_tx_queues after
> > > registration instead of real_num_tx_queues.
> > 
> > I agree, but since Eric's example shows some drivers do it (almost)
> > like this, I'd prefer authors/maintainers answer this question.
> > 
> 
> We need the netdev and the private structure at the beginning of
> ->probe() to store values as we probe the device.  Later on in
> ->probe(), we'll know whether the device supports MSI-X and multiqueue.
> If we successfully enable MSI-X in ->ndo_open(), tx multiqueue will be
> used.
> 
> So if we can separate the allocation of the netdev and the private
> structure from the tx queues, we can allocate and set the exact number
> of num_tx_queues in ->ndo_open().

There is a question if we can predict in ->probe() MSI-X should be
successfully enabled in ->ndo_open() for probed hardware. If so,
then it could go e.g. like this:
->probe()
	...
	dev = alloc_etherdev_mq(sizeof(*bp), 1)
	...
	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
		realloc_netdev_mq(dev, TX_MAX_RINGS)
	register_netdev(dev)

->ndo_open()
	if (!enabled_MSI-X)
		/* something gone wrong but it's an exception */
		dev->real_num_tx_queues = 1

Otherwise, (re)allocation in ->ndo_open() would need to answer some
questions about reinitializing scheduler vs preserving qdisc stats
between opens.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-11-01 13:20                   ` Jarek Poplawski
@ 2009-11-02 11:35                     ` David Miller
  2009-11-02 12:30                       ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2009-11-02 11:35 UTC (permalink / raw)
  To: jarkao2; +Cc: mchan, kaber, eric.dumazet, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sun, 1 Nov 2009 14:20:17 +0100

> There is a question if we can predict in ->probe() MSI-X should be
> successfully enabled in ->ndo_open() for probed hardware. If so,
> then it could go e.g. like this:

We never can know this.

Another device driver can eat up all the MSI-X vectors in the PCI
domain before we make the request_irq() calls in ->open().

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-11-02 11:35                     ` David Miller
@ 2009-11-02 12:30                       ` Jarek Poplawski
  2009-11-02 12:39                         ` David Miller
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-11-02 12:30 UTC (permalink / raw)
  To: David Miller; +Cc: mchan, kaber, eric.dumazet, netdev

On Mon, Nov 02, 2009 at 03:35:33AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Sun, 1 Nov 2009 14:20:17 +0100
> 
> > There is a question if we can predict in ->probe() MSI-X should be
> > successfully enabled in ->ndo_open() for probed hardware. If so,
> > then it could go e.g. like this:
> 
> We never can know this.
> 
> Another device driver can eat up all the MSI-X vectors in the PCI
> domain before we make the request_irq() calls in ->open().

Right, but it's not a 50% chance, I guess? A user most of the time
gets consistently multiqueue or non-multiqueue behavior after open,
unless I miss something. Then such an exceptional state could be
handled by real_num_tx_queues (just like in case of powered of cpus).
The main difference is to hold in num_tx_queues something that is
really available vs max possible value for all configs.

Jarek P. 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-11-02 12:30                       ` Jarek Poplawski
@ 2009-11-02 12:39                         ` David Miller
  2009-11-02 13:02                           ` Jarek Poplawski
                                             ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: David Miller @ 2009-11-02 12:39 UTC (permalink / raw)
  To: jarkao2; +Cc: mchan, kaber, eric.dumazet, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Mon, 2 Nov 2009 12:30:29 +0000

> Right, but it's not a 50% chance, I guess? A user most of the time
> gets consistently multiqueue or non-multiqueue behavior after open,
> unless I miss something. Then such an exceptional state could be
> handled by real_num_tx_queues (just like in case of powered of cpus).
> The main difference is to hold in num_tx_queues something that is
> really available vs max possible value for all configs.

I see your point, yes this would seem to be a reasonable way
to start handling num_tx_queues and real_num_tx_queues.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-11-02 12:39                         ` David Miller
@ 2009-11-02 13:02                           ` Jarek Poplawski
  2009-11-02 13:03                             ` Eric Dumazet
  2009-12-03 14:10                           ` [PATCH] net: Introduce realloc_netdev_mq() Jarek Poplawski
  2009-12-03 14:39                           ` [PATCH v2] " Jarek Poplawski
  2 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-11-02 13:02 UTC (permalink / raw)
  To: David Miller; +Cc: mchan, kaber, eric.dumazet, netdev

On Mon, Nov 02, 2009 at 04:39:07AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 2 Nov 2009 12:30:29 +0000
> 
> > Right, but it's not a 50% chance, I guess? A user most of the time
> > gets consistently multiqueue or non-multiqueue behavior after open,
> > unless I miss something. Then such an exceptional state could be
> > handled by real_num_tx_queues (just like in case of powered of cpus).
> > The main difference is to hold in num_tx_queues something that is
> > really available vs max possible value for all configs.
> 
> I see your point, yes this would seem to be a reasonable way
> to start handling num_tx_queues and real_num_tx_queues.

Very nice! So, I hope Eric should be satisfied with these requested
comments already :-)

Thanks everybody,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-11-02 13:02                           ` Jarek Poplawski
@ 2009-11-02 13:03                             ` Eric Dumazet
  2009-11-02 13:09                               ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Dumazet @ 2009-11-02 13:03 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, mchan, kaber, netdev

Jarek Poplawski a écrit :
> On Mon, Nov 02, 2009 at 04:39:07AM -0800, David Miller wrote:
>> From: Jarek Poplawski <jarkao2@gmail.com>
>> Date: Mon, 2 Nov 2009 12:30:29 +0000
>>
>>> Right, but it's not a 50% chance, I guess? A user most of the time
>>> gets consistently multiqueue or non-multiqueue behavior after open,
>>> unless I miss something. Then such an exceptional state could be
>>> handled by real_num_tx_queues (just like in case of powered of cpus).
>>> The main difference is to hold in num_tx_queues something that is
>>> really available vs max possible value for all configs.
>> I see your point, yes this would seem to be a reasonable way
>> to start handling num_tx_queues and real_num_tx_queues.
> 
> Very nice! So, I hope Eric should be satisfied with these requested
> comments already :-)
> 
Sure, but I prefer a patch from you ;)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [RFC] multiqueue changes
  2009-11-02 13:03                             ` Eric Dumazet
@ 2009-11-02 13:09                               ` Jarek Poplawski
  0 siblings, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2009-11-02 13:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev

On Mon, Nov 02, 2009 at 02:03:21PM +0100, Eric Dumazet wrote:
...
> Sure, but I prefer a patch from you ;)
> 
Wrong code! (RFP? ;-)

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH] net: Introduce realloc_netdev_mq()
  2009-11-02 12:39                         ` David Miller
  2009-11-02 13:02                           ` Jarek Poplawski
@ 2009-12-03 14:10                           ` Jarek Poplawski
  2009-12-03 14:39                           ` [PATCH v2] " Jarek Poplawski
  2 siblings, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-03 14:10 UTC (permalink / raw)
  To: David Miller; +Cc: mchan, kaber, eric.dumazet, netdev

On Mon, Nov 02, 2009 at 04:39:07AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 2 Nov 2009 12:30:29 +0000
> 
> > Right, but it's not a 50% chance, I guess? A user most of the time
> > gets consistently multiqueue or non-multiqueue behavior after open,
> > unless I miss something. Then such an exceptional state could be
> > handled by real_num_tx_queues (just like in case of powered of cpus).
> > The main difference is to hold in num_tx_queues something that is
> > really available vs max possible value for all configs.
> 
> I see your point, yes this would seem to be a reasonable way
> to start handling num_tx_queues and real_num_tx_queues.

Here is a proposal of netdev api change. I hope Michael finds time
to try if this can be really useful for drivers.

Thanks,
Jarek P.
--------------->

This patch separates allocation of TX subqueues from alloc_netdev_mq()
to realloc_netdev_mq() to allow for resizing like in this example:

some_nic_probe()
{
	...
	dev = alloc_etherdev_mq(sizeof(*bp), 1)
	...
	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
		realloc_netdev_mq(dev, TX_MAX_RINGS)
	register_netdev(dev)
	...
}

The main difference is to hold in num_tx_queues something that is
really available, instead of max possible value for all configs, in
case of drivers allocating net_device at the beginning of the probe.

The description of alloc_netdev_mq() is fixed btw.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 net/core/dev.c |   65 +++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e3e18de..7ea3a77 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5265,11 +5265,49 @@ static void netdev_init_one_queue(struct net_device *dev,
 	queue->dev = dev;
 }
 
-static void netdev_init_queues(struct net_device *dev)
+/**
+ *	realloc_netdev_mq - (re)allocate network subqueues
+ *	@dev:		device
+ *	@queue_count:	the number of subqueues to (re)allocate
+ *
+ *	(Re)allocates and initializes subqueue structs for each queue.
+ *	It is allowed to use only until register_netdev().
+ *	On error previous structs are intact.
+ */
+int realloc_netdev_mq(struct net_device *dev, unsigned int queue_count)
 {
-	netdev_init_one_queue(dev, &dev->rx_queue, NULL);
+	struct netdev_queue *tx;
+
+	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
+	if (!tx) {
+		printk(KERN_ERR "alloc_netdev: Unable to allocate "
+		       "tx qdiscs.\n");
+		return -ENOMEM;
+	}
+
+	kfree(dev->_tx);
+
+	dev->_tx = tx;
+	dev->num_tx_queues = queue_count;
+	dev->real_num_tx_queues = queue_count;
+
 	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
+
+	return 0;
+}
+EXPORT_SYMBOL(realloc_netdev_mq);
+
+static int netdev_init_queues(struct net_device *dev, unsigned int queue_count)
+{
+	int err = realloc_netdev_mq(dev, queue_count);
+
+	if (err)
+		return err;
+
+	netdev_init_one_queue(dev, &dev->rx_queue, NULL);
 	spin_lock_init(&dev->tx_global_lock);
+
+	return 0;
 }
 
 /**
@@ -5280,13 +5318,12 @@ static void netdev_init_queues(struct net_device *dev)
  *	@queue_count:	the number of subqueues to allocate
  *
  *	Allocates a struct net_device with private data area for driver use
- *	and performs basic initialization.  Also allocates subquue structs
- *	for each queue on the device at the end of the netdevice.
+ *	and performs basic initialization.  Also allocates subqueue structs
+ *	for each queue on the device.
  */
 struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		void (*setup)(struct net_device *), unsigned int queue_count)
 {
-	struct netdev_queue *tx;
 	struct net_device *dev;
 	size_t alloc_size;
 	struct net_device *p;
@@ -5308,16 +5345,12 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		return NULL;
 	}
 
-	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
-	if (!tx) {
-		printk(KERN_ERR "alloc_netdev: Unable to allocate "
-		       "tx qdiscs.\n");
-		goto free_p;
-	}
-
 	dev = PTR_ALIGN(p, NETDEV_ALIGN);
 	dev->padded = (char *)dev - (char *)p;
 
+	if (netdev_init_queues(dev, queue_count))
+		goto free_p;
+
 	if (dev_addr_init(dev))
 		goto free_tx;
 
@@ -5325,14 +5358,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	dev_net_set(dev, &init_net);
 
-	dev->_tx = tx;
-	dev->num_tx_queues = queue_count;
-	dev->real_num_tx_queues = queue_count;
-
 	dev->gso_max_size = GSO_MAX_SIZE;
 
-	netdev_init_queues(dev);
-
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
 	INIT_LIST_HEAD(&dev->link_watch_list);
@@ -5342,7 +5369,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	return dev;
 
 free_tx:
-	kfree(tx);
+	kfree(dev->_tx);
 
 free_p:
 	kfree(p);

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2] net: Introduce realloc_netdev_mq()
  2009-11-02 12:39                         ` David Miller
  2009-11-02 13:02                           ` Jarek Poplawski
  2009-12-03 14:10                           ` [PATCH] net: Introduce realloc_netdev_mq() Jarek Poplawski
@ 2009-12-03 14:39                           ` Jarek Poplawski
  2009-12-03 15:17                             ` Eric Dumazet
  2 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-03 14:39 UTC (permalink / raw)
  To: David Miller; +Cc: mchan, kaber, eric.dumazet, netdev

Take 2 (with forgotten headers, sorry).

On Mon, Nov 02, 2009 at 04:39:07AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 2 Nov 2009 12:30:29 +0000
> 
> > Right, but it's not a 50% chance, I guess? A user most of the time
> > gets consistently multiqueue or non-multiqueue behavior after open,
> > unless I miss something. Then such an exceptional state could be
> > handled by real_num_tx_queues (just like in case of powered of cpus).
> > The main difference is to hold in num_tx_queues something that is
> > really available vs max possible value for all configs.
> 
> I see your point, yes this would seem to be a reasonable way
> to start handling num_tx_queues and real_num_tx_queues.

Here is a proposal of netdev api change. I hope Michael finds time
to try if this can be really useful for drivers.

Thanks,
Jarek P.
---------------> (take 2)

This patch separates allocation of TX subqueues from alloc_netdev_mq()
to realloc_netdev_mq() to allow for resizing like in this example:

some_nic_probe()
{
	...
	dev = alloc_etherdev_mq(sizeof(*bp), 1)
	...
	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
		realloc_netdev_mq(dev, TX_MAX_RINGS)
	register_netdev(dev)
	...
}

The main difference is to hold in num_tx_queues something that is
really available, instead of max possible value for all configs, in
case of drivers allocating net_device at the beginning of the probe.

The description of alloc_netdev_mq() is fixed btw.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
---

 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   65 +++++++++++++++++++++++++++++++-------------
 2 files changed, 47 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index daf13d3..4b6e2ef 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1904,6 +1904,7 @@ extern void		ether_setup(struct net_device *dev);
 extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 				       void (*setup)(struct net_device *),
 				       unsigned int queue_count);
+extern int realloc_netdev_mq(struct net_device *dev, unsigned int queue_count);
 #define alloc_netdev(sizeof_priv, name, setup) \
 	alloc_netdev_mq(sizeof_priv, name, setup, 1)
 extern int		register_netdev(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index e3e18de..7ea3a77 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5265,11 +5265,49 @@ static void netdev_init_one_queue(struct net_device *dev,
 	queue->dev = dev;
 }
 
-static void netdev_init_queues(struct net_device *dev)
+/**
+ *	realloc_netdev_mq - (re)allocate network subqueues
+ *	@dev:		device
+ *	@queue_count:	the number of subqueues to (re)allocate
+ *
+ *	(Re)allocates and initializes subqueue structs for each queue.
+ *	It is allowed to use only until register_netdev().
+ *	On error previous structs are intact.
+ */
+int realloc_netdev_mq(struct net_device *dev, unsigned int queue_count)
 {
-	netdev_init_one_queue(dev, &dev->rx_queue, NULL);
+	struct netdev_queue *tx;
+
+	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
+	if (!tx) {
+		printk(KERN_ERR "alloc_netdev: Unable to allocate "
+		       "tx qdiscs.\n");
+		return -ENOMEM;
+	}
+
+	kfree(dev->_tx);
+
+	dev->_tx = tx;
+	dev->num_tx_queues = queue_count;
+	dev->real_num_tx_queues = queue_count;
+
 	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
+
+	return 0;
+}
+EXPORT_SYMBOL(realloc_netdev_mq);
+
+static int netdev_init_queues(struct net_device *dev, unsigned int queue_count)
+{
+	int err = realloc_netdev_mq(dev, queue_count);
+
+	if (err)
+		return err;
+
+	netdev_init_one_queue(dev, &dev->rx_queue, NULL);
 	spin_lock_init(&dev->tx_global_lock);
+
+	return 0;
 }
 
 /**
@@ -5280,13 +5318,12 @@ static void netdev_init_queues(struct net_device *dev)
  *	@queue_count:	the number of subqueues to allocate
  *
  *	Allocates a struct net_device with private data area for driver use
- *	and performs basic initialization.  Also allocates subquue structs
- *	for each queue on the device at the end of the netdevice.
+ *	and performs basic initialization.  Also allocates subqueue structs
+ *	for each queue on the device.
  */
 struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		void (*setup)(struct net_device *), unsigned int queue_count)
 {
-	struct netdev_queue *tx;
 	struct net_device *dev;
 	size_t alloc_size;
 	struct net_device *p;
@@ -5308,16 +5345,12 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		return NULL;
 	}
 
-	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
-	if (!tx) {
-		printk(KERN_ERR "alloc_netdev: Unable to allocate "
-		       "tx qdiscs.\n");
-		goto free_p;
-	}
-
 	dev = PTR_ALIGN(p, NETDEV_ALIGN);
 	dev->padded = (char *)dev - (char *)p;
 
+	if (netdev_init_queues(dev, queue_count))
+		goto free_p;
+
 	if (dev_addr_init(dev))
 		goto free_tx;
 
@@ -5325,14 +5358,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	dev_net_set(dev, &init_net);
 
-	dev->_tx = tx;
-	dev->num_tx_queues = queue_count;
-	dev->real_num_tx_queues = queue_count;
-
 	dev->gso_max_size = GSO_MAX_SIZE;
 
-	netdev_init_queues(dev);
-
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
 	INIT_LIST_HEAD(&dev->link_watch_list);
@@ -5342,7 +5369,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	return dev;
 
 free_tx:
-	kfree(tx);
+	kfree(dev->_tx);
 
 free_p:
 	kfree(p);

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] net: Introduce realloc_netdev_mq()
  2009-12-03 14:39                           ` [PATCH v2] " Jarek Poplawski
@ 2009-12-03 15:17                             ` Eric Dumazet
  2009-12-03 16:36                               ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Dumazet @ 2009-12-03 15:17 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, mchan, kaber, netdev

Jarek Poplawski a écrit :
> Take 2 (with forgotten headers, sorry).

Very nice, thanks !

> Here is a proposal of netdev api change. I hope Michael finds time
> to try if this can be really useful for drivers.
> 
> Thanks,
> Jarek P.
> ---------------> (take 2)
> 
> This patch separates allocation of TX subqueues from alloc_netdev_mq()
> to realloc_netdev_mq() to allow for resizing like in this example:
> 
> some_nic_probe()
> {
> 	...
> 	dev = alloc_etherdev_mq(sizeof(*bp), 1)
> 	...
> 	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
> 		realloc_netdev_mq(dev, TX_MAX_RINGS)
> 	register_netdev(dev)
> 	...
> }
> 

Or the reverse :

ie, the device first allocate its max possible tx rings

dev = alloc_etherdev_mq(sizeof(*bp), 128)

then, later, try to reduce it, when knowing exact tx count.

if (realloc_netdev_mq(dev, real_queues))
	dev->real_num_tx_queues = real_queues;

In this case the memory error is not fatal.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] net: Introduce realloc_netdev_mq()
  2009-12-03 15:17                             ` Eric Dumazet
@ 2009-12-03 16:36                               ` Jarek Poplawski
  2009-12-03 16:54                                 ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-03 16:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev

On Thu, Dec 03, 2009 at 04:17:43PM +0100, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > Take 2 (with forgotten headers, sorry).
> 
> Very nice, thanks !
> 
> > Here is a proposal of netdev api change. I hope Michael finds time
> > to try if this can be really useful for drivers.
> > 
> > Thanks,
> > Jarek P.
> > ---------------> (take 2)
> > 
> > This patch separates allocation of TX subqueues from alloc_netdev_mq()
> > to realloc_netdev_mq() to allow for resizing like in this example:
> > 
> > some_nic_probe()
> > {
> > 	...
> > 	dev = alloc_etherdev_mq(sizeof(*bp), 1)
> > 	...
> > 	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
> > 		realloc_netdev_mq(dev, TX_MAX_RINGS)
> > 	register_netdev(dev)
> > 	...
> > }
> > 
> 
> Or the reverse :
> 
> ie, the device first allocate its max possible tx rings
> 
> dev = alloc_etherdev_mq(sizeof(*bp), 128)
> 
> then, later, try to reduce it, when knowing exact tx count.

Right, especially when we know it's a common config/capability for
a NIC.

> 
> if (realloc_netdev_mq(dev, real_queues))
> 	dev->real_num_tx_queues = real_queues;
> 
> In this case the memory error is not fatal.

Good point! We can consider doing this inside the function too?

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] net: Introduce realloc_netdev_mq()
  2009-12-03 16:36                               ` Jarek Poplawski
@ 2009-12-03 16:54                                 ` Jarek Poplawski
  2009-12-03 17:05                                   ` Eric Dumazet
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-03 16:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev

On Thu, Dec 03, 2009 at 05:36:40PM +0100, Jarek Poplawski wrote:
> On Thu, Dec 03, 2009 at 04:17:43PM +0100, Eric Dumazet wrote:
> > if (realloc_netdev_mq(dev, real_queues))
> > 	dev->real_num_tx_queues = real_queues;
> > 
> > In this case the memory error is not fatal.
> 
> Good point! We can consider doing this inside the function too?

Hmm... Of course, not exactly this - I mean using min().

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] net: Introduce realloc_netdev_mq()
  2009-12-03 16:54                                 ` Jarek Poplawski
@ 2009-12-03 17:05                                   ` Eric Dumazet
  2009-12-03 19:04                                     ` [PATCH v3] " Jarek Poplawski
  2009-12-03 20:29                                     ` [PATCH v4] " Jarek Poplawski
  0 siblings, 2 replies; 47+ messages in thread
From: Eric Dumazet @ 2009-12-03 17:05 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, mchan, kaber, netdev

Jarek Poplawski a écrit :
> On Thu, Dec 03, 2009 at 05:36:40PM +0100, Jarek Poplawski wrote:
>> On Thu, Dec 03, 2009 at 04:17:43PM +0100, Eric Dumazet wrote:
>>> if (realloc_netdev_mq(dev, real_queues))
>>> 	dev->real_num_tx_queues = real_queues;
>>>
>>> In this case the memory error is not fatal.
>> Good point! We can consider doing this inside the function too?
> 
> Hmm... Of course, not exactly this - I mean using min().

Sure, allowing to reduce the count in case new allocation failed.

And report an error if caller wanted to increase number of queues and allocation failed.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v3] net: Introduce realloc_netdev_mq()
  2009-12-03 17:05                                   ` Eric Dumazet
@ 2009-12-03 19:04                                     ` Jarek Poplawski
  2009-12-03 20:29                                     ` [PATCH v4] " Jarek Poplawski
  1 sibling, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-03 19:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev

On Thu, Dec 03, 2009 at 06:05:39PM +0100, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On Thu, Dec 03, 2009 at 05:36:40PM +0100, Jarek Poplawski wrote:
> >> On Thu, Dec 03, 2009 at 04:17:43PM +0100, Eric Dumazet wrote:
> >>> if (realloc_netdev_mq(dev, real_queues))
> >>> 	dev->real_num_tx_queues = real_queues;
> >>>
> >>> In this case the memory error is not fatal.
> >> Good point! We can consider doing this inside the function too?
> > 
> > Hmm... Of course, not exactly this - I mean using min().
> 
> Sure, allowing to reduce the count in case new allocation failed.
> 
> And report an error if caller wanted to increase number of queues and allocation failed.

OK. Here it is. I guess, you could sign it off, btw?

Thanks,
Jarek P.
---------------> (take 3)

This patch separates allocation of TX subqueues from alloc_netdev_mq()
to realloc_netdev_mq() to allow for resizing like in this example:

some_nic_probe()
{
	...
	dev = alloc_etherdev_mq(sizeof(*bp), 1)
	...
	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
		realloc_netdev_mq(dev, TX_MAX_RINGS)
	register_netdev(dev)
	...
}

Alternatively, it can be done in reverse order: starting from the
highest queue_count and reallocating with a lower one.

The main difference is to hold in num_tx_queues something that is
really available, instead of max possible value for all configs, in
case of drivers allocating net_device at the beginning of the probe.

The description of alloc_netdev_mq() is fixed btw.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---

 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   72 +++++++++++++++++++++++++++++++++------------
 2 files changed, 54 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index daf13d3..36cbd53 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1904,6 +1904,7 @@ extern void		ether_setup(struct net_device *dev);
 extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 				       void (*setup)(struct net_device *),
 				       unsigned int queue_count);
+extern int realloc_netdev_mq(struct net_device *dev, unsigned int queue_count);
 #define alloc_netdev(sizeof_priv, name, setup) \
 	alloc_netdev_mq(sizeof_priv, name, setup, 1)
 extern int		register_netdev(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index e3e18de..ffc5c2f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5265,11 +5265,56 @@ static void netdev_init_one_queue(struct net_device *dev,
 	queue->dev = dev;
 }
 
-static void netdev_init_queues(struct net_device *dev)
+/**
+ *	realloc_netdev_mq - (re)allocate network subqueues
+ *	@dev:		device
+ *	@queue_count:	the number of subqueues to (re)allocate
+ *
+ *	(Re)allocates and initializes subqueue structs for each queue.
+ *	It is allowed to use only until register_netdev().
+ *	On allocation failure previous structs are intact, but an error is
+ *	returned only when the queue count is increased. Otherwise, only
+ *	dev->real_num_tx_queue is replaced with a lower queue_count value.
+ */
+int realloc_netdev_mq(struct net_device *dev, unsigned int queue_count)
 {
-	netdev_init_one_queue(dev, &dev->rx_queue, NULL);
+	struct netdev_queue *tx;
+
+	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
+	if (!tx) {
+		printk(KERN_ERR "alloc_netdev: Unable to (re)allocate "
+		       "tx qdiscs.\n");
+
+		if (dev->real_num_tx_queues < queue_count)
+			return -ENOMEM;
+
+		dev->real_num_tx_queues = queue_count;
+		return 0;
+	}
+
+	kfree(dev->_tx);
+
+	dev->_tx = tx;
+	dev->num_tx_queues = queue_count;
+	dev->real_num_tx_queues = queue_count;
+
 	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
+
+	return 0;
+}
+EXPORT_SYMBOL(realloc_netdev_mq);
+
+static int netdev_init_queues(struct net_device *dev, unsigned int queue_count)
+{
+	int err = realloc_netdev_mq(dev, queue_count);
+
+	if (err)
+		return err;
+
+	netdev_init_one_queue(dev, &dev->rx_queue, NULL);
 	spin_lock_init(&dev->tx_global_lock);
+
+	return 0;
 }
 
 /**
@@ -5280,13 +5325,12 @@ static void netdev_init_queues(struct net_device *dev)
  *	@queue_count:	the number of subqueues to allocate
  *
  *	Allocates a struct net_device with private data area for driver use
- *	and performs basic initialization.  Also allocates subquue structs
- *	for each queue on the device at the end of the netdevice.
+ *	and performs basic initialization.  Also allocates subqueue structs
+ *	for each queue on the device.
  */
 struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		void (*setup)(struct net_device *), unsigned int queue_count)
 {
-	struct netdev_queue *tx;
 	struct net_device *dev;
 	size_t alloc_size;
 	struct net_device *p;
@@ -5308,16 +5352,12 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		return NULL;
 	}
 
-	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
-	if (!tx) {
-		printk(KERN_ERR "alloc_netdev: Unable to allocate "
-		       "tx qdiscs.\n");
-		goto free_p;
-	}
-
 	dev = PTR_ALIGN(p, NETDEV_ALIGN);
 	dev->padded = (char *)dev - (char *)p;
 
+	if (netdev_init_queues(dev, queue_count))
+		goto free_p;
+
 	if (dev_addr_init(dev))
 		goto free_tx;
 
@@ -5325,14 +5365,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	dev_net_set(dev, &init_net);
 
-	dev->_tx = tx;
-	dev->num_tx_queues = queue_count;
-	dev->real_num_tx_queues = queue_count;
-
 	dev->gso_max_size = GSO_MAX_SIZE;
 
-	netdev_init_queues(dev);
-
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
 	INIT_LIST_HEAD(&dev->link_watch_list);
@@ -5342,7 +5376,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	return dev;
 
 free_tx:
-	kfree(tx);
+	kfree(dev->_tx);
 
 free_p:
 	kfree(p);

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 17:05                                   ` Eric Dumazet
  2009-12-03 19:04                                     ` [PATCH v3] " Jarek Poplawski
@ 2009-12-03 20:29                                     ` Jarek Poplawski
  2009-12-03 21:29                                       ` Eric Dumazet
  1 sibling, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-03 20:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev

On Thu, Dec 03, 2009 at 06:05:39PM +0100, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On Thu, Dec 03, 2009 at 05:36:40PM +0100, Jarek Poplawski wrote:
> >> On Thu, Dec 03, 2009 at 04:17:43PM +0100, Eric Dumazet wrote:
> >>> if (realloc_netdev_mq(dev, real_queues))
> >>> 	dev->real_num_tx_queues = real_queues;
> >>>
> >>> In this case the memory error is not fatal.
> >> Good point! We can consider doing this inside the function too?
> > 
> > Hmm... Of course, not exactly this - I mean using min().
> 
> Sure, allowing to reduce the count in case new allocation failed.
> 
> And report an error if caller wanted to increase number of queues and allocation failed.

Hmm... After re-thinking it looks a bit too complex to me. I think,
there is no reason to not report this error since in most cases it
shouldn't be fatal. That's why I skipped this check in the changelog
example. Unless I miss something?

Thanks,
Jarek P.
---------------> (take 4)

This patch separates allocation of TX subqueues from alloc_netdev_mq()
to realloc_netdev_mq() to allow for resizing like in this example:

some_nic_probe()
{
	...
	dev = alloc_etherdev_mq(sizeof(*bp), 1)
	...
	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
		realloc_netdev_mq(dev, TX_MAX_RINGS)
	register_netdev(dev)
	...
}

Alternatively, it can be done in reverse order: starting from the
highest queue_count and reallocating with a lower one.

The main difference is to hold in num_tx_queues something that is
really available, instead of max possible value for all configs, in
case of drivers allocating net_device at the beginning of the probe.

The description of alloc_netdev_mq() is fixed btw.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
---

 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   69 ++++++++++++++++++++++++++++++++------------
 2 files changed, 51 insertions(+), 19 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index daf13d3..36cbd53 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1904,6 +1904,7 @@ extern void		ether_setup(struct net_device *dev);
 extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 				       void (*setup)(struct net_device *),
 				       unsigned int queue_count);
+extern int realloc_netdev_mq(struct net_device *dev, unsigned int queue_count);
 #define alloc_netdev(sizeof_priv, name, setup) \
 	alloc_netdev_mq(sizeof_priv, name, setup, 1)
 extern int		register_netdev(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index e3e18de..1f45bae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5265,11 +5265,53 @@ static void netdev_init_one_queue(struct net_device *dev,
 	queue->dev = dev;
 }
 
-static void netdev_init_queues(struct net_device *dev)
+/**
+ *	realloc_netdev_mq - (re)allocate network subqueues
+ *	@dev:		device
+ *	@queue_count:	the number of subqueues to (re)allocate
+ *
+ *	(Re)allocates and initializes subqueue structs for each queue.
+ *	It is allowed to use only until register_netdev().
+ *	On error previous structs are intact, but dev->real_num_tx_queue is
+ *	replaced if the queue_count is lower.
+ */
+int realloc_netdev_mq(struct net_device *dev, unsigned int queue_count)
 {
-	netdev_init_one_queue(dev, &dev->rx_queue, NULL);
+	struct netdev_queue *tx;
+
+	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
+	if (!tx) {
+		printk(KERN_ERR "alloc_netdev: Unable to (re)allocate "
+		       "tx qdiscs.\n");
+		if (dev->real_num_tx_queues > queue_count)
+			dev->real_num_tx_queues = queue_count;
+
+		return -ENOMEM;
+	}
+
+	kfree(dev->_tx);
+
+	dev->_tx = tx;
+	dev->num_tx_queues = queue_count;
+	dev->real_num_tx_queues = queue_count;
+
 	netdev_for_each_tx_queue(dev, netdev_init_one_queue, NULL);
+
+	return 0;
+}
+EXPORT_SYMBOL(realloc_netdev_mq);
+
+static int netdev_init_queues(struct net_device *dev, unsigned int queue_count)
+{
+	int err = realloc_netdev_mq(dev, queue_count);
+
+	if (err)
+		return err;
+
+	netdev_init_one_queue(dev, &dev->rx_queue, NULL);
 	spin_lock_init(&dev->tx_global_lock);
+
+	return 0;
 }
 
 /**
@@ -5280,13 +5322,12 @@ static void netdev_init_queues(struct net_device *dev)
  *	@queue_count:	the number of subqueues to allocate
  *
  *	Allocates a struct net_device with private data area for driver use
- *	and performs basic initialization.  Also allocates subquue structs
- *	for each queue on the device at the end of the netdevice.
+ *	and performs basic initialization.  Also allocates subqueue structs
+ *	for each queue on the device.
  */
 struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		void (*setup)(struct net_device *), unsigned int queue_count)
 {
-	struct netdev_queue *tx;
 	struct net_device *dev;
 	size_t alloc_size;
 	struct net_device *p;
@@ -5308,16 +5349,12 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 		return NULL;
 	}
 
-	tx = kcalloc(queue_count, sizeof(struct netdev_queue), GFP_KERNEL);
-	if (!tx) {
-		printk(KERN_ERR "alloc_netdev: Unable to allocate "
-		       "tx qdiscs.\n");
-		goto free_p;
-	}
-
 	dev = PTR_ALIGN(p, NETDEV_ALIGN);
 	dev->padded = (char *)dev - (char *)p;
 
+	if (netdev_init_queues(dev, queue_count))
+		goto free_p;
+
 	if (dev_addr_init(dev))
 		goto free_tx;
 
@@ -5325,14 +5362,8 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 
 	dev_net_set(dev, &init_net);
 
-	dev->_tx = tx;
-	dev->num_tx_queues = queue_count;
-	dev->real_num_tx_queues = queue_count;
-
 	dev->gso_max_size = GSO_MAX_SIZE;
 
-	netdev_init_queues(dev);
-
 	INIT_LIST_HEAD(&dev->napi_list);
 	INIT_LIST_HEAD(&dev->unreg_list);
 	INIT_LIST_HEAD(&dev->link_watch_list);
@@ -5342,7 +5373,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	return dev;
 
 free_tx:
-	kfree(tx);
+	kfree(dev->_tx);
 
 free_p:
 	kfree(p);

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 20:29                                     ` [PATCH v4] " Jarek Poplawski
@ 2009-12-03 21:29                                       ` Eric Dumazet
  2009-12-03 21:31                                         ` David Miller
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Dumazet @ 2009-12-03 21:29 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, mchan, kaber, netdev

Jarek Poplawski a écrit :
> On Thu, Dec 03, 2009 at 06:05:39PM +0100, Eric Dumazet wrote:
>> Jarek Poplawski a écrit :
>>> On Thu, Dec 03, 2009 at 05:36:40PM +0100, Jarek Poplawski wrote:
>>>> On Thu, Dec 03, 2009 at 04:17:43PM +0100, Eric Dumazet wrote:
>>>>> if (realloc_netdev_mq(dev, real_queues))
>>>>> 	dev->real_num_tx_queues = real_queues;
>>>>>
>>>>> In this case the memory error is not fatal.
>>>> Good point! We can consider doing this inside the function too?
>>> Hmm... Of course, not exactly this - I mean using min().
>> Sure, allowing to reduce the count in case new allocation failed.
>>
>> And report an error if caller wanted to increase number of queues and allocation failed.
> 
> Hmm... After re-thinking it looks a bit too complex to me. I think,
> there is no reason to not report this error since in most cases it
> shouldn't be fatal. That's why I skipped this check in the changelog
> example. Unless I miss something?
> 
> Thanks,
> Jarek P.
> ---------------> (take 4)
> 
> This patch separates allocation of TX subqueues from alloc_netdev_mq()
> to realloc_netdev_mq() to allow for resizing like in this example:
> 
> some_nic_probe()
> {
> 	...
> 	dev = alloc_etherdev_mq(sizeof(*bp), 1)
> 	...
> 	if (MSI-X_available && device_supports_MSI-X_and_multiqueue)
> 		realloc_netdev_mq(dev, TX_MAX_RINGS)
> 	register_netdev(dev)
> 	...
> }
> 
> Alternatively, it can be done in reverse order: starting from the
> highest queue_count and reallocating with a lower one.
> 
> The main difference is to hold in num_tx_queues something that is
> really available, instead of max possible value for all configs, in
> case of drivers allocating net_device at the beginning of the probe.
> 
> The description of alloc_netdev_mq() is fixed btw.
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>

Nice patch, thanks :)

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 21:29                                       ` Eric Dumazet
@ 2009-12-03 21:31                                         ` David Miller
  2009-12-03 21:32                                           ` Eric Dumazet
  2009-12-03 21:51                                           ` Eric Dumazet
  0 siblings, 2 replies; 47+ messages in thread
From: David Miller @ 2009-12-03 21:31 UTC (permalink / raw)
  To: eric.dumazet; +Cc: jarkao2, mchan, kaber, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 03 Dec 2009 22:29:42 +0100

> Nice patch, thanks :)
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

I like it too, but please resubmit once we have at least
one example user submitted.

Thanks!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 21:31                                         ` David Miller
@ 2009-12-03 21:32                                           ` Eric Dumazet
  2009-12-03 21:51                                           ` Eric Dumazet
  1 sibling, 0 replies; 47+ messages in thread
From: Eric Dumazet @ 2009-12-03 21:32 UTC (permalink / raw)
  To: David Miller; +Cc: jarkao2, mchan, kaber, netdev

David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 03 Dec 2009 22:29:42 +0100
> 
>> Nice patch, thanks :)
>>
>> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> I like it too, but please resubmit once we have at least
> one example user submitted.
> 
> Thanks!

Well, I was going to do that :)



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 21:31                                         ` David Miller
  2009-12-03 21:32                                           ` Eric Dumazet
@ 2009-12-03 21:51                                           ` Eric Dumazet
  2009-12-03 22:47                                             ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: Eric Dumazet @ 2009-12-03 21:51 UTC (permalink / raw)
  To: David Miller; +Cc: jarkao2, mchan, kaber, netdev, Jeff Kirsher

David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 03 Dec 2009 22:29:42 +0100
> 
>> Nice patch, thanks :)
>>
>> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> I like it too, but please resubmit once we have at least
> one example user submitted.
> 

I successfully tested following patch.

Thanks !

[PATCH] ixgbe: Use realloc_netdev_mq() helper

Instead of 128 tx queues, we can cap the tx queue number to what driver really uses.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 9ba506f..e524d0d 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -3306,7 +3306,7 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 
 done:
 	/* Notify the stack of the (possibly) reduced Tx Queue count. */
-	adapter->netdev->real_num_tx_queues = adapter->num_tx_queues;
+	realloc_netdev_mq(adapter->netdev, adapter->num_tx_queues);
 }
 
 static void ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter,


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 21:51                                           ` Eric Dumazet
@ 2009-12-03 22:47                                             ` Jarek Poplawski
  2009-12-03 23:04                                               ` Eric Dumazet
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-03 22:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev, Jeff Kirsher

On Thu, Dec 03, 2009 at 10:51:25PM +0100, Eric Dumazet wrote:
> David Miller a écrit :
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Thu, 03 Dec 2009 22:29:42 +0100
> > 
> >> Nice patch, thanks :)
> >>
> >> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> > 
> > I like it too, but please resubmit once we have at least
> > one example user submitted.
> > 
> 
> I successfully tested following patch.

Great! But, I see, checking if realloc_netdev_mq() use is always legal
(before register_netdev()) is not trivial in this driver. (I have some
suspicions around ixgbe_resume().) I wonder, if there should be added
some debugging for this.

Thanks,
Jarek P.

> 
> Thanks !
> 
> [PATCH] ixgbe: Use realloc_netdev_mq() helper
> 
> Instead of 128 tx queues, we can cap the tx queue number to what driver really uses.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> 
> diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
> index 9ba506f..e524d0d 100644
> --- a/drivers/net/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ixgbe/ixgbe_main.c
> @@ -3306,7 +3306,7 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
>  
>  done:
>  	/* Notify the stack of the (possibly) reduced Tx Queue count. */
> -	adapter->netdev->real_num_tx_queues = adapter->num_tx_queues;
> +	realloc_netdev_mq(adapter->netdev, adapter->num_tx_queues);
>  }
>  
>  static void ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter,
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 22:47                                             ` Jarek Poplawski
@ 2009-12-03 23:04                                               ` Eric Dumazet
  2009-12-04  7:48                                                 ` Jarek Poplawski
  2009-12-04 13:01                                                 ` Jarek Poplawski
  0 siblings, 2 replies; 47+ messages in thread
From: Eric Dumazet @ 2009-12-03 23:04 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: David Miller, mchan, kaber, netdev, Jeff Kirsher

Jarek Poplawski a écrit :
> On Thu, Dec 03, 2009 at 10:51:25PM +0100, Eric Dumazet wrote:
>> David Miller a écrit :
>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>> Date: Thu, 03 Dec 2009 22:29:42 +0100
>>>
>>>> Nice patch, thanks :)
>>>>
>>>> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>>> I like it too, but please resubmit once we have at least
>>> one example user submitted.
>>>
>> I successfully tested following patch.
> 
> Great! But, I see, checking if realloc_netdev_mq() use is always legal
> (before register_netdev()) is not trivial in this driver. (I have some
> suspicions around ixgbe_resume().) I wonder, if there should be added
> some debugging for this.

Yes, probably

Or even better, allowing realloc_netdev_mq() to be called even after
register_netdev() :)

bnx2 for example could not be patched easily, following patch doesnt work.

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4cae2a8..8bd5d0d 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -6177,6 +6177,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
 
 	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
-	bp->dev->real_num_tx_queues = bp->num_tx_rings;
+	realloc_netdev_mq(bp->dev, bp->num_tx_rings);
 
 	bp->num_rx_rings = bp->irq_nvecs;
 }


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 23:04                                               ` Eric Dumazet
@ 2009-12-04  7:48                                                 ` Jarek Poplawski
  2009-12-04 10:51                                                   ` Peter P Waskiewicz Jr
  2009-12-04 13:01                                                 ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-04  7:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev, Jeff Kirsher

On Fri, Dec 04, 2009 at 12:04:42AM +0100, Eric Dumazet wrote:
> Jarek Poplawski a écrit :
> > On Thu, Dec 03, 2009 at 10:51:25PM +0100, Eric Dumazet wrote:
> >> David Miller a écrit :
> >>> From: Eric Dumazet <eric.dumazet@gmail.com>
> >>> Date: Thu, 03 Dec 2009 22:29:42 +0100
> >>>
> >>>> Nice patch, thanks :)
> >>>>
> >>>> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> >>> I like it too, but please resubmit once we have at least
> >>> one example user submitted.
> >>>
> >> I successfully tested following patch.
> > 
> > Great! But, I see, checking if realloc_netdev_mq() use is always legal
> > (before register_netdev()) is not trivial in this driver. (I have some
> > suspicions around ixgbe_resume().) I wonder, if there should be added
> > some debugging for this.
> 
> Yes, probably
> 
> Or even better, allowing realloc_netdev_mq() to be called even after
> register_netdev() :)

We should try to avoid it because of clashes with qdisc initialization
(especially wrt. preserving stats). But mainly it's about knowing the
exact reason why this thing (probing the hardware for max mq
capabilities) can't be finished before register_netdev(). I'll try to
look at this more.

Jarek P.

> 
> bnx2 for example could not be patched easily, following patch doesnt work.
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 4cae2a8..8bd5d0d 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -6177,6 +6177,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
>  
>  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
> -	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> +	realloc_netdev_mq(bp->dev, bp->num_tx_rings);
>  
>  	bp->num_rx_rings = bp->irq_nvecs;
>  }
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-04  7:48                                                 ` Jarek Poplawski
@ 2009-12-04 10:51                                                   ` Peter P Waskiewicz Jr
  2009-12-04 11:41                                                     ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Peter P Waskiewicz Jr @ 2009-12-04 10:51 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Eric Dumazet, David Miller, mchan, kaber, netdev, Kirsher, Jeffrey T

On Thu, 2009-12-03 at 23:48 -0800, Jarek Poplawski wrote:
> On Fri, Dec 04, 2009 at 12:04:42AM +0100, Eric Dumazet wrote:
> > Jarek Poplawski a écrit :
> > > On Thu, Dec 03, 2009 at 10:51:25PM +0100, Eric Dumazet wrote:
> > >> David Miller a écrit :
> > >>> From: Eric Dumazet <eric.dumazet@gmail.com>
> > >>> Date: Thu, 03 Dec 2009 22:29:42 +0100
> > >>>
> > >>>> Nice patch, thanks :)
> > >>>>
> > >>>> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> > >>> I like it too, but please resubmit once we have at least
> > >>> one example user submitted.
> > >>>
> > >> I successfully tested following patch.
> > > 
> > > Great! But, I see, checking if realloc_netdev_mq() use is always legal
> > > (before register_netdev()) is not trivial in this driver. (I have some
> > > suspicions around ixgbe_resume().) I wonder, if there should be added
> > > some debugging for this.
> > 
> > Yes, probably
> > 
> > Or even better, allowing realloc_netdev_mq() to be called even after
> > register_netdev() :)
> 
> We should try to avoid it because of clashes with qdisc initialization
> (especially wrt. preserving stats). But mainly it's about knowing the
> exact reason why this thing (probing the hardware for max mq
> capabilities) can't be finished before register_netdev(). I'll try to
> look at this more.
> 

Honestly, there's no reason we can't know how many Tx queues we'll have
prior to registering the netdev.  All of this should be figured out
after ixgbe_alloc_queues() and ixgbe_init_interrupt_scheme() are called.

Once we know what features are enabled, and how many MSI-X vectors the
platform gives us, we can make the call for how many queues to allocate.

I like this realloc_netdev_mq() mechanism.  I'm going to pull that into
my tree for testing, since I unfortunately was unaware it existed (lack
of my poking around).  This is good stuff.

Cheers,
-PJ



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-04 10:51                                                   ` Peter P Waskiewicz Jr
@ 2009-12-04 11:41                                                     ` Jarek Poplawski
  0 siblings, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-04 11:41 UTC (permalink / raw)
  To: Peter P Waskiewicz Jr
  Cc: Eric Dumazet, David Miller, mchan, kaber, netdev, Kirsher, Jeffrey T

On Fri, Dec 04, 2009 at 02:51:38AM -0800, Peter P Waskiewicz Jr wrote:
> Honestly, there's no reason we can't know how many Tx queues we'll have
> prior to registering the netdev.  All of this should be figured out
> after ixgbe_alloc_queues() and ixgbe_init_interrupt_scheme() are called.
> 
> Once we know what features are enabled, and how many MSI-X vectors the
> platform gives us, we can make the call for how many queues to allocate.

Yes. It seems, it's more about cleaning/reordering a few things
sometimes.

> 
> I like this realloc_netdev_mq() mechanism.  I'm going to pull that into
> my tree for testing, since I unfortunately was unaware it existed (lack
> of my poking around).  This is good stuff.

Actually, it's only a simple separation of functionality, no big deal,
but thanks. Since it's still a part of RFC feel free to make it better
or even different way.

Cheers,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-03 23:04                                               ` Eric Dumazet
  2009-12-04  7:48                                                 ` Jarek Poplawski
@ 2009-12-04 13:01                                                 ` Jarek Poplawski
  2009-12-04 13:49                                                   ` Jarek Poplawski
  1 sibling, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-04 13:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev, Jeff Kirsher

On Fri, Dec 04, 2009 at 12:04:42AM +0100, Eric Dumazet wrote:
...
> bnx2 for example could not be patched easily, following patch doesnt work.
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 4cae2a8..8bd5d0d 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -6177,6 +6177,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
>  
>  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
> -	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> +	realloc_netdev_mq(bp->dev, bp->num_tx_rings);
>  
>  	bp->num_rx_rings = bp->irq_nvecs;
>  }

Eric, if didn't miss something in the example you gave at the
beginning of this thread, it seems a change like below (not compiled)
should fix your problem (without even testing MSI exactly)?

Jarek P.
---

 drivers/net/bnx2.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4bfc808..c1ba2b1 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -8288,6 +8288,11 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->features |= NETIF_F_TSO6;
 		vlan_features_add(dev, NETIF_F_TSO6);
 	}
+
+	if (!(bp->flags & BNX2_FLAG_MSIX_CAP) ||
+	    !(bp->flags & BNX2_FLAG_MSI_CAP) || disable_msi)
+		realloc_netdev_mq(dev, 1);
+
 	if ((rc = register_netdev(dev))) {
 		dev_err(&pdev->dev, "Cannot register net device\n");
 		goto error;

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-04 13:01                                                 ` Jarek Poplawski
@ 2009-12-04 13:49                                                   ` Jarek Poplawski
  2010-01-16 22:50                                                     ` Michael Chan
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2009-12-04 13:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, mchan, kaber, netdev, Jeff Kirsher

On Fri, Dec 04, 2009 at 01:01:34PM +0000, Jarek Poplawski wrote:
> On Fri, Dec 04, 2009 at 12:04:42AM +0100, Eric Dumazet wrote:
> ...
> > bnx2 for example could not be patched easily, following patch doesnt work.
> > 
> > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > index 4cae2a8..8bd5d0d 100644
> > --- a/drivers/net/bnx2.c
> > +++ b/drivers/net/bnx2.c
> > @@ -6177,6 +6177,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
> >  
> >  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
> > -	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> > +	realloc_netdev_mq(bp->dev, bp->num_tx_rings);
> >  
> >  	bp->num_rx_rings = bp->irq_nvecs;
> >  }
> 
> Eric, if didn't miss something in the example you gave at the
> beginning of this thread, it seems a change like below (not compiled)
> should fix your problem (without even testing MSI exactly)?

OOPS! Take 2...;-)
 
Jarek P.
---

 drivers/net/bnx2.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 4bfc808..c1ba2b1 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -8288,6 +8288,11 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->features |= NETIF_F_TSO6;
 		vlan_features_add(dev, NETIF_F_TSO6);
 	}
+
+	if (!((bp->flags & BNX2_FLAG_MSIX_CAP) ||
+	     (bp->flags & BNX2_FLAG_MSI_CAP)) || disable_msi)
+		realloc_netdev_mq(dev, 1);
+
 	if ((rc = register_netdev(dev))) {
 		dev_err(&pdev->dev, "Cannot register net device\n");
 		goto error;

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2009-12-04 13:49                                                   ` Jarek Poplawski
@ 2010-01-16 22:50                                                     ` Michael Chan
  2010-01-17  0:36                                                       ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Chan @ 2010-01-16 22:50 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Eric Dumazet, David Miller, kaber, netdev, Jeff Kirsher


On Fri, 2009-12-04 at 05:49 -0800, Jarek Poplawski wrote:
> On Fri, Dec 04, 2009 at 01:01:34PM +0000, Jarek Poplawski wrote:
> > On Fri, Dec 04, 2009 at 12:04:42AM +0100, Eric Dumazet wrote:
> > ...
> > > bnx2 for example could not be patched easily, following patch doesnt work.
> > > 
> > > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > > index 4cae2a8..8bd5d0d 100644
> > > --- a/drivers/net/bnx2.c
> > > +++ b/drivers/net/bnx2.c
> > > @@ -6177,6 +6177,7 @@ bnx2_setup_int_mode(struct bnx2 *bp, int dis_msi)
> > >  
> > >  	bp->num_tx_rings = rounddown_pow_of_two(bp->irq_nvecs);
> > > -	bp->dev->real_num_tx_queues = bp->num_tx_rings;
> > > +	realloc_netdev_mq(bp->dev, bp->num_tx_rings);
> > >  
> > >  	bp->num_rx_rings = bp->irq_nvecs;
> > >  }
> > 
> > Eric, if didn't miss something in the example you gave at the
> > beginning of this thread, it seems a change like below (not compiled)
> > should fix your problem (without even testing MSI exactly)?
> 

Bringing back an older thread.  Are we still planning to use
realloc_netdev_mq()?  The problem with the patch below is that
dev->real_num_tx_queues can still be reduced during bnx2_open().

Is it possible to call realloc_netdev_mq() during ->open()?

> OOPS! Take 2...;-)
>  
> Jarek P.
> ---
> 
>  drivers/net/bnx2.c |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 4bfc808..c1ba2b1 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -8288,6 +8288,11 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  		dev->features |= NETIF_F_TSO6;
>  		vlan_features_add(dev, NETIF_F_TSO6);
>  	}
> +
> +	if (!((bp->flags & BNX2_FLAG_MSIX_CAP) ||
> +	     (bp->flags & BNX2_FLAG_MSI_CAP)) || disable_msi)
> +		realloc_netdev_mq(dev, 1);
> +
>  	if ((rc = register_netdev(dev))) {
>  		dev_err(&pdev->dev, "Cannot register net device\n");
>  		goto error;
> 



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2010-01-16 22:50                                                     ` Michael Chan
@ 2010-01-17  0:36                                                       ` Jarek Poplawski
  2010-01-17 16:56                                                         ` Michael Chan
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2010-01-17  0:36 UTC (permalink / raw)
  To: Michael Chan
  Cc: Eric Dumazet, David Miller, kaber, netdev, Jeff Kirsher,
	Peter P Waskiewicz Jr

On Sat, Jan 16, 2010 at 02:50:48PM -0800, Michael Chan wrote:
> 
> On Fri, 2009-12-04 at 05:49 -0800, Jarek Poplawski wrote:
> > On Fri, Dec 04, 2009 at 01:01:34PM +0000, Jarek Poplawski wrote:
> > > Eric, if didn't miss something in the example you gave at the
> > > beginning of this thread, it seems a change like below (not compiled)
> > > should fix your problem (without even testing MSI exactly)?
> > 
> 
> Bringing back an older thread.  Are we still planning to use
> realloc_netdev_mq()?

I hope so, except it seems we lost Eric's interest in (t)his idea,
and got Peter's kind opinion.

> The problem with the patch below is that
> dev->real_num_tx_queues can still be reduced during bnx2_open().
> 

I'm not sure I get your point, but there should be no problem with
changing dev->real_num_tx_queues during ->open(). The main intention
of realloc_netdev_mq() is to give drivers some official way to change
dev->num_tx_queues until register_netdev() with the main aim: not to
treat obviously non-mq chips as mq according to netif_is_multiqueue().
Additional gain is memory saved in the case fixed by the patch below
(which btw. waits for some refinement/verification).

> Is it possible to call realloc_netdev_mq() during ->open()?

No, and no such intention, since there was kind of agreement it's not
necessary and collides with preserving qdisc state (stats). But the
discussion isn't closed...

Thanks,
Jarek P.

> 
> > OOPS! Take 2...;-)
> >  
> > Jarek P.
> > ---
> > 
> >  drivers/net/bnx2.c |    5 +++++
> >  1 files changed, 5 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > index 4bfc808..c1ba2b1 100644
> > --- a/drivers/net/bnx2.c
> > +++ b/drivers/net/bnx2.c
> > @@ -8288,6 +8288,11 @@ bnx2_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
> >  		dev->features |= NETIF_F_TSO6;
> >  		vlan_features_add(dev, NETIF_F_TSO6);
> >  	}
> > +
> > +	if (!((bp->flags & BNX2_FLAG_MSIX_CAP) ||
> > +	     (bp->flags & BNX2_FLAG_MSI_CAP)) || disable_msi)
> > +		realloc_netdev_mq(dev, 1);
> > +
> >  	if ((rc = register_netdev(dev))) {
> >  		dev_err(&pdev->dev, "Cannot register net device\n");
> >  		goto error;
> > 
> 
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2010-01-17  0:36                                                       ` Jarek Poplawski
@ 2010-01-17 16:56                                                         ` Michael Chan
  2010-01-17 22:57                                                           ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Chan @ 2010-01-17 16:56 UTC (permalink / raw)
  To: 'Jarek Poplawski'
  Cc: Eric Dumazet, David Miller, kaber, netdev, Jeff Kirsher,
	Peter P Waskiewicz Jr

Jarek Poplawski wrote
> On Sat, Jan 16, 2010 at 02:50:48PM -0800, Michael Chan wrote:
> > The problem with the patch below is that
> > dev->real_num_tx_queues can still be reduced during bnx2_open().
> >
>
> I'm not sure I get your point, but there should be no problem with
> changing dev->real_num_tx_queues during ->open(). The main intention
> of realloc_netdev_mq() is to give drivers some official way to change
> dev->num_tx_queues until register_netdev() with the main aim: not to
> treat obviously non-mq chips as mq according to netif_is_multiqueue().
> Additional gain is memory saved in the case fixed by the patch below
> (which btw. waits for some refinement/verification).

Yes, the patch achieves what you describe.  dev->num_tx_queues will
be set more accurately during ->probe based on chip type.  In ->open(),
the dev->real_num_tx_queues will still be possibly reduced depending
on the number of CPU cores and whether we can enable and allocate the
MSI-X vectors.

>
> > Is it possible to call realloc_netdev_mq() during ->open()?
>
> No, and no such intention, since there was kind of agreement it's not
> necessary and collides with preserving qdisc state (stats). But the
> discussion isn't closed...
>
> Thanks,
> Jarek P.
>
> >
> > > OOPS! Take 2...;-)
> > >
> > > Jarek P.
> > > ---
> > >
> > >  drivers/net/bnx2.c |    5 +++++
> > >  1 files changed, 5 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> > > index 4bfc808..c1ba2b1 100644
> > > --- a/drivers/net/bnx2.c
> > > +++ b/drivers/net/bnx2.c
> > > @@ -8288,6 +8288,11 @@ bnx2_init_one(struct pci_dev
> *pdev, const struct pci_device_id *ent)
> > >           dev->features |= NETIF_F_TSO6;
> > >           vlan_features_add(dev, NETIF_F_TSO6);
> > >   }
> > > +
> > > + if (!((bp->flags & BNX2_FLAG_MSIX_CAP) ||
> > > +      (bp->flags & BNX2_FLAG_MSI_CAP)) || disable_msi)
> > > +         realloc_netdev_mq(dev, 1);
> > > +

I think we can refine this a little:

        if (!(bp->flags & BNX2_FLAG_MSIX_CAP) || disable_msi)
                realloc_netdev_mq(dev, 1);

Thanks.

> > >   if ((rc = register_netdev(dev))) {
> > >           dev_err(&pdev->dev, "Cannot register net device\n");
> > >           goto error;
> > >
> >
> >
>
>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2010-01-17 16:56                                                         ` Michael Chan
@ 2010-01-17 22:57                                                           ` Jarek Poplawski
  2010-01-18 18:29                                                             ` Michael Chan
  0 siblings, 1 reply; 47+ messages in thread
From: Jarek Poplawski @ 2010-01-17 22:57 UTC (permalink / raw)
  To: Michael Chan
  Cc: Eric Dumazet, David Miller, kaber, netdev, Jeff Kirsher,
	Peter P Waskiewicz Jr

On Sun, Jan 17, 2010 at 08:56:28AM -0800, Michael Chan wrote:
> Jarek Poplawski wrote
> > On Sat, Jan 16, 2010 at 02:50:48PM -0800, Michael Chan wrote:
> > > The problem with the patch below is that
> > > dev->real_num_tx_queues can still be reduced during bnx2_open().
> > >
> >
> > I'm not sure I get your point, but there should be no problem with
> > changing dev->real_num_tx_queues during ->open(). The main intention
> > of realloc_netdev_mq() is to give drivers some official way to change
> > dev->num_tx_queues until register_netdev() with the main aim: not to
> > treat obviously non-mq chips as mq according to netif_is_multiqueue().
> > Additional gain is memory saved in the case fixed by the patch below
> > (which btw. waits for some refinement/verification).
> 
> Yes, the patch achieves what you describe.  dev->num_tx_queues will
> be set more accurately during ->probe based on chip type.  In ->open(),
> the dev->real_num_tx_queues will still be possibly reduced depending
> on the number of CPU cores and whether we can enable and allocate the
> MSI-X vectors.

I wondered, and Peter seemed to confirm, some check whether we can use
MSI-X could be done during ->probe as well?

...
> > > > + if (!((bp->flags & BNX2_FLAG_MSIX_CAP) ||
> > > > +      (bp->flags & BNX2_FLAG_MSI_CAP)) || disable_msi)
> > > > +         realloc_netdev_mq(dev, 1);
> > > > +
> 
> I think we can refine this a little:
> 
>         if (!(bp->flags & BNX2_FLAG_MSIX_CAP) || disable_msi)
>                 realloc_netdev_mq(dev, 1);
> 

Hmm... Probably I should use different words. I mentioned above the
main intention of realloc_netdev_mq() was to fix mq-ness for non-mq
chips, and this patch demonstrates this. But, there is no reason not
to fix/optimize more btw. So, I hoped the refinement could also
include some preliminary MSI-X test etc. Anyway, since I can't even
test it, I would be glad if you could author something based on this
stub (unless you think it's enough for now - then I'll send it as is,
no problem).

Thanks,
Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2010-01-17 22:57                                                           ` Jarek Poplawski
@ 2010-01-18 18:29                                                             ` Michael Chan
  2010-01-18 19:41                                                               ` Jarek Poplawski
  0 siblings, 1 reply; 47+ messages in thread
From: Michael Chan @ 2010-01-18 18:29 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Eric Dumazet, David Miller, kaber, netdev, Jeff Kirsher,
	Peter P Waskiewicz Jr


On Sun, 2010-01-17 at 14:57 -0800, Jarek Poplawski wrote:
> On Sun, Jan 17, 2010 at 08:56:28AM -0800, Michael Chan wrote:
> > Jarek Poplawski wrote
> > > On Sat, Jan 16, 2010 at 02:50:48PM -0800, Michael Chan wrote:
> > > > The problem with the patch below is that
> > > > dev->real_num_tx_queues can still be reduced during bnx2_open().
> > > >
> > >
> > > I'm not sure I get your point, but there should be no problem with
> > > changing dev->real_num_tx_queues during ->open(). The main intention
> > > of realloc_netdev_mq() is to give drivers some official way to change
> > > dev->num_tx_queues until register_netdev() with the main aim: not to
> > > treat obviously non-mq chips as mq according to netif_is_multiqueue().
> > > Additional gain is memory saved in the case fixed by the patch below
> > > (which btw. waits for some refinement/verification).
> > 
> > Yes, the patch achieves what you describe.  dev->num_tx_queues will
> > be set more accurately during ->probe based on chip type.  In ->open(),
> > the dev->real_num_tx_queues will still be possibly reduced depending
> > on the number of CPU cores and whether we can enable and allocate the
> > MSI-X vectors.
> 
> I wondered, and Peter seemed to confirm, some check whether we can use
> MSI-X could be done during ->probe as well?
> 
> ...
> > > > > + if (!((bp->flags & BNX2_FLAG_MSIX_CAP) ||
> > > > > +      (bp->flags & BNX2_FLAG_MSI_CAP)) || disable_msi)
> > > > > +         realloc_netdev_mq(dev, 1);
> > > > > +
> > 
> > I think we can refine this a little:
> > 
> >         if (!(bp->flags & BNX2_FLAG_MSIX_CAP) || disable_msi)
> >                 realloc_netdev_mq(dev, 1);
> > 
> 
> Hmm... Probably I should use different words. I mentioned above the
> main intention of realloc_netdev_mq() was to fix mq-ness for non-mq
> chips, and this patch demonstrates this. But, there is no reason not
> to fix/optimize more btw. So, I hoped the refinement could also
> include some preliminary MSI-X test etc. Anyway, since I can't even
> test it, I would be glad if you could author something based on this
> stub (unless you think it's enough for now - then I'll send it as is,
> no problem).
> 

It's an improvement over existing code to unconditionally allocate max
tx queues.  To make further improvements, we'll have to experiment with
calling pci_enable_msix() so early during ->probe().  I will look into
it.  Thanks.





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v4] net: Introduce realloc_netdev_mq()
  2010-01-18 18:29                                                             ` Michael Chan
@ 2010-01-18 19:41                                                               ` Jarek Poplawski
  0 siblings, 0 replies; 47+ messages in thread
From: Jarek Poplawski @ 2010-01-18 19:41 UTC (permalink / raw)
  To: Michael Chan
  Cc: Eric Dumazet, David Miller, kaber, netdev, Jeff Kirsher,
	Peter P Waskiewicz Jr

On Mon, Jan 18, 2010 at 10:29:48AM -0800, Michael Chan wrote:
> It's an improvement over existing code to unconditionally allocate max
> tx queues.  To make further improvements, we'll have to experiment with
> calling pci_enable_msix() so early during ->probe().  I will look into
> it.  Thanks.

I guess it can wait; thanks for remembering, btw.

Jarek P.

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2010-01-18 19:42 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-08  7:18 [RFC] multiqueue changes Eric Dumazet
2009-10-08  9:03 ` Jarek Poplawski
2009-10-08 12:00   ` Jarek Poplawski
2009-10-08 12:13     ` Eric Dumazet
2009-10-08 12:53       ` Jarek Poplawski
2009-10-09  7:58       ` David Miller
2009-10-28 17:27     ` Patrick McHardy
2009-10-28 21:23       ` Jarek Poplawski
2009-10-29 16:37         ` Patrick McHardy
2009-10-29 21:15           ` Jarek Poplawski
2009-10-29 22:12             ` Patrick McHardy
2009-10-30 10:00               ` Jarek Poplawski
2009-10-31 17:25                 ` Michael Chan
2009-11-01 13:20                   ` Jarek Poplawski
2009-11-02 11:35                     ` David Miller
2009-11-02 12:30                       ` Jarek Poplawski
2009-11-02 12:39                         ` David Miller
2009-11-02 13:02                           ` Jarek Poplawski
2009-11-02 13:03                             ` Eric Dumazet
2009-11-02 13:09                               ` Jarek Poplawski
2009-12-03 14:10                           ` [PATCH] net: Introduce realloc_netdev_mq() Jarek Poplawski
2009-12-03 14:39                           ` [PATCH v2] " Jarek Poplawski
2009-12-03 15:17                             ` Eric Dumazet
2009-12-03 16:36                               ` Jarek Poplawski
2009-12-03 16:54                                 ` Jarek Poplawski
2009-12-03 17:05                                   ` Eric Dumazet
2009-12-03 19:04                                     ` [PATCH v3] " Jarek Poplawski
2009-12-03 20:29                                     ` [PATCH v4] " Jarek Poplawski
2009-12-03 21:29                                       ` Eric Dumazet
2009-12-03 21:31                                         ` David Miller
2009-12-03 21:32                                           ` Eric Dumazet
2009-12-03 21:51                                           ` Eric Dumazet
2009-12-03 22:47                                             ` Jarek Poplawski
2009-12-03 23:04                                               ` Eric Dumazet
2009-12-04  7:48                                                 ` Jarek Poplawski
2009-12-04 10:51                                                   ` Peter P Waskiewicz Jr
2009-12-04 11:41                                                     ` Jarek Poplawski
2009-12-04 13:01                                                 ` Jarek Poplawski
2009-12-04 13:49                                                   ` Jarek Poplawski
2010-01-16 22:50                                                     ` Michael Chan
2010-01-17  0:36                                                       ` Jarek Poplawski
2010-01-17 16:56                                                         ` Michael Chan
2010-01-17 22:57                                                           ` Jarek Poplawski
2010-01-18 18:29                                                             ` Michael Chan
2010-01-18 19:41                                                               ` Jarek Poplawski
2009-10-09  8:51   ` [RFC] multiqueue changes Jarek Poplawski
2009-10-09  9:40     ` Jarek Poplawski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.