Why do we use RX queue mapping for TX?

All of lore.kernel.org
 help / color / mirror / Atom feed

* Why do we use RX queue mapping for TX?
@ 2015-02-18 21:09 Cong Wang
  2015-02-18 21:18 ` Cong Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Cong Wang @ 2015-02-18 21:09 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Hi, David


With regarding to the following commit:

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-18 21:09 Why do we use RX queue mapping for TX? Cong Wang
@ 2015-02-18 21:18 ` Cong Wang
  2015-02-18 22:11   ` Cong Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Cong Wang @ 2015-02-18 21:18 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

Hi, David

With regarding to the following commit:

commit d5a9e24afb4ab38110ebb777588ea0bd0eacbd0a
Author: David S. Miller <davem@davemloft.net>
Date:   Tue Jan 27 16:22:11 2009 -0800

    net: Allow RX queue selection to seed TX queue hashing.

What's the point of this commit? It looks like it is for routers
where skb->queue_mapping is preserved across forwarding.
Or some software interface, but they usually don't even have
a queue.

But:

1) the incoming interface may not be same with the outgoing
interface, therefore they may not has the same number of queues.
Even if it's the same interface, the number of RX queues doesn't
have to be same with TX queues either.

2) This breaks the queue mapping specified by an skbedit action,
since for TX queues the index starts with 0 while for RX it starts with 1
(for some reason I don't see yet). There is at least a mismatch.

3) Since both TX and RX use the same field ->queue_mapping,
I assume normally it is supposed to be set and used only by one of them,
so the above commit kinda breaks this too.

Or am I still missing anything?

Thanks!

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-18 21:18 ` Cong Wang
@ 2015-02-18 22:11   ` Cong Wang
  2015-02-18 23:14     ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Cong Wang @ 2015-02-18 22:11 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

On Wed, Feb 18, 2015 at 1:18 PM, Cong Wang <cwang@twopensource.com> wrote:
>
> 2) This breaks the queue mapping specified by an skbedit action,
> since for TX queues the index starts with 0 while for RX it starts with 1
> (for some reason I don't see yet). There is at least a mismatch.

So queue 0 is reserved for TX, at least for bonding queue mapping.

But for sch_mq, all queues are used, 0 is merely the default one.
This means we should still allow user to specify queue 0 for TX.

We are trying to do queue mapping for containers, queue 0 isn't
special for us at all, therefore I don't see why we should reserve it
at least for hardware TX queues.

*I think* we should use the same strategy for TX queue mapping like
RX queue mapping, where ->queue_mapping = 0 means "not set",
while ->queue_mapping maps to a real queue index starting at 0.
This would also make ->queue_mapping have the same meaning
across RX -> TX if we really need to preserve it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-18 22:11   ` Cong Wang
@ 2015-02-18 23:14     ` Eric Dumazet
  2015-02-18 23:57       ` Cong Wang
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-02-18 23:14 UTC (permalink / raw)
  To: Cong Wang; +Cc: David S. Miller, netdev

On Wed, 2015-02-18 at 14:11 -0800, Cong Wang wrote:
> On Wed, Feb 18, 2015 at 1:18 PM, Cong Wang <cwang@twopensource.com> wrote:
> >
> > 2) This breaks the queue mapping specified by an skbedit action,
> > since for TX queues the index starts with 0 while for RX it starts with 1
> > (for some reason I don't see yet). There is at least a mismatch.
> 
> So queue 0 is reserved for TX, at least for bonding queue mapping.

It seems you missed that bonding ndo_select_queue() is rather special.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-18 23:14     ` Eric Dumazet
@ 2015-02-18 23:57       ` Cong Wang
  2015-02-19  0:16         ` Eric Dumazet
  2015-02-20  1:31         ` John Fastabend
  0 siblings, 2 replies; 10+ messages in thread
From: Cong Wang @ 2015-02-18 23:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

On Wed, Feb 18, 2015 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-02-18 at 14:11 -0800, Cong Wang wrote:
>> On Wed, Feb 18, 2015 at 1:18 PM, Cong Wang <cwang@twopensource.com> wrote:
>> >
>> > 2) This breaks the queue mapping specified by an skbedit action,
>> > since for TX queues the index starts with 0 while for RX it starts with 1
>> > (for some reason I don't see yet). There is at least a mismatch.
>>
>> So queue 0 is reserved for TX, at least for bonding queue mapping.
>
> It seems you missed that bonding ndo_select_queue() is rather special.
>
>

I am aware of it. I even would guess (means not digging the history)
skb_edit was invented for bonding queue mapping, since I don't see it
even works as a general TC action on the physical interface. We pick
the tx queue prior to getting the Qdisc, therefore too late to set
skb->queue_mapping to specify a hardware TX queue. This is
another story I planned to bring it up to David.

However, this still doesn't seem to be a reason to break people who
don't use bonding at all? At least we just want to map skb's to different
hardware TX queues by setting skb->queue_mapping (before
dev_queue_xmit() of course) and sysfs reports the queues starting with
index 0. This is why I complain. :)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-18 23:57       ` Cong Wang
@ 2015-02-19  0:16         ` Eric Dumazet
  2015-02-20 18:09           ` Cong Wang
  2015-02-20  1:31         ` John Fastabend
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2015-02-19  0:16 UTC (permalink / raw)
  To: Cong Wang; +Cc: David S. Miller, netdev

On Wed, 2015-02-18 at 15:57 -0800, Cong Wang wrote:

> I am aware of it. I even would guess (means not digging the history)
> skb_edit was invented for bonding queue mapping, since I don't see it
> even works as a general TC action on the physical interface. We pick
> the tx queue prior to getting the Qdisc, therefore too late to set
> skb->queue_mapping to specify a hardware TX queue. This is
> another story I planned to bring it up to David.

skbedit can be done on ingress I presume.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-18 23:57       ` Cong Wang
  2015-02-19  0:16         ` Eric Dumazet
@ 2015-02-20  1:31         ` John Fastabend
  2015-02-20 18:33           ` Cong Wang
  1 sibling, 1 reply; 10+ messages in thread
From: John Fastabend @ 2015-02-20  1:31 UTC (permalink / raw)
  To: Cong Wang; +Cc: Eric Dumazet, David S. Miller, netdev

On 02/18/2015 03:57 PM, Cong Wang wrote:
> On Wed, Feb 18, 2015 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2015-02-18 at 14:11 -0800, Cong Wang wrote:
>>> On Wed, Feb 18, 2015 at 1:18 PM, Cong Wang <cwang@twopensource.com> wrote:
>>>>
>>>> 2) This breaks the queue mapping specified by an skbedit action,
>>>> since for TX queues the index starts with 0 while for RX it starts with 1
>>>> (for some reason I don't see yet). There is at least a mismatch.
>>>
>>> So queue 0 is reserved for TX, at least for bonding queue mapping.
>>
>> It seems you missed that bonding ndo_select_queue() is rather special.
>>
>>
>
> I am aware of it. I even would guess (means not digging the history)
> skb_edit was invented for bonding queue mapping, since I don't see it
> even works as a general TC action on the physical interface. We pick
> the tx queue prior to getting the Qdisc, therefore too late to set
> skb->queue_mapping to specify a hardware TX queue. This is
> another story I planned to bring it up to David.

At one point I proposed a pre-enqueue hook to call before entering
the qdisc where filters/actions could be attached.

 
https://github.com/jrfastab/Linux-Kernel-QOS/commit/67746f95acd77cf15d7ce34f644b76058ce19813

the idea was to drop select_queue() altogether and force any queue
selection onto a visible action chain.

>
> However, this still doesn't seem to be a reason to break people who
> don't use bonding at all? At least we just want to map skb's to different
> hardware TX queues by setting skb->queue_mapping (before
> dev_queue_xmit() of course) and sysfs reports the queues starting with
> index 0. This is why I complain. :)

for historical info take a look at multiq qdisc. It was added at
some point (i think before all the multiple queue nics were there?).
It does use the select_queue field to select a queue but I doubt anyone
uses it today with mq and mqprio. I would argue no one should be using
it there are better ways to get the same thing.

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
John Fastabend         Intel Corporation

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-19  0:16         ` Eric Dumazet
@ 2015-02-20 18:09           ` Cong Wang
  2015-02-20 19:08             ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Cong Wang @ 2015-02-20 18:09 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S. Miller, netdev

On Wed, Feb 18, 2015 at 4:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2015-02-18 at 15:57 -0800, Cong Wang wrote:
>
>> I am aware of it. I even would guess (means not digging the history)
>> skb_edit was invented for bonding queue mapping, since I don't see it
>> even works as a general TC action on the physical interface. We pick
>> the tx queue prior to getting the Qdisc, therefore too late to set
>> skb->queue_mapping to specify a hardware TX queue. This is
>> another story I planned to bring it up to David.
>
> skbedit can be done on ingress I presume.
>

Yes of course, but that means I need an indirect layer to
map RX to TX (ifb or mirred), since I only care about TX queues. :-/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-20  1:31         ` John Fastabend
@ 2015-02-20 18:33           ` Cong Wang
  0 siblings, 0 replies; 10+ messages in thread
From: Cong Wang @ 2015-02-20 18:33 UTC (permalink / raw)
  To: John Fastabend; +Cc: Eric Dumazet, David S. Miller, netdev

On Thu, Feb 19, 2015 at 5:31 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 02/18/2015 03:57 PM, Cong Wang wrote:
>>
>> On Wed, Feb 18, 2015 at 3:14 PM, Eric Dumazet <eric.dumazet@gmail.com>
>> wrote:
>>>
>>> On Wed, 2015-02-18 at 14:11 -0800, Cong Wang wrote:
>>>>
>>>> On Wed, Feb 18, 2015 at 1:18 PM, Cong Wang <cwang@twopensource.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> 2) This breaks the queue mapping specified by an skbedit action,
>>>>> since for TX queues the index starts with 0 while for RX it starts with
>>>>> 1
>>>>> (for some reason I don't see yet). There is at least a mismatch.
>>>>
>>>>
>>>> So queue 0 is reserved for TX, at least for bonding queue mapping.
>>>
>>>
>>> It seems you missed that bonding ndo_select_queue() is rather special.
>>>
>>>
>>
>> I am aware of it. I even would guess (means not digging the history)
>> skb_edit was invented for bonding queue mapping, since I don't see it
>> even works as a general TC action on the physical interface. We pick
>> the tx queue prior to getting the Qdisc, therefore too late to set
>> skb->queue_mapping to specify a hardware TX queue. This is
>> another story I planned to bring it up to David.
>
>
> At one point I proposed a pre-enqueue hook to call before entering
> the qdisc where filters/actions could be attached.
>
>
> https://github.com/jrfastab/Linux-Kernel-QOS/commit/67746f95acd77cf15d7ce34f644b76058ce19813
>
> the idea was to drop select_queue() altogether and force any queue
> selection onto a visible action chain.

That is one quick solution. Ideally, for long term, I think we should
move the TX queues down below the Qdisc layer, or at least separate
them with a clear boundary, rather than mixing in such a way, especially
sch_mq. This is essentially hard since Qdisc's are scheduled in TX softirq,
probably due to historical reason (Qdisc's were invented before multiqueue?).
Anyway, I am still working on it and still trying to find a better solution.

>>
>> However, this still doesn't seem to be a reason to break people who
>> don't use bonding at all? At least we just want to map skb's to different
>> hardware TX queues by setting skb->queue_mapping (before
>> dev_queue_xmit() of course) and sysfs reports the queues starting with
>> index 0. This is why I complain. :)
>
>
> for historical info take a look at multiq qdisc. It was added at
> some point (i think before all the multiple queue nics were there?).
> It does use the select_queue field to select a queue but I doubt anyone
> uses it today with mq and mqprio. I would argue no one should be using
> it there are better ways to get the same thing.

Sure, multiq accepts filters, we have to use our own filters to classify
the flows. Using skbedit is much simpler since we only need to mark
skb->queue_mapping before delivering skb to the physical interface,
no need to add more filters at all (we already have many filters).

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Why do we use RX queue mapping for TX?
  2015-02-20 18:09           ` Cong Wang
@ 2015-02-20 19:08             ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2015-02-20 19:08 UTC (permalink / raw)
  To: Cong Wang; +Cc: David S. Miller, netdev

On Fri, 2015-02-20 at 10:09 -0800, Cong Wang wrote:

> Yes of course, but that means I need an indirect layer to
> map RX to TX (ifb or mirred), since I only care about TX queues. :-/

We do care of TX queues as well, but we use XPS, for performance
reasons.

It looks like you want something very different.

Note : netdev_pick_tx/ndo_select_queue() is called before queue
selection.

Maybe you want something close to the reclassify thing in tc, allowing
a packet to restart the tx _before_ the point netdev_pick_tx() is
called.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-02-20 19:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-18 21:09 Why do we use RX queue mapping for TX? Cong Wang
2015-02-18 21:18 ` Cong Wang
2015-02-18 22:11   ` Cong Wang
2015-02-18 23:14     ` Eric Dumazet
2015-02-18 23:57       ` Cong Wang
2015-02-19  0:16         ` Eric Dumazet
2015-02-20 18:09           ` Cong Wang
2015-02-20 19:08             ` Eric Dumazet
2015-02-20  1:31         ` John Fastabend
2015-02-20 18:33           ` Cong Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.