* XPS configuration question (on tg3)
@ 2016-09-06 18:46 Michal Soltys
  2016-09-06 20:21 ` Alexander Duyck
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Soltys @ 2016-09-06 18:46 UTC (permalink / raw)
  To: Linux Netdev List

Hi,

I've been testing different configurations and I haven't managed to get XPS to "behave" correctly - so I'm probably misunderstanding or forgetting something. The NIC in question (BCM5720 and BCM5719 models, under the tg3 driver) was configured with 3 tx and 4 rx queues. 3 irqs were shared (tx and rx), 1 was unused (this got me scratching my head a bit) and the remaining one served the last rx queue (though due to another, recently fixed bug the 4th rx queue was unconfigurable on the receive side). The irq names were: eth1b-0, eth1b-txrx-1, eth1b-txrx-2, eth1b-txrx-3, eth1b-rx-4.

The XPS was configured as:

echo f >/sys/class/net/eth1b/queues/tx-0/xps_cpus
echo f0 >/sys/class/net/eth1b/queues/tx-1/xps_cpus
echo ff00 >/sys/class/net/eth1b/queues/tx-2/xps_cpus

So, as far as I understand, cpus 0-3 should be allowed to use the tx-0 queue only, 4-7 tx-1, and 8-15 tx-2.

Just in case the rx side could get in the way as far as flows go, the relevant irqs were pinned to specific cpus - txrx-1 to cpu 2, txrx-2 to cpu 4, txrx-3 to cpu 10 - each falling into the group defined by the above masks.
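
For reference, the pinning itself was the usual /proc/irq/*/smp_affinity dance, roughly along these lines (the irq numbers below are placeholders, not the actual ones from this box):

echo 4 >/proc/irq/$IRQ_TXRX1/smp_affinity     # cpu 2
echo 10 >/proc/irq/$IRQ_TXRX2/smp_affinity    # cpu 4
echo 400 >/proc/irq/$IRQ_TXRX3/smp_affinity   # cpu 10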

I tested both with the mq and multiq schedulers, essentially either this:

qdisc mq 2: root
qdisc pfifo_fast 0: parent 2:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 2:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent 2:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 
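
(for reference, the mq root above is roughly what a plain "tc qdisc add dev eth1b root handle 2: mq" sets up, with a default pfifo_fast per tx queue)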

or this (for the record, the skbedit queue_mapping action was behaving correctly with the one below - an illustrative filter follows the listing):

qdisc multiq 3: root refcnt 6 bands 3/5
qdisc pfifo_fast 31: parent 3:1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 32: parent 3:2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 33: parent 3:3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 
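
That kind of explicit queue selection can be done with a filter along these lines (the match values here are purely illustrative):

tc filter add dev eth1b parent 3: protocol ip prio 1 u32 match ip dport 12345 0xffff action skbedit queue_mapping 2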

Now, do I understand correctly that under the above setup - commands such as

taskset 400 nc -p $prt host_ip 12345 </dev/zero
or
yancat -i /dev/zero -o t:host_ip:12345 -u 10 -U 10

In other words - a simple nc command pinned to cpu #10 (or a tool that supports affinity by itself), sending data to some other host on the net, should *always* use the tx-2 queue?
I also tested variations such as: taskset 400 nc -l -p host_ip 12345 </dev/zero (just in case taskset was "too late" with the affinity).

In my case, which queue was used was basically random (on top of that, it sometimes changed the queue mid-transfer), which could be easily confirmed through both /proc/interrupts and tc -s qdisc show. And I'm a bit at a loss now, as I thought the XPS configuration was supposed to be absolute.

Well, I'd be grateful for some pointers / hints.


* Re: XPS configuration question (on tg3)
  2016-09-06 18:46 XPS configuration question (on tg3) Michal Soltys
@ 2016-09-06 20:21 ` Alexander Duyck
  2016-09-06 21:00   ` Michal Soltys
  0 siblings, 1 reply; 7+ messages in thread
From: Alexander Duyck @ 2016-09-06 20:21 UTC (permalink / raw)
  To: Michal Soltys; +Cc: Linux Netdev List

On Tue, Sep 6, 2016 at 11:46 AM, Michal Soltys <soltys@ziu.info> wrote:
> [...]

So it sounds like you have everything configured correctly.  The one
question I would have is whether we are certain the CPU pinning is working
for the application.  You might try using something like perf to verify
what is running on CPU 10, and what is running on the CPUs that the
queues are associated with.
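
Something like "perf top -C 10" (and the same for the cpus tied to each tx
queue) should make it obvious what is actually executing where.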

Also after you have configured things you may want to double check and
verify the xps_cpus value is still set.  I know under some
circumstances the value can be reset by a device driver if the number
of queues changes, or if the interface toggles between being
administratively up/down.
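
A quick "grep . /sys/class/net/eth1b/queues/tx-*/xps_cpus" after a test run
should still show the f / f0 / ff00 values you wrote earlier.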

Thanks.

- Alex


* Re: XPS configuration question (on tg3)
  2016-09-06 20:21 ` Alexander Duyck
@ 2016-09-06 21:00   ` Michal Soltys
  2016-09-07  0:19     ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Soltys @ 2016-09-06 21:00 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: Linux Netdev List

On 2016-09-06 22:21, Alexander Duyck wrote:
> On Tue, Sep 6, 2016 at 11:46 AM, Michal Soltys <soltys@ziu.info> wrote:
>> [...]
> 
> So it sounds like you have everything configured correctly.  The one
> question I would have is if we are certain the CPU pinning is working
> for the application.  You might try using something like perf to
> verify what is running on CPU 10, and what is running on the CPUs that
> the queues are associated with.
> 

I did verify with 'top' in this case. I'll double check tomorrow just to
be sure. Other than the testing, there was nothing else running on the machine.

> Also after you have configured things you may want to double check and
> verify the xps_cpus value is still set.  I know under some
> circumstances the value can be reset by a device driver if the number
> of queues changes, or if the interface toggles between being
> administratively up/down.

Hmm, none of this was happening during tests.

Are there any other circumstances where the xps settings could be ignored or
changed during the test (that is, during the actual transfer, not between
separate attempts)?

One thing I'm a bit worried about is that the kernel was not exactly the
newest (3.16) - maybe I'm missing some crucial fixes, though XPS was added
much earlier than that. Either way, I'll try to redo the tests with a
current kernel tomorrow.


* Re: XPS configuration question (on tg3)
  2016-09-06 21:00   ` Michal Soltys
@ 2016-09-07  0:19     ` Eric Dumazet
  2016-09-07  7:13       ` Michal Soltys
  2016-09-07 23:45       ` Michal Soltys
  0 siblings, 2 replies; 7+ messages in thread
From: Eric Dumazet @ 2016-09-07  0:19 UTC (permalink / raw)
  To: Michal Soltys; +Cc: Alexander Duyck, Linux Netdev List

On Tue, 2016-09-06 at 23:00 +0200, Michal Soltys wrote:
> On 2016-09-06 22:21, Alexander Duyck wrote:
> > On Tue, Sep 6, 2016 at 11:46 AM, Michal Soltys <soltys@ziu.info> wrote:
> >> [...]
> > 
> > So it sounds like you have everything configured correctly.  The one
> > question I would have is if we are certain the CPU pinning is working
> > for the application.  You might try using something like perf to
> > verify what is running on CPU 10, and what is running on the CPUs that
> > the queues are associated with.
> > 
> 
> I did verify with 'top' in this case. I'll double check tommorow just to
> be sure. Other than testing, there was nothing else running on the machine.
> 
> > Also after you have configured things you may want to double check and
> > verify the xps_cpus value is still set.  I know under some
> > circumstances the value can be reset by a device driver if the number
> > of queues changes, or if the interface toggles between being
> > administratively up/down.
> 
> Hmm, none of this was happening during tests.
> 
> Are there any other circumstances where xps settings could be ignored or
> changed during the test (that is during the actual transfer, not between
> separate attempts) ?
> 
> One thing I'm a bit afraid is that kernel was not exactly the newest
> (3.16), maybe I'm missing some crucial fixes, though xps was added much
> earlier than that. Either way, I'll try to redo tests with current
> kernel tommorow.
> 

Keep in mind that the TCP stack can send packets on its own, in response to
incoming ACKs.

So you might check that incoming ACKs are handled by the 'right' cpu.

Without RFS, there is no such guarantee.
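
(watching "grep eth1b /proc/interrupts" during a transfer is a rough way to
see which rx queue - and therefore which cpu - the incoming ACKs are hitting)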

echo 32768 >/proc/sys/net/core/rps_sock_flow_entries
echo 8192 >/sys/class/net/eth1/queues/rx-0/rps_flow_cnt
echo 8192 >/sys/class/net/eth1/queues/rx-1/rps_flow_cnt
echo 8192 >/sys/class/net/eth1/queues/rx-2/rps_flow_cnt
echo 8192 >/sys/class/net/eth1/queues/rx-3/rps_flow_cnt


* Re: XPS configuration question (on tg3)
  2016-09-07  0:19     ` Eric Dumazet
@ 2016-09-07  7:13       ` Michal Soltys
  2016-09-07 23:45       ` Michal Soltys
  1 sibling, 0 replies; 7+ messages in thread
From: Michal Soltys @ 2016-09-07  7:13 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexander Duyck, Linux Netdev List

On 2016-09-07 02:19, Eric Dumazet wrote:
> On Tue, 2016-09-06 at 23:00 +0200, Michal Soltys wrote:
>> [...]
> 
> Keep in mind that TCP stack can send packets, responding to incoming
> ACK.
> 
> So you might check that incoming ACK are handled by the 'right' cpu.
> 
> Without RFS, there is no such guarantee.
> 
> echo 32768 >/proc/sys/net/core/rps_sock_flow_entries
> echo 8192 >/sys/class/net/eth1/queues/rx-0/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-1/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-2/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-3/rps_flow_cnt
> 

I do need to enable RPS as well (queues/rx-.../rps_cpus) before RFS can take
any effect, right?


* Re: XPS configuration question (on tg3)
  2016-09-07  0:19     ` Eric Dumazet
  2016-09-07  7:13       ` Michal Soltys
@ 2016-09-07 23:45       ` Michal Soltys
  2016-09-08  0:13         ` Eric Dumazet
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Soltys @ 2016-09-07 23:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Alexander Duyck, Linux Netdev List

On 2016-09-07 02:19, Eric Dumazet wrote:
> On Tue, 2016-09-06 at 23:00 +0200, Michal Soltys wrote:
>> [...]
> 
> Keep in mind that TCP stack can send packets, responding to incoming
> ACK.
> 
> So you might check that incoming ACK are handled by the 'right' cpu.
> 
> Without RFS, there is no such guarantee.
> 
> echo 32768 >/proc/sys/net/core/rps_sock_flow_entries
> echo 8192 >/sys/class/net/eth1/queues/rx-0/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-1/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-2/rps_flow_cnt
> echo 8192 >/sys/class/net/eth1/queues/rx-3/rps_flow_cnt
> 

Did some more testing today - indeed RFS helped with the TCP flows, but ....
it got me wondering:

In a scenario such as: XPS off, RFS/RPS off, irqs pinned, the process
transferring data pinned - one tx queue was chosen (through the hash) and it
consistently persisted throughout the whole transfer. No exceptions, at
least none in the tests I did.

When XPS is enabled, the only thing that changes is that instead of using
the hash to select one of all available queues, the cpu running the process
is specifically told which queues it can use (and eventually selects one
through the hash if more than one is available). Shouldn't the choice
persist throughout the transfer as well then?


* Re: XPS configuration question (on tg3)
  2016-09-07 23:45       ` Michal Soltys
@ 2016-09-08  0:13         ` Eric Dumazet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2016-09-08  0:13 UTC (permalink / raw)
  To: Michal Soltys; +Cc: Alexander Duyck, Linux Netdev List

On Thu, 2016-09-08 at 01:45 +0200, Michal Soltys wrote:
> > Keep in mind that TCP stack can send packets, responding to incoming
> > ACK.
> > 
> > So you might check that incoming ACK are handled by the 'right' cpu.
> > 
> > Without RFS, there is no such guarantee.

> Did some more testing today, indeed RFS helped with TCP flows, but ....
> it got me wondering:
> 
> In scenario such as: XPS off, RFS/RPS off, irqs pinned, process
> transfering data pinned - one tx queue was chosen (through hash) and
> consistently persisted throughout the whole transfer. No exceptions, at
> least none in the tests I did.

It depends on whether at least one packet for each flow sits in a qdisc/NIC
queue. TCP has out-of-order avoidance logic that prevents a TX queue change
in that case, even if the process doing the sendmsg() is migrated.

git grep -n ooo_okay

> 
> When XPS is getting enabled, the only thing that changes is that instead
> of using hash to select one of available queues, the cpu running process
> is specifically told which queue it can use (and eventually selects one
> through hash if more than one is available). Shouldn't the choice
> persist throughout the transfer as well then ?

Sure, if the process doing the sendmsg() sticks to one cpu, and this cpu
is the one handling incoming ACK packets as well.



Thread overview: 7+ messages
2016-09-06 18:46 XPS configuration question (on tg3) Michal Soltys
2016-09-06 20:21 ` Alexander Duyck
2016-09-06 21:00   ` Michal Soltys
2016-09-07  0:19     ` Eric Dumazet
2016-09-07  7:13       ` Michal Soltys
2016-09-07 23:45       ` Michal Soltys
2016-09-08  0:13         ` Eric Dumazet
