* Dropped packets mapping IRQs for adjusted queue counts on i40e
@ 2021-05-01  0:45 Zvi Effron
  [not found] ` <CADS2XXpjasmJKP__oHsrvv3EG8n-FjB6sqHwgQfh7QgeJ8GrrQ@mail.gmail.com>
  0 siblings, 1 reply; 18+ messages in thread
From: Zvi Effron @ 2021-05-01  0:45 UTC (permalink / raw)
  To: Xdp

Hi-ho, friends!

I'm trying to tune network queues so that there's one queue per core, as
recommended in many, many places. I've noticed that if I adjust the number of
queues from the default AND adjust the IRQ affinities, then some percentage of
packets (varying from a small percentage to 100%, seemingly proportional to
the reduction in queue count) doesn't make it through.

For my setup, I have a single 36-core Skylake processor with two dual-port
X710 NICs. All traffic coming in on one port of each NIC is redirected out the
other port (traffic arrives on all four ports). 34 cores are isolated for
network processing. I adjust the combined queues down from the default of 36,
and then map the IRQ associated with each queue to one of the 34 isolated
cores. Everything works fine if I don't map the IRQs.

For a minimal repro case, I reduced my program to a blind redirector using a
devmap (it doesn't even adjust MACs, which is not a problem as my DUT is
directly connected to a measurement device in promiscuous mode), reproduced
below. I use bpftool to load 4 copies of the program and pin them, use bpftool
to configure the egress interface in the devmap, and then use ip link to
attach the programs to the interfaces.

I have played around with when I adjust the queue counts and IRQs (before
attaching XDP programs, after, XDP attachment in the middle, etc.) and it
doesn't seem to matter. But with any ordering, if I just don't remap the IRQs,
everything works fine, and if I remap, I lose packets.

Has anyone encountered anything like this? Does anyone know what might be
causing it?
How can I assign a single queue to a single core without using the default
number of queues and without losing packets?

Thanks!
--Zvi

#include <linux/bpf.h>

struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
	struct bpf_map_def* inner_map;
};

struct bpf_map_def __attribute__((section("maps"), used)) device_map = {
	.type = BPF_MAP_TYPE_DEVMAP,
	.key_size = sizeof(__u32),
	.value_size = sizeof(__u32),
	.max_entries = 1,
};

static int (*bpf_redirect_map)(void *map, int key, int flags) =
	(void *)BPF_FUNC_redirect_map;

__attribute__((section("xdp/test"), used))
int test(struct xdp_md *context)
{
	__u32 key = 0;
	bpf_redirect_map(&device_map, key, 0);
	return XDP_REDIRECT;
}

^ permalink raw reply	[flat|nested] 18+ messages in thread
[parent not found: <CADS2XXpjasmJKP__oHsrvv3EG8n-FjB6sqHwgQfh7QgeJ8GrrQ@mail.gmail.com>]
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  [not found] ` <CADS2XXpjasmJKP__oHsrvv3EG8n-FjB6sqHwgQfh7QgeJ8GrrQ@mail.gmail.com>
@ 2021-05-01 17:56   ` Zvi Effron
  2021-05-02 20:16     ` T K Sourabh
  0 siblings, 1 reply; 18+ messages in thread
From: Zvi Effron @ 2021-05-01 17:56 UTC (permalink / raw)
  To: T K Sourabh; +Cc: Xdp

On Sat, May 1, 2021 at 8:23 AM T K Sourabh <sourabhtk37@gmail.com> wrote:
>
> When we turned off the irqbalance service, we would not see the issue,
> as it was messing up the affinities.
>
> Another thing to ensure: equal number of queues on both interfaces.
>
Thanks for the info, it does give me some areas to investigate.

We already have irqbalance turned off (uninstalled, actually), and we're using
the same number of queues on each device (we get the number of queues by
dividing the number of cores by the number of interfaces).

I am wondering if not having the matching queues for the ingress interface and
egress interface on the same cores might be a contributing factor?

--Zvi

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-01 17:56 ` Zvi Effron
@ 2021-05-02 20:16   ` T K Sourabh
  2021-05-04 23:07     ` Zvi Effron
  0 siblings, 1 reply; 18+ messages in thread
From: T K Sourabh @ 2021-05-02 20:16 UTC (permalink / raw)
  To: Zvi Effron; +Cc: Xdp

On Sat, May 1, 2021 at 11:26 PM Zvi Effron <zeffron@riotgames.com> wrote:
>
> On Sat, May 1, 2021 at 8:23 AM T K Sourabh <sourabhtk37@gmail.com> wrote:
> >
> > When we turned off the irqbalance service, we would not see the issue,
> > as it was messing up the affinities.
> >
> > Another thing to ensure: equal number of queues on both interfaces.
> >
> Thanks for the info, it does give me some areas to investigate.
>
> We already have irqbalance turned off (uninstalled, actually), and
> we're using the same number of queues on each device (we get the
> number of queues by dividing the number of cores by the number of
> interfaces).
>
> I am wondering if not having the matching queues for the ingress
> interface and egress interface on the same cores might be a
> contributing factor?

I haven't tested it, but you could give it a try and see if the packet drop
still happens.

>
> --Zvi

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-02 20:16 ` T K Sourabh
@ 2021-05-04 23:07   ` Zvi Effron
  2021-05-05 18:55     ` Zvi Effron
  0 siblings, 1 reply; 18+ messages in thread
From: Zvi Effron @ 2021-05-04 23:07 UTC (permalink / raw)
  To: T K Sourabh; +Cc: Xdp

On Sun, May 2, 2021 at 1:16 PM T K Sourabh <sourabhtk37@gmail.com> wrote:
>
> On Sat, May 1, 2021 at 11:26 PM Zvi Effron <zeffron@riotgames.com> wrote:
> >
> > I am wondering if not having the matching queues for the ingress
> > interface and egress interface on the same cores might be a
> > contributing factor?
> I haven't tested it, but you could give it a try and see if the packet
> drop still happens.

I have now tested matching the IRQs for queue N on both the ingress and egress
interfaces so they are on the same core. (Queue 0 for ingress and egress are
on the same core, the same for queue 1 on each, etc., but queue 0 and queue 1
are on different cores.) This did not resolve the packet loss.

I'm at a loss for what could be causing the packet loss. It's clearly
something to do with the IRQ remapping, as all I need to do to remove the loss
is comment out the write to the smp_affinity files.

I'm suspecting it's something with how XDP_REDIRECT is implemented in the i40e
driver, but I don't know if this is a) cross-driver behavior, b) expected
behavior, or c) a bug.

--Zvi

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-04 23:07 ` Zvi Effron
@ 2021-05-05 18:55   ` Zvi Effron
  2021-05-05 19:53     ` Zvi Effron
                        ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Zvi Effron @ 2021-05-05 18:55 UTC (permalink / raw)
  To: T K Sourabh; +Cc: Xdp

On Tue, May 4, 2021 at 4:07 PM Zvi Effron <zeffron@riotgames.com> wrote:
> I'm suspecting it's something with how XDP_REDIRECT is implemented in
> the i40e driver, but I don't know if this is a) cross-driver behavior,
> b) expected behavior, or c) a bug.

I think I've found the issue, and it appears to be specific to i40e (and maybe
other drivers, too, but not XDP itself).

When performing the XDP xmit, i40e uses smp_processor_id() to select the tx
queue (see
https://elixir.bootlin.com/linux/v5.12.1/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L3846).
I'm not 100% clear on how the CPU is selected, but since we don't use cores 0
and 1, we end up on a core whose id is higher than any available queue.

I'm going to try to modify our IRQ mappings to test this.

If I'm correct, this feels like a bug to me, since it requires a user to
understand low-level driver details to do IRQ remapping, which is a bit
higher level. But if it's intended, we'll just have to figure out how to work
around this. (Unfortunately, using split tx and rx queues is not possible
with i40e, so that easy solution is unavailable.)

--Zvi

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-05 18:55 ` Zvi Effron
@ 2021-05-05 19:53   ` Zvi Effron
  2021-05-05 19:55   ` David Ahern
  2021-05-05 20:01   ` Jesse Brandeburg
  2 siblings, 0 replies; 18+ messages in thread
From: Zvi Effron @ 2021-05-05 19:53 UTC (permalink / raw)
  To: T K Sourabh; +Cc: Xdp

On Wed, May 5, 2021 at 11:55 AM Zvi Effron <zeffron@riotgames.com> wrote:
>
> I'm going to try to modify our IRQ mappings to test this.

Confirmed. When I modify the IRQ mappings so that I don't use cores with a
higher number than the number of queues, the drops go away.

Going to move this over to the i40e driver's mailing list, as it's now
confirmed not to be an XDP issue.

Thanks!
--Zvi

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-05 18:55 ` Zvi Effron
  2021-05-05 19:53 ` Zvi Effron
@ 2021-05-05 19:55   ` David Ahern
  2021-05-05 20:01   ` Jesse Brandeburg
  2 siblings, 0 replies; 18+ messages in thread
From: David Ahern @ 2021-05-05 19:55 UTC (permalink / raw)
  To: Zvi Effron, T K Sourabh; +Cc: Xdp

On 5/5/21 12:55 PM, Zvi Effron wrote:
> On Tue, May 4, 2021 at 4:07 PM Zvi Effron <zeffron@riotgames.com> wrote:
>> I'm suspecting it's something with how XDP_REDIRECT is implemented in
>> the i40e driver, but I don't know if this is a) cross-driver behavior,
>> b) expected behavior, or c) a bug.
> I think I've found the issue, and it appears to be specific to i40e
> (and maybe other drivers, too, but not XDP itself).
>
> When performing the XDP xmit, i40e uses smp_processor_id() to
> select the tx queue (see
> https://elixir.bootlin.com/linux/v5.12.1/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L3846).
> I'm not 100% clear on how the CPU is selected, but since we don't use
> cores 0 and 1, we end up on a core whose id is higher than any
> available queue.
>

That is a known problem; mlx5 has that constraint too.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-05 18:55 ` Zvi Effron
  2021-05-05 19:53 ` Zvi Effron
  2021-05-05 19:55 ` David Ahern
@ 2021-05-05 20:01   ` Jesse Brandeburg
  2021-05-05 21:21     ` [Intel-wired-lan] " Maciej Fijalkowski
  2 siblings, 1 reply; 18+ messages in thread
From: Jesse Brandeburg @ 2021-05-05 20:01 UTC (permalink / raw)
  To: Zvi Effron; +Cc: T K Sourabh, Xdp

Zvi Effron wrote:

> On Tue, May 4, 2021 at 4:07 PM Zvi Effron <zeffron@riotgames.com> wrote:
> > I'm suspecting it's something with how XDP_REDIRECT is implemented in
> > the i40e driver, but I don't know if this is a) cross-driver behavior,
> > b) expected behavior, or c) a bug.
> I think I've found the issue, and it appears to be specific to i40e
> (and maybe other drivers, too, but not XDP itself).
>
> When performing the XDP xmit, i40e uses smp_processor_id() to
> select the tx queue (see
> https://elixir.bootlin.com/linux/v5.12.1/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L3846).
> I'm not 100% clear on how the CPU is selected, but since we don't use
> cores 0 and 1, we end up on a core whose id is higher than any
> available queue.
>
> I'm going to try to modify our IRQ mappings to test this.
>
> If I'm correct, this feels like a bug to me, since it requires a user
> to understand low-level driver details to do IRQ remapping, which is a
> bit higher level. But if it's intended, we'll just have to figure out
> how to work around this. (Unfortunately, using split tx and rx queues
> is not possible with i40e, so that easy solution is unavailable.)
>
> --Zvi

It seems like for the Intel drivers, igc, ixgbe, i40e, and ice all have this
problem.

Notably, igb fixes it like I would expect.

Let's talk about it over on intel-wired-lan and cc netdev.

Jesse

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-05 20:01 ` Jesse Brandeburg
@ 2021-05-05 21:21   ` Maciej Fijalkowski
  0 siblings, 0 replies; 18+ messages in thread
From: Maciej Fijalkowski @ 2021-05-05 21:21 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Zvi Effron, T K Sourabh, Xdp, intel-wired-lan, netdev,
	magnus.karlsson, kuba

On Wed, May 05, 2021 at 01:01:28PM -0700, Jesse Brandeburg wrote:
> Zvi Effron wrote:
>
> > On Tue, May 4, 2021 at 4:07 PM Zvi Effron <zeffron@riotgames.com> wrote:
> > > I'm suspecting it's something with how XDP_REDIRECT is implemented in
> > > the i40e driver, but I don't know if this is a) cross-driver behavior,
> > > b) expected behavior, or c) a bug.
> > I think I've found the issue, and it appears to be specific to i40e
> > (and maybe other drivers, too, but not XDP itself).
> >
> > When performing the XDP xmit, i40e uses smp_processor_id() to
> > select the tx queue (see
> > https://elixir.bootlin.com/linux/v5.12.1/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L3846).
> > I'm not 100% clear on how the CPU is selected, but since we don't use
> > cores 0 and 1, we end up on a core whose id is higher than any
> > available queue.
> >
> > I'm going to try to modify our IRQ mappings to test this.
> >
> > If I'm correct, this feels like a bug to me, since it requires a user
> > to understand low-level driver details to do IRQ remapping, which is a
> > bit higher level. But if it's intended, we'll just have to figure out
> > how to work around this. (Unfortunately, using split tx and rx queues
> > is not possible with i40e, so that easy solution is unavailable.)
> >
> > --Zvi

Hey Zvi, sorry for the lack of assistance; there has been statutory free time
in Poland and today I'm in birthday mode, but Magnus and I managed to discuss
the issue and we feel we could have a solution for it. More below.

>
> It seems like for the Intel drivers, igc, ixgbe, i40e, and ice all have
> this problem.
>
> Notably, igb fixes it like I would expect.

igb is correct, but I think we would like to avoid introducing locking into
the XDP data path for higher-speed NICs.

We talked with Magnus and, for i40e and ice, which have lots of HW resources,
we could always create the xdp_rings array with num_online_cpus() entries and
index it with smp_processor_id(), regardless of the user's changes to the
queue count. This way smp_processor_id() provides the serialization by
itself, as we're under NAPI on a given CPU, so there's no need to introduce
locking - there is a per-CPU XDP ring.

If we instead stuck with the approach where the xdp_rings array is sized down
to the shrunk Rx queue count and indexed with a
smp_processor_id() % vsi->num_queue_pairs formula, we could have resource
contention. Say that on a 16-core system you did:

$ ethtool -L eth0 combined 2

and then mapped q0 to cpu 1 and q1 to cpu 11. Both queues would grab
xdp_rings[1], so we would have to introduce locking.

The proposed approach would just result in more Tx queues packed onto the Tx
ring container of a queue vector.

Thoughts? Any concerns? Should we have a 'fallback' mode if we run out of
queues?

Currently I have a patch for ice (which is trivial) that does the thing
described above, and I'm in the middle of testing it. I feel this also
addresses Jakub's input on the old patch that tried to resolve the issue in
that manner:
https://lore.kernel.org/netdev/20210123195219.55f6d4e1@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/

>
> Let's talk about it over on intel-wired-lan and cc netdev.

Let's do this now.

>
> Jesse

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-05 21:21 ` [Intel-wired-lan] " Maciej Fijalkowski
@ 2021-05-06 10:29   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-05-06 10:29 UTC (permalink / raw)
  To: Maciej Fijalkowski, Jesse Brandeburg
  Cc: Zvi Effron, T K Sourabh, Xdp, intel-wired-lan, netdev,
	magnus.karlsson, kuba

Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:

> igb is correct, but I think we would like to avoid introducing locking
> into the XDP data path for higher-speed NICs.
>
> We talked with Magnus and, for i40e and ice, which have lots of HW
> resources, we could always create the xdp_rings array with
> num_online_cpus() entries and index it with smp_processor_id(),
> regardless of the user's changes to the queue count.

What is "lots"? Systems with hundreds of CPUs exist (and I seem to recall an
issue with just such a system on Intel hardware(?)). Also, what if
num_online_cpus() changes?

> This way smp_processor_id() provides the serialization by itself, as
> we're under NAPI on a given CPU, so there's no need to introduce
> locking - there is a per-CPU XDP ring. If we instead stuck with the
> approach where the xdp_rings array is sized down to the shrunk Rx queue
> count and indexed with a smp_processor_id() % vsi->num_queue_pairs
> formula, we could have resource contention. Say that on a 16-core
> system you did:
>
> $ ethtool -L eth0 combined 2
>
> and then mapped q0 to cpu 1 and q1 to cpu 11. Both queues would grab
> xdp_rings[1], so we would have to introduce locking.
>
> The proposed approach would just result in more Tx queues packed onto
> the Tx ring container of a queue vector.
>
> Thoughts? Any concerns? Should we have a 'fallback' mode if we run out
> of queues?

Yes, please :)

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e
  2021-05-06 10:29 ` [Intel-wired-lan] " Toke Høiland-Jørgensen
@ 2021-05-09 15:50   ` Maciej Fijalkowski
  0 siblings, 0 replies; 18+ messages in thread
From: Maciej Fijalkowski @ 2021-05-09 15:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jesse Brandeburg, Zvi Effron, T K Sourabh, Xdp, intel-wired-lan,
	netdev, magnus.karlsson, kuba

On Thu, May 06, 2021 at 12:29:40PM +0200, Toke Høiland-Jørgensen wrote:
> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes:
>
> > igb is correct, but I think we would like to avoid introducing locking
> > into the XDP data path for higher-speed NICs.
> >
> > We talked with Magnus and, for i40e and ice, which have lots of HW
> > resources, we could always create the xdp_rings array with
> > num_online_cpus() entries and index it with smp_processor_id(),
> > regardless of the user's changes to the queue count.
>
> What is "lots"? Systems with hundreds of CPUs exist (and I seem to
> recall an issue with just such a system on Intel hardware(?)). Also,
> what if num_online_cpus() changes?

"Lots" is 16k for ice. For i40e, the datasheet says it's only 1536 for the
whole device, so I take back the statement that i40e has a lot of
resources :)

Also, s/num_online_cpus()/num_possible_cpus().

> > Thoughts? Any concerns? Should we have a 'fallback' mode if we run
> > out of queues?
>
> Yes, please :)

How do we have a fallback (in the drivers that need it) in a way that
wouldn't hurt the scenario where the queue-per-CPU requirement is satisfied?

>
> -Toke
>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [Intel-wired-lan] Dropped packets mapping IRQs for adjusted queue counts on i40e @ 2021-05-09 15:50 ` Maciej Fijalkowski 0 siblings, 0 replies; 18+ messages in thread From: Maciej Fijalkowski @ 2021-05-09 15:50 UTC (permalink / raw) To: intel-wired-lan On Thu, May 06, 2021 at 12:29:40PM +0200, Toke H?iland-J?rgensen wrote: > Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > > > On Wed, May 05, 2021 at 01:01:28PM -0700, Jesse Brandeburg wrote: > >> Zvi Effron wrote: > >> > >> > On Tue, May 4, 2021 at 4:07 PM Zvi Effron <zeffron@riotgames.com> wrote: > >> > > I'm suspecting it's something with how XDP_REDIRECT is implemented in > >> > > the i40e driver, but I don't know if this is a) cross driver behavior, > >> > > b) expected behavior, or c) a bug. > >> > I think I've found the issue, and it appears to be specific to i40e > >> > (and maybe other drivers, too, but not XDP itself). > >> > > >> > When performing the XDP xmit, i40e uses the smp_processor_id() to > >> > select the tx queue (see > >> > https://elixir.bootlin.com/linux/v5.12.1/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L3846). > >> > I'm not 100% clear on how the CPU is selected (since we don't use > >> > cores 0 and 1), we end up on a core whose id is higher than any > >> > available queue. > >> > > >> > I'm going to try to modify our IRQ mappings to test this. > >> > > >> > If I'm correct, this feels like a bug to me, since it requires a user > >> > to understand low level driver details to do IRQ remapping, which is a > >> > bit higher level. But if it's intended, we'll just have to figure out > >> > how to work around this. (Unfortunately, using split tx and rx queues > >> > is not possible with i40e, so that easy solution is unavailable.) 
> >> > > >> > --Zvi > > > > Hey Zvi, sorry for the lack of assistance, there has been statutory free > > time in Poland and today i'm in the birthday mode, but we managed to > > discuss the issue with Magnus and we feel like we could have a solution > > for that, more below. > > > >> > >> > >> It seems like for Intel drivers, igc, ixgbe, i40e, ice all have > >> this problem. > >> > >> Notably, igb, fixes it like I would expect. > > > > igb is correct but I think that we would like to avoid the introduction of > > locking for higher speed NICs in XDP data path. > > > > We talked with Magnus that for i40e and ice that have lots of HW > > resources, we could always create the xdp_rings array of num_online_cpus() > > size and use smp_processor_id() for accesses, regardless of the user's > > changes to queue count. > > What is "lots"? Systems with hundreds of CPUs exist (and I seem to > recall an issue with just such a system on Intel hardware(?)). Also, > what if num_online_cpus() changes? "Lots" is 16k for ice. For i40e, the datasheet says it's only 1536 for the whole device, so I take back the statement that i40e has a lot of resources :) Also, s/num_online_cpus()/num_possible_cpus(). > > > This way the smp_processor_id() provides the serialization by itself as > > we're under napi on a given cpu, so there's no need for locking > > introduction - there is a per-cpu XDP ring provided. If we would stick to > > the approach where you adjust the size of xdp_rings down to the shrinked > > Rx queue count and use a smp_processor_id() % vsi->num_queue_pairs formula > > then we could have a resource contention. Say that you did on a 16 core > > system: > > $ ethtool -L eth0 combined 2 > > > > and then mapped the q0 to cpu1 and q1 to cpu 11. Both queues will grab the > > xdp_rings[1], so we would have to introduce the locking. > > > > Proposed approach would just result with more Tx queues packed onto Tx > > ring container of queue vector. > > > > Thoughts? Any concerns?
Should we have a 'fallback' mode if we would be > > out of queues? > > Yes, please :) How to have a fallback (in drivers that need it) in a way that wouldn't hurt the scenario where queue per cpu requirement is satisfied? > > -Toke > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e 2021-05-09 15:50 ` [Intel-wired-lan] " Maciej Fijalkowski @ 2021-05-10 11:22 ` Toke Høiland-Jørgensen -1 siblings, 0 replies; 18+ messages in thread From: Toke Høiland-Jørgensen @ 2021-05-10 11:22 UTC (permalink / raw) To: Maciej Fijalkowski Cc: Jesse Brandeburg, Zvi Effron, T K Sourabh, Xdp, intel-wired-lan, netdev, magnus.karlsson, kuba Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > On Thu, May 06, 2021 at 12:29:40PM +0200, Toke Høiland-Jørgensen wrote: >> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: >> >> > On Wed, May 05, 2021 at 01:01:28PM -0700, Jesse Brandeburg wrote: >> >> Zvi Effron wrote: >> >> >> >> > On Tue, May 4, 2021 at 4:07 PM Zvi Effron <zeffron@riotgames.com> wrote: >> >> > > I'm suspecting it's something with how XDP_REDIRECT is implemented in >> >> > > the i40e driver, but I don't know if this is a) cross driver behavior, >> >> > > b) expected behavior, or c) a bug. >> >> > I think I've found the issue, and it appears to be specific to i40e >> >> > (and maybe other drivers, too, but not XDP itself). >> >> > >> >> > When performing the XDP xmit, i40e uses the smp_processor_id() to >> >> > select the tx queue (see >> >> > https://elixir.bootlin.com/linux/v5.12.1/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L3846). >> >> > I'm not 100% clear on how the CPU is selected (since we don't use >> >> > cores 0 and 1), we end up on a core whose id is higher than any >> >> > available queue. >> >> > >> >> > I'm going to try to modify our IRQ mappings to test this. >> >> > >> >> > If I'm correct, this feels like a bug to me, since it requires a user >> >> > to understand low level driver details to do IRQ remapping, which is a >> >> > bit higher level. But if it's intended, we'll just have to figure out >> >> > how to work around this.
(Unfortunately, using split tx and rx queues >> >> > is not possible with i40e, so that easy solution is unavailable.) >> >> > >> >> > --Zvi >> > >> > Hey Zvi, sorry for the lack of assistance, there has been statutory free >> > time in Poland and today i'm in the birthday mode, but we managed to >> > discuss the issue with Magnus and we feel like we could have a solution >> > for that, more below. >> > >> >> >> >> >> >> It seems like for Intel drivers, igc, ixgbe, i40e, ice all have >> >> this problem. >> >> >> >> Notably, igb, fixes it like I would expect. >> > >> > igb is correct but I think that we would like to avoid the introduction of >> > locking for higher speed NICs in XDP data path. >> > >> > We talked with Magnus that for i40e and ice that have lots of HW >> > resources, we could always create the xdp_rings array of num_online_cpus() >> > size and use smp_processor_id() for accesses, regardless of the user's >> > changes to queue count. >> >> What is "lots"? Systems with hundreds of CPUs exist (and I seem to >> recall an issue with just such a system on Intel hardware(?)). Also, >> what if num_online_cpus() changes? > > "Lots" is 16k for ice. For i40e datasheet tells that it's only 1536 for > whole device, so I back off from the statement that i40e has a lot of > resources :) > > Also, s/num_online_cpus()/num_possible_cpus(). OK, even 1536 is more than I expected; I figured it would be way lower, which is why you were suggesting to use num_online_cpus() instead; but yeah, num_possible_cpus() is obviously better, then :) >> > This way the smp_processor_id() provides the serialization by itself as >> > we're under napi on a given cpu, so there's no need for locking >> > introduction - there is a per-cpu XDP ring provided. If we would stick to >> > the approach where you adjust the size of xdp_rings down to the shrinked >> > Rx queue count and use a smp_processor_id() % vsi->num_queue_pairs formula >> > then we could have a resource contention. 
Say that you did on a 16 core >> > system: >> > $ ethtool -L eth0 combined 2 >> > >> > and then mapped the q0 to cpu1 and q1 to cpu 11. Both queues will grab the >> > xdp_rings[1], so we would have to introduce the locking. >> > >> > Proposed approach would just result with more Tx queues packed onto Tx >> > ring container of queue vector. >> > >> > Thoughts? Any concerns? Should we have a 'fallback' mode if we would be >> > out of queues? >> >> Yes, please :) > > How to have a fallback (in drivers that need it) in a way that wouldn't > hurt the scenario where queue per cpu requirement is satisfied? Well, it should be possible to detect this at setup time, right? Not too familiar with the driver code, but would it be possible to statically dispatch to an entirely different code path if this happens? -Toke ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Dropped packets mapping IRQs for adjusted queue counts on i40e 2021-05-10 11:22 ` [Intel-wired-lan] " Toke Høiland-Jørgensen @ 2021-05-10 15:01 ` Jesper Dangaard Brouer -1 siblings, 0 replies; 18+ messages in thread From: Jesper Dangaard Brouer @ 2021-05-10 15:01 UTC (permalink / raw) To: Toke Høiland-Jørgensen Cc: brouer, Maciej Fijalkowski, Jesse Brandeburg, Zvi Effron, T K Sourabh, Xdp, intel-wired-lan, netdev, magnus.karlsson, kuba, Tony Nguyen On Mon, 10 May 2021 13:22:54 +0200 Toke Høiland-Jørgensen <toke@redhat.com> wrote: > Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > > > On Thu, May 06, 2021 at 12:29:40PM +0200, Toke Høiland-Jørgensen wrote: > >> Maciej Fijalkowski <maciej.fijalkowski@intel.com> writes: > >> > >> > On Wed, May 05, 2021 at 01:01:28PM -0700, Jesse Brandeburg wrote: > >> >> Zvi Effron wrote: > >> >> > >> >> > On Tue, May 4, 2021 at 4:07 PM Zvi Effron <zeffron@riotgames.com> wrote: > >> >> > > I'm suspecting it's something with how XDP_REDIRECT is implemented in > >> >> > > the i40e driver, but I don't know if this is a) cross driver behavior, > >> >> > > b) expected behavior, or c) a bug. > >> >> > I think I've found the issue, and it appears to be specific to i40e > >> >> > (and maybe other drivers, too, but not XDP itself). > >> >> > > >> >> > When performing the XDP xmit, i40e uses the smp_processor_id() to > >> >> > select the tx queue (see > >> >> > https://elixir.bootlin.com/linux/v5.12.1/source/drivers/net/ethernet/intel/i40e/i40e_txrx.c#L3846). > >> >> > I'm not 100% clear on how the CPU is selected (since we don't use > >> >> > cores 0 and 1), we end up on a core whose id is higher than any > >> >> > available queue. > >> >> > > >> >> > I'm going to try to modify our IRQ mappings to test this.
> >> >> > > >> >> > If I'm correct, this feels like a bug to me, since it requires a user > >> >> > to understand low level driver details to do IRQ remapping, which is a > >> >> > bit higher level. But if it's intended, we'll just have to figure out > >> >> > how to work around this. (Unfortunately, using split tx and rx queues > >> >> > is not possible with i40e, so that easy solution is unavailable.) > >> >> > > >> >> > --Zvi > >> > > >> > Hey Zvi, sorry for the lack of assistance, there has been statutory free > >> > time in Poland and today i'm in the birthday mode, but we managed to > >> > discuss the issue with Magnus and we feel like we could have a solution > >> > for that, more below. > >> > > >> >> > >> >> > >> >> It seems like for Intel drivers, igc, ixgbe, i40e, ice all have > >> >> this problem. > >> >> > >> >> Notably, igb, fixes it like I would expect. > >> > > >> > igb is correct but I think that we would like to avoid the introduction of > >> > locking for higher speed NICs in XDP data path. > >> > > >> > We talked with Magnus that for i40e and ice that have lots of HW > >> > resources, we could always create the xdp_rings array of num_online_cpus() > >> > size and use smp_processor_id() for accesses, regardless of the user's > >> > changes to queue count. > >> > >> What is "lots"? Systems with hundreds of CPUs exist (and I seem to > >> recall an issue with just such a system on Intel hardware(?)). Also, > >> what if num_online_cpus() changes? > > > > "Lots" is 16k for ice. For i40e datasheet tells that it's only 1536 for > > whole device, so I back off from the statement that i40e has a lot of > > resources :) > > > > Also, s/num_online_cpus()/num_possible_cpus(). 
> > OK, even 1536 is more than I expected; I figured it would be way lower, > which is why you were suggesting to use num_online_cpus() instead; but > yeah, num_possible_cpus() is obviously better, then :) > > >> > This way the smp_processor_id() provides the serialization by itself as > >> > we're under napi on a given cpu, so there's no need for locking > >> > introduction - there is a per-cpu XDP ring provided. If we would stick to > >> > the approach where you adjust the size of xdp_rings down to the shrinked > >> > Rx queue count and use a smp_processor_id() % vsi->num_queue_pairs formula > >> > then we could have a resource contention. Say that you did on a 16 core > >> > system: > >> > $ ethtool -L eth0 combined 2 > >> > > >> > and then mapped the q0 to cpu1 and q1 to cpu 11. Both queues will grab the > >> > xdp_rings[1], so we would have to introduce the locking. > >> > > >> > Proposed approach would just result with more Tx queues packed onto Tx > >> > ring container of queue vector. > >> > > >> > Thoughts? Any concerns? Should we have a 'fallback' mode if we would be > >> > out of queues? > >> > >> Yes, please :) > > > > How to have a fallback (in drivers that need it) in a way that wouldn't > > hurt the scenario where queue per cpu requirement is satisfied? > > Well, it should be possible to detect this at setup time, right? Not too > familiar with the driver code, but would it be possible to statically > dispatch to an entirely different code path if this happens? The ndo_xdp_xmit call is a function pointer. Thus, if it happens at this level, then at setup time the driver can simply change the NDO to use a TX-queue-locked variant. I actually consider it a bug that i40e allows this misconfig to happen. The ixgbe driver solves the problem by rejecting XDP attach if the system has more CPUs than TXQs available. IMHO it is a better solution to add sharded/partitioned TXQ-locking when this situation happens, instead of denying XDP attach.
Since the original XDP-redirect work, the ndo_xdp_xmit call has gained bulking, so the locking will be amortized over the bulk. One question is how we inform the end-user that XDP will be using a slightly slower TXQ-locking scheme. Given we have no XDP features exposed, I suggest a simple kernel log message, which we already have for other XDP situations, e.g. when the MTU is too large or TSO is enabled. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2021-05-10 15:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-01  0:45 Dropped packets mapping IRQs for adjusted queue counts on i40e Zvi Effron
     [not found] ` <CADS2XXpjasmJKP__oHsrvv3EG8n-FjB6sqHwgQfh7QgeJ8GrrQ@mail.gmail.com>
2021-05-01 17:56 ` Zvi Effron
2021-05-02 20:16 ` T K Sourabh
2021-05-04 23:07 ` Zvi Effron
2021-05-05 18:55 ` Zvi Effron
2021-05-05 19:53 ` Zvi Effron
2021-05-05 19:55 ` David Ahern
2021-05-05 20:01 ` Jesse Brandeburg
2021-05-05 21:21 ` Maciej Fijalkowski
2021-05-05 21:21 ` [Intel-wired-lan] " Maciej Fijalkowski
2021-05-06 10:29 ` Toke Høiland-Jørgensen
2021-05-06 10:29 ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2021-05-09 15:50 ` Maciej Fijalkowski
2021-05-09 15:50 ` [Intel-wired-lan] " Maciej Fijalkowski
2021-05-10 11:22 ` Toke Høiland-Jørgensen
2021-05-10 11:22 ` [Intel-wired-lan] " Toke Høiland-Jørgensen
2021-05-10 15:01 ` Jesper Dangaard Brouer
2021-05-10 15:01 ` [Intel-wired-lan] " Jesper Dangaard Brouer