* [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return @ 2019-09-20 18:58 Matthew Cover 2019-09-20 19:45 ` Matt Cover 2019-09-22 12:37 ` Michael S. Tsirkin 0 siblings, 2 replies; 21+ messages in thread From: Matthew Cover @ 2019-09-20 18:58 UTC (permalink / raw) To: davem, ast, daniel, kafai, songliubraving, yhs, jasowang, edumazet, sdf, mst, matthew.cover, mail, pabeni, nicolas.dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal to fallback to tun_automq_select_queue() for tx queue selection. Compilation of this exact patch was tested. For functional testing 3 additional printk()s were added. Functional testing results (on 2 txq tap device): [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> --- drivers/net/tun.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 
deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index aab0be4..173d159 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) return txq; } -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) { struct tun_prog *prog; u32 numqueues; - u16 ret = 0; + int ret = -1; numqueues = READ_ONCE(tun->numqueues); if (!numqueues) return 0; + rcu_read_lock(); prog = rcu_dereference(tun->steering_prog); if (prog) ret = bpf_prog_run_clear_cb(prog->prog, skb); + rcu_read_unlock(); - return ret % numqueues; + if (ret >= 0) + ret %= numqueues; + + return ret; } static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev) { struct tun_struct *tun = netdev_priv(dev); - u16 ret; + int ret; - rcu_read_lock(); - if (rcu_dereference(tun->steering_prog)) - ret = tun_ebpf_select_queue(tun, skb); - else + ret = tun_ebpf_select_queue(tun, skb); + if (ret < 0) ret = tun_automq_select_queue(tun, skb); - rcu_read_unlock(); return ret; } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 21+ messages in thread
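The control flow the patch introduces can be modelled in plain userspace C. This is a minimal sketch, not the kernel code: `steering_model`, `automq_model()`, `ebpf_select_model()` and `select_queue_model()` are hypothetical stand-ins invented for illustration, RCU is left out entirely, and the automq stand-in returns a fixed queue instead of hashing the flow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an attached steering program.
 * has_prog == 0 models "no program attached". Not a kernel type. */
struct steering_model {
	int has_prog;
	int32_t verdict;	/* value the program would return */
};

/* Hypothetical stand-in for tun_automq_select_queue(). The real function
 * hashes the flow; this one always picks queue 0 to keep the sketch small. */
static uint16_t automq_model(uint32_t numqueues)
{
	(void)numqueues;
	return 0;
}

/* Follows the '+' lines of the diff: start at -1, take the program's
 * verdict if one is attached, and wrap only non-negative results. */
static int ebpf_select_model(const struct steering_model *m, uint32_t numqueues)
{
	int ret = -1;

	if (!numqueues)
		return 0;

	if (m->has_prog)
		ret = m->verdict;

	if (ret >= 0)
		ret %= numqueues;

	return ret;
}

/* Follows the patched tun_select_queue(): any negative result falls back
 * to the automq path. *used_automq reports which path was taken. */
static uint16_t select_queue_model(const struct steering_model *m,
				   uint32_t numqueues, int *used_automq)
{
	int ret = ebpf_select_model(m, numqueues);

	assert(used_automq != 0);
	*used_automq = ret < 0;
	if (ret < 0)
		ret = automq_model(numqueues);

	return (uint16_t)ret;
}
```

With two queues the model follows the same pattern as the printk() output in the commit message: no program and a verdict of -1 both take the automq path, while verdicts 0, 1 and 2 map to queues 0, 1 and 0.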
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-20 18:58 [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return Matthew Cover @ 2019-09-20 19:45 ` Matt Cover 2019-09-22 12:37 ` Michael S. Tsirkin 1 sibling, 0 replies; 21+ messages in thread From: Matt Cover @ 2019-09-20 19:45 UTC (permalink / raw) To: davem, ast, daniel, kafai, songliubraving, yhs, Jason Wang, Eric Dumazet, Stanislav Fomichev, mst, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Fri, Sep 20, 2019 at 11:59 AM Matthew Cover <werekraken@gmail.com> wrote: > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > to fallback to tun_automq_select_queue() for tx queue selection. > > Compilation of this exact patch was tested. > > For functional testing 3 additional printk()s were added. > > Functional testing results (on 2 txq tap device): > > [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > [Fri Sep 20 18:33:27 2019] tuntap: 
bpf_prog_run_clear_cb() returned '2' > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > --- > drivers/net/tun.c | 20 +++++++++++--------- > 1 file changed, 11 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index aab0be4..173d159 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > return txq; > } > > -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > { > struct tun_prog *prog; > u32 numqueues; > - u16 ret = 0; > + int ret = -1; > > numqueues = READ_ONCE(tun->numqueues); > if (!numqueues) > return 0; > > + rcu_read_lock(); > prog = rcu_dereference(tun->steering_prog); > if (prog) > ret = bpf_prog_run_clear_cb(prog->prog, skb); > + rcu_read_unlock(); > > - return ret % numqueues; > + if (ret >= 0) > + ret %= numqueues; > + > + return ret; > } > > static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > struct net_device *sb_dev) > { > struct tun_struct *tun = netdev_priv(dev); > - u16 ret; > + int ret; > > - rcu_read_lock(); > - if (rcu_dereference(tun->steering_prog)) > - ret = tun_ebpf_select_queue(tun, skb); > - else > + ret = tun_ebpf_select_queue(tun, skb); > + if (ret < 0) > ret = tun_automq_select_queue(tun, skb); > - rcu_read_unlock(); > > return ret; > } > -- > 1.8.3.1 > Sorry for sending this while net-next is closed... I should have been more careful. Please let me know if I should resubmit once net-next is open again. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-20 18:58 [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return Matthew Cover 2019-09-20 19:45 ` Matt Cover @ 2019-09-22 12:37 ` Michael S. Tsirkin 2019-09-22 17:43 ` Matt Cover 1 sibling, 1 reply; 21+ messages in thread From: Michael S. Tsirkin @ 2019-09-22 12:37 UTC (permalink / raw) To: Matthew Cover Cc: davem, ast, daniel, kafai, songliubraving, yhs, jasowang, edumazet, sdf, matthew.cover, mail, pabeni, nicolas.dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > to fallback to tun_automq_select_queue() for tx queue selection. > > Compilation of this exact patch was tested. > > For functional testing 3 additional printk()s were added. > > Functional testing results (on 2 txq tap device): > > [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > [Fri Sep 20 18:33:27 2019] tuntap: 
bpf_prog_run_clear_cb() returned '2' > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> Could you add a bit more motivation data here? 1. why is this a good idea 2. how do we know existing userspace does not rely on existing behaviour 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and without this patch thanks, MST > --- > drivers/net/tun.c | 20 +++++++++++--------- > 1 file changed, 11 insertions(+), 9 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index aab0be4..173d159 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > return txq; > } > > -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > { > struct tun_prog *prog; > u32 numqueues; > - u16 ret = 0; > + int ret = -1; > > numqueues = READ_ONCE(tun->numqueues); > if (!numqueues) > return 0; > > + rcu_read_lock(); > prog = rcu_dereference(tun->steering_prog); > if (prog) > ret = bpf_prog_run_clear_cb(prog->prog, skb); > + rcu_read_unlock(); > > - return ret % numqueues; > + if (ret >= 0) > + ret %= numqueues; > + > + return ret; > } > > static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > struct net_device *sb_dev) > { > struct tun_struct *tun = netdev_priv(dev); > - u16 ret; > + int ret; > > - rcu_read_lock(); > - if (rcu_dereference(tun->steering_prog)) > - ret = tun_ebpf_select_queue(tun, skb); > - else > + ret = tun_ebpf_select_queue(tun, skb); > + if (ret < 0) > ret = tun_automq_select_queue(tun, skb); > - rcu_read_unlock(); > > return ret; > } > -- > 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-22 12:37 ` Michael S. Tsirkin @ 2019-09-22 17:43 ` Matt Cover 2019-09-22 20:35 ` Michael S. Tsirkin 2019-09-23 0:46 ` Jason Wang 0 siblings, 2 replies; 21+ messages in thread From: Matt Cover @ 2019-09-22 17:43 UTC (permalink / raw) To: Michael S. Tsirkin Cc: davem, ast, daniel, kafai, songliubraving, yhs, Jason Wang, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > Compilation of this exact patch was tested. > > > > For functional testing 3 additional printk()s were added. > > > > Functional testing results (on 2 txq tap device): > > > > [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > > [Fri Sep 20 18:33:27 2019] ========== 
tun prog 2 ========== > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > > > Could you add a bit more motivation data here? Thank you for these questions, Michael. I plan to add the information below to the commit message and submit a v2 of this patch when net-next reopens. In the meantime, it would be very helpful to know whether these answers address some of your concerns. > 1. why is this a good idea This change allows TUNSETSTEERINGEBPF progs to do any of the following. 1. implement queue selection for a subset of traffic (e.g. special queue selection logic for ipv4, but return negative and use the default automq logic for ipv6) 2. determine there isn't sufficient information to do proper queue selection; return negative and use the default automq logic for the unknown 3. implement a no-op prog (e.g. do bpf_trace_printk() then return negative and use the default automq logic for everything) > 2. how do we know existing userspace does not rely on existing behaviour Prior to this change, a negative return from a TUNSETSTEERINGEBPF prog would have been cast into a u16 and traversed netdev_cap_txqueue(). In most cases netdev_cap_txqueue() would have found this value to exceed real_num_tx_queues and queue_index would be updated to 0. It is possible for a TUNSETSTEERINGEBPF prog to return a negative value which, when cast into a u16, results in a positive queue_index less than real_num_tx_queues. For example, on x86_64, a return value of -65535 results in a queue_index of 1, which is a valid queue for any multiqueue device. It seems unlikely, although as stated above it is unfortunately possible, that existing TUNSETSTEERINGEBPF programs would choose to return a negative value rather than return the positive value which holds the same meaning.
It seems more likely that future TUNSETSTEERINGEBPF programs would leverage a negative return and potentially be loaded into a kernel with the old behavior. > 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and > without this patch There may be some value in exposing this fact to the ebpf prog loader. What is the standard practice here, a define? > > > thanks, > MST > > > --- > > drivers/net/tun.c | 20 +++++++++++--------- > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > index aab0be4..173d159 100644 > > --- a/drivers/net/tun.c > > +++ b/drivers/net/tun.c > > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > return txq; > > } > > > > -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > { > > struct tun_prog *prog; > > u32 numqueues; > > - u16 ret = 0; > > + int ret = -1; > > > > numqueues = READ_ONCE(tun->numqueues); > > if (!numqueues) > > return 0; > > > > + rcu_read_lock(); > > prog = rcu_dereference(tun->steering_prog); > > if (prog) > > ret = bpf_prog_run_clear_cb(prog->prog, skb); > > + rcu_read_unlock(); > > > > - return ret % numqueues; > > + if (ret >= 0) > > + ret %= numqueues; > > + > > + return ret; > > } > > > > static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > > struct net_device *sb_dev) > > { > > struct tun_struct *tun = netdev_priv(dev); > > - u16 ret; > > + int ret; > > > > - rcu_read_lock(); > > - if (rcu_dereference(tun->steering_prog)) > > - ret = tun_ebpf_select_queue(tun, skb); > > - else > > + ret = tun_ebpf_select_queue(tun, skb); > > + if (ret < 0) > > ret = tun_automq_select_queue(tun, skb); > > - rcu_read_unlock(); > > > > return ret; > > } > > -- > > 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
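The u16 truncation discussed in the message above can be checked with a small userspace sketch. The helper names are hypothetical. `old_ebpf_select_model()` follows the '-' lines of the quoted diff, which store the program's result in a u16 and then reduce it modulo numqueues. `cap_txqueue_model()` is a simplified model of the range check the message attributes to netdev_cap_txqueue(), assuming it maps any out-of-range index to 0.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 16 bits, as assigning the program's return value
 * to a u16 does. Conversion to an unsigned type is modulo 2^16 in C. */
static uint16_t truncate_to_u16(int32_t prog_ret)
{
	return (uint16_t)prog_ret;
}

/* Model of the pre-patch tail of tun_ebpf_select_queue(), per the
 * removed lines in the diff: u16 storage, then modulo numqueues. */
static uint16_t old_ebpf_select_model(int32_t prog_ret, uint32_t numqueues)
{
	assert(numqueues > 0);
	return (uint16_t)(truncate_to_u16(prog_ret) % numqueues);
}

/* Simplified model of the range check described in the message:
 * an index at or beyond real_num_tx_queues is replaced by 0. */
static uint16_t cap_txqueue_model(uint16_t queue_index,
				  uint32_t real_num_tx_queues)
{
	return queue_index >= real_num_tx_queues ? 0 : queue_index;
}
```

In this model -1 truncates to 65535 and -65535 truncates to 1, matching the example in the message. Applying the range check to 65535 on a two-queue device gives 0. Note that the modulo in the removed lines would already bring 65535 down to 1 on a two-queue device before any later range check sees it.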
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-22 17:43 ` Matt Cover @ 2019-09-22 20:35 ` Michael S. Tsirkin 2019-09-22 22:30 ` Matt Cover 2019-09-23 0:46 ` Jason Wang 1 sibling, 1 reply; 21+ messages in thread From: Michael S. Tsirkin @ 2019-09-22 20:35 UTC (permalink / raw) To: Matt Cover Cc: davem, ast, daniel, kafai, songliubraving, yhs, Jason Wang, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > > > Compilation of this exact patch was tested. > > > > > > For functional testing 3 additional printk()s were added. 
> > > > > > Functional testing results (on 2 txq tap device): > > > > > > [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > > > Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > > > > > > Could you add a bit more motivation data here? > > Thank you for these questions Michael. > > I'll plan on adding the below information to the > commit message and submitting a v2 of this patch > when net-next reopens. In the meantime, it would > be very helpful to know if these answers address > some of your concerns. > > > 1. why is this a good idea > > This change allows TUNSETSTEERINGEBPF progs to > do any of the following. > 1. implement queue selection for a subset of > traffic (e.g. special queue selection logic > for ipv4, but return negative and use the > default automq logic for ipv6) > 2. 
determine there isn't sufficient information > to do proper queue selection; return > negative and use the default automq logic > for the unknown > 3. implement a noop prog (e.g. do > bpf_trace_printk() then return negative and > use the default automq logic for everything) > > > 2. how do we know existing userspace does not rely on existing behaviour > > Prior to this change a negative return from a > TUNSETSTEERINGEBPF prog would have been cast > into a u16 and traversed netdev_cap_txqueue(). > > In most cases netdev_cap_txqueue() would have > found this value to exceed real_num_tx_queues > and queue_index would be updated to 0. > > It is possible that a TUNSETSTEERINGEBPF prog > return a negative value which when cast into a > u16 results in a positive queue_index less than > real_num_tx_queues. For example, on x86_64, a > return value of -65535 results in a queue_index > of 1; which is a valid queue for any multiqueue > device. > > It seems unlikely, however as stated above is > unfortunately possible, that existing > TUNSETSTEERINGEBPF programs would choose to > return a negative value rather than return the > positive value which holds the same meaning. > > It seems more likely that future > TUNSETSTEERINGEBPF programs would leverage a > negative return and potentially be loaded into > a kernel with the old behavior. OK if we are returning a special value, shouldn't we limit it? How about a special value with this meaning? If we are changing an ABI let's at least make it extensible. > > 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and > > without this patch > > There may be some value in exposing this fact > to the ebpf prog loader. What is the standard > practice here, a define? We'll need something at runtime - people move binaries between kernels without rebuilding then. An ioctl is one option. A sysfs attribute is another, an ethtool flag yet another. A combination of these is possible. 
And if we are doing this anyway, maybe let userspace select the new behaviour? This way we can stay compatible with old userspace... > > > > > > thanks, > > MST > > > > > --- > > > drivers/net/tun.c | 20 +++++++++++--------- > > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > > index aab0be4..173d159 100644 > > > --- a/drivers/net/tun.c > > > +++ b/drivers/net/tun.c > > > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > return txq; > > > } > > > > > > -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > { > > > struct tun_prog *prog; > > > u32 numqueues; > > > - u16 ret = 0; > > > + int ret = -1; > > > > > > numqueues = READ_ONCE(tun->numqueues); > > > if (!numqueues) > > > return 0; > > > > > > + rcu_read_lock(); > > > prog = rcu_dereference(tun->steering_prog); > > > if (prog) > > > ret = bpf_prog_run_clear_cb(prog->prog, skb); > > > + rcu_read_unlock(); > > > > > > - return ret % numqueues; > > > + if (ret >= 0) > > > + ret %= numqueues; > > > + > > > + return ret; > > > } > > > > > > static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > > > struct net_device *sb_dev) > > > { > > > struct tun_struct *tun = netdev_priv(dev); > > > - u16 ret; > > > + int ret; > > > > > > - rcu_read_lock(); > > > - if (rcu_dereference(tun->steering_prog)) > > > - ret = tun_ebpf_select_queue(tun, skb); > > > - else > > > + ret = tun_ebpf_select_queue(tun, skb); > > > + if (ret < 0) > > > ret = tun_automq_select_queue(tun, skb); > > > - rcu_read_unlock(); > > > > > > return ret; > > > } > > > -- > > > 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-22 20:35 ` Michael S. Tsirkin @ 2019-09-22 22:30 ` Matt Cover 2019-09-22 22:46 ` Matt Cover 2019-09-23 0:51 ` Jason Wang 0 siblings, 2 replies; 21+ messages in thread From: Matt Cover @ 2019-09-22 22:30 UTC (permalink / raw) To: Michael S. Tsirkin Cc: davem, ast, daniel, kafai, songliubraving, yhs, Jason Wang, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > > > > > Compilation of this exact patch was tested. > > > > > > > > For functional testing 3 additional printk()s were added. 
> > > > > > > > Functional testing results (on 2 txq tap device): > > > > > > > > [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > > > > > Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > > > > > > > > > Could you add a bit more motivation data here? > > > > Thank you for these questions Michael. > > > > I'll plan on adding the below information to the > > commit message and submitting a v2 of this patch > > when net-next reopens. In the meantime, it would > > be very helpful to know if these answers address > > some of your concerns. > > > > > 1. why is this a good idea > > > > This change allows TUNSETSTEERINGEBPF progs to > > do any of the following. > > 1. implement queue selection for a subset of > > traffic (e.g. 
special queue selection logic > > for ipv4, but return negative and use the > > default automq logic for ipv6) > > 2. determine there isn't sufficient information > > to do proper queue selection; return > > negative and use the default automq logic > > for the unknown > > 3. implement a noop prog (e.g. do > > bpf_trace_printk() then return negative and > > use the default automq logic for everything) > > > > > 2. how do we know existing userspace does not rely on existing behaviour > > > > Prior to this change a negative return from a > > TUNSETSTEERINGEBPF prog would have been cast > > into a u16 and traversed netdev_cap_txqueue(). > > > > In most cases netdev_cap_txqueue() would have > > found this value to exceed real_num_tx_queues > > and queue_index would be updated to 0. > > > > It is possible that a TUNSETSTEERINGEBPF prog > > return a negative value which when cast into a > > u16 results in a positive queue_index less than > > real_num_tx_queues. For example, on x86_64, a > > return value of -65535 results in a queue_index > > of 1; which is a valid queue for any multiqueue > > device. > > > > It seems unlikely, however as stated above is > > unfortunately possible, that existing > > TUNSETSTEERINGEBPF programs would choose to > > return a negative value rather than return the > > positive value which holds the same meaning. > > > > It seems more likely that future > > TUNSETSTEERINGEBPF programs would leverage a > > negative return and potentially be loaded into > > a kernel with the old behavior. > > OK if we are returning a special > value, shouldn't we limit it? How about a special > value with this meaning? > If we are changing an ABI let's at least make it > extensible. > A special value with this meaning sounds good to me. I'll plan on adding a define set to -1 to cause the fallback to automq. 
The way I initially viewed the old behavior was that returning negative was undefined; it happened to have the outcomes I walked through, but not necessarily by design. In order to keep the new behavior extensible, how should we state that a negative return other than -1 is undefined and therefore subject to change? Is something like this sufficient? Documentation/networking/tc-actions-env-rules.txt Additionally, what should the new behavior implement when a negative other than -1 is returned? I would like to have it do the same thing as -1 for now, but with the understanding that this behavior is undefined. Does this sound reasonable? > > > 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and > > > without this patch > > > > There may be some value in exposing this fact > > to the ebpf prog loader. What is the standard > > practice here, a define? > > > We'll need something at runtime - people move binaries between kernels > without rebuilding then. An ioctl is one option. > A sysfs attribute is another, an ethtool flag yet another. > A combination of these is possible. > > And if we are doing this anyway, maybe let userspace select > the new behaviour? This way we can stay compatible with old > userspace... > Understood. I'll look into adding an ioctl to activate the new behavior. And perhaps a method of checking which behavior is currently active (in case we ever want to change the default, say after some suitably long transition period).
> > > > > > > > > thanks, > > > MST > > > > > > > --- > > > > drivers/net/tun.c | 20 +++++++++++--------- > > > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > > > index aab0be4..173d159 100644 > > > > --- a/drivers/net/tun.c > > > > +++ b/drivers/net/tun.c > > > > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > return txq; > > > > } > > > > > > > > -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > { > > > > struct tun_prog *prog; > > > > u32 numqueues; > > > > - u16 ret = 0; > > > > + int ret = -1; > > > > > > > > numqueues = READ_ONCE(tun->numqueues); > > > > if (!numqueues) > > > > return 0; > > > > > > > > + rcu_read_lock(); > > > > prog = rcu_dereference(tun->steering_prog); > > > > if (prog) > > > > ret = bpf_prog_run_clear_cb(prog->prog, skb); > > > > + rcu_read_unlock(); > > > > > > > > - return ret % numqueues; > > > > + if (ret >= 0) > > > > + ret %= numqueues; > > > > + > > > > + return ret; > > > > } > > > > > > > > static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > > > > struct net_device *sb_dev) > > > > { > > > > struct tun_struct *tun = netdev_priv(dev); > > > > - u16 ret; > > > > + int ret; > > > > > > > > - rcu_read_lock(); > > > > - if (rcu_dereference(tun->steering_prog)) > > > > - ret = tun_ebpf_select_queue(tun, skb); > > > > - else > > > > + ret = tun_ebpf_select_queue(tun, skb); > > > > + if (ret < 0) > > > > ret = tun_automq_select_queue(tun, skb); > > > > - rcu_read_unlock(); > > > > > > > > return ret; > > > > } > > > > -- > > > > 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
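The direction discussed in the reply above (a named sentinel of -1 plus an opt-in selected from userspace) might look roughly like the following userspace model. Everything here is an assumption made for illustration: `STEERING_FALLBACK_MODEL`, `struct tun_model`, its `fallback_opt_in` flag and the helper functions are hypothetical names, and no such define or ioctl is shown anywhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL name. The thread only proposes "a define set to -1". */
#define STEERING_FALLBACK_MODEL (-1)

/* HYPOTHETICAL device state. fallback_opt_in models a per-device switch
 * that userspace would flip through some new ioctl. */
struct tun_model {
	uint32_t numqueues;
	bool fallback_opt_in;
};

/* Stand-in for tun_automq_select_queue(); always picks queue 0 here. */
static uint16_t automq_model(const struct tun_model *t)
{
	(void)t;
	return 0;
}

static uint16_t select_queue_optin_model(const struct tun_model *t,
					 int32_t prog_ret)
{
	assert(t->numqueues > 0);

	/* Opted in: the sentinel requests automq. Other negative values
	 * take the same path for now, but are documented as undefined. */
	if (t->fallback_opt_in && prog_ret < 0)
		return automq_model(t);

	/* Otherwise keep the legacy arithmetic from the removed lines of
	 * the diff: truncate to u16, then reduce modulo numqueues. */
	return (uint16_t)((uint16_t)prog_ret % t->numqueues);
}
```

Keeping the legacy arithmetic as the default path is what would let old userspace keep working unchanged, which is the compatibility concern raised in the review.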
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-22 22:30 ` Matt Cover @ 2019-09-22 22:46 ` Matt Cover 2019-09-23 0:28 ` Matt Cover 2019-09-25 10:33 ` Michael S. Tsirkin 2019-09-23 0:51 ` Jason Wang 1 sibling, 2 replies; 21+ messages in thread From: Matt Cover @ 2019-09-22 22:46 UTC (permalink / raw) To: Michael S. Tsirkin Cc: davem, ast, daniel, kafai, songliubraving, yhs, Jason Wang, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 3:30 PM Matt Cover <werekraken@gmail.com> wrote: > > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > > > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > > > > > > > Compilation of this exact patch was tested. > > > > > > > > > > For functional testing 3 additional printk()s were added. 
> > > > > > > > > > Functional testing results (on 2 txq tap device): > > > > > > > > > > [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > > > > > > > Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > > > > > > > > > > > > Could you add a bit more motivation data here? > > > > > > Thank you for these questions Michael. > > > > > > I'll plan on adding the below information to the > > > commit message and submitting a v2 of this patch > > > when net-next reopens. In the meantime, it would > > > be very helpful to know if these answers address > > > some of your concerns. > > > > > > > 1. why is this a good idea > > > > > > This change allows TUNSETSTEERINGEBPF progs to > > > do any of the following. > > > 1. implement queue selection for a subset of > > > traffic (e.g. 
special queue selection logic > > > for ipv4, but return negative and use the > > > default automq logic for ipv6) > > > 2. determine there isn't sufficient information > > > to do proper queue selection; return > > > negative and use the default automq logic > > > for the unknown > > > 3. implement a noop prog (e.g. do > > > bpf_trace_printk() then return negative and > > > use the default automq logic for everything) > > > > > > > 2. how do we know existing userspace does not rely on existing behaviour > > > > > > Prior to this change a negative return from a > > > TUNSETSTEERINGEBPF prog would have been cast > > > into a u16 and traversed netdev_cap_txqueue(). > > > > > > In most cases netdev_cap_txqueue() would have > > > found this value to exceed real_num_tx_queues > > > and queue_index would be updated to 0. > > > > > > It is possible that a TUNSETSTEERINGEBPF prog > > > return a negative value which when cast into a > > > u16 results in a positive queue_index less than > > > real_num_tx_queues. For example, on x86_64, a > > > return value of -65535 results in a queue_index > > > of 1; which is a valid queue for any multiqueue > > > device. > > > > > > It seems unlikely, however as stated above is > > > unfortunately possible, that existing > > > TUNSETSTEERINGEBPF programs would choose to > > > return a negative value rather than return the > > > positive value which holds the same meaning. > > > > > > It seems more likely that future > > > TUNSETSTEERINGEBPF programs would leverage a > > > negative return and potentially be loaded into > > > a kernel with the old behavior. > > > > OK if we are returning a special > > value, shouldn't we limit it? How about a special > > value with this meaning? > > If we are changing an ABI let's at least make it > > extensible. > > > > A special value with this meaning sounds > good to me. I'll plan on adding a define > set to -1 to cause the fallback to automq. 
> > The way I was initially viewing the old > behavior was that returning negative was > undefined; it happened to have the > outcomes I walked through, but not > necessarily by design. > > In order to keep the new behavior > extensible, how should we state that a > negative return other than -1 is > undefined and therefore subject to > change? Is something like this > sufficient? > > Documentation/networking/tc-actions-env-rules.txt > > Additionally, what should the new > behavior implement when a negative other > than -1 is returned? I would like to have > it do the same thing as -1 for now, but > with the understanding that this behavior > is undefined. Does this sound reasonable? > > > > > 3. why doesn't userspace need a way to figure out whether it runs on a kernel with or > > > > without this patch > > > > > > There may be some value in exposing this fact > > > to the ebpf prog loader. What is the standard > > > practice here, a define? > > > > > > We'll need something at runtime - people move binaries between kernels > > without rebuilding them. An ioctl is one option. > > A sysfs attribute is another, an ethtool flag yet another. > > A combination of these is possible. > > > > And if we are doing this anyway, maybe let userspace select > > the new behaviour? This way we can stay compatible with old > > userspace... > > > > Understood. I'll look into adding an > ioctl to activate the new behavior. And > perhaps a method of checking which > behavior is currently active (in case we > ever want to change the default, say > after some suitably long transition > period). >

Unless of course we can simply state via documentation that any negative return for which a define doesn't exist is undefined behavior. In which case, there is no old vs new behavior and no need for an ioctl. Simply the understanding provided by the documentation.
> > > > > > > > > > > > thanks, > > > > MST > > > > > > > > > --- > > > > > drivers/net/tun.c | 20 +++++++++++--------- > > > > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > > > > index aab0be4..173d159 100644 > > > > > --- a/drivers/net/tun.c > > > > > +++ b/drivers/net/tun.c > > > > > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > > return txq; > > > > > } > > > > > > > > > > -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > > +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > > { > > > > > struct tun_prog *prog; > > > > > u32 numqueues; > > > > > - u16 ret = 0; > > > > > + int ret = -1; > > > > > > > > > > numqueues = READ_ONCE(tun->numqueues); > > > > > if (!numqueues) > > > > > return 0; > > > > > > > > > > + rcu_read_lock(); > > > > > prog = rcu_dereference(tun->steering_prog); > > > > > if (prog) > > > > > ret = bpf_prog_run_clear_cb(prog->prog, skb); > > > > > + rcu_read_unlock(); > > > > > > > > > > - return ret % numqueues; > > > > > + if (ret >= 0) > > > > > + ret %= numqueues; > > > > > + > > > > > + return ret; > > > > > } > > > > > > > > > > static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > > > > > struct net_device *sb_dev) > > > > > { > > > > > struct tun_struct *tun = netdev_priv(dev); > > > > > - u16 ret; > > > > > + int ret; > > > > > > > > > > - rcu_read_lock(); > > > > > - if (rcu_dereference(tun->steering_prog)) > > > > > - ret = tun_ebpf_select_queue(tun, skb); > > > > > - else > > > > > + ret = tun_ebpf_select_queue(tun, skb); > > > > > + if (ret < 0) > > > > > ret = tun_automq_select_queue(tun, skb); > > > > > - rcu_read_unlock(); > > > > > > > > > > return ret; > > > > > } > > > > > -- > > > > > 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
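The u16 truncation described under question 2 can be checked with a small userspace model. This is only a sketch: `old_steering_result` is a hypothetical name, and the body assumes nothing beyond what the removed lines of the diff show (`u16 ret`, then `return ret % numqueues;`). It does not model netdev_cap_txqueue().

```c
#include <stdint.h>

/*
 * Userspace model (an assumption, not kernel code) of the pre-patch
 * tun_ebpf_select_queue(): the program's return value is stored in a
 * u16 and then reduced modulo numqueues.
 */
static uint16_t old_steering_result(int32_t prog_ret, uint32_t numqueues)
{
    uint16_t ret;

    if (!numqueues)
        return 0;

    /* Conversion to an unsigned 16-bit type wraps modulo 65536,
     * so -65535 becomes 1 and -1 becomes 65535. */
    ret = (uint16_t)prog_ret;

    return (uint16_t)(ret % numqueues);
}
```

In this model old_steering_result(-65535, 2) yields 1, matching the example in the message above. Because the modulo is applied before returning, -1 also lands on a valid queue of a two-queue device (65535 is odd), which suggests that a negative return was not necessarily steered to queue 0 before the patch.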
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-22 22:46 ` Matt Cover @ 2019-09-23 0:28 ` Matt Cover 2019-09-25 10:33 ` Michael S. Tsirkin 1 sibling, 0 replies; 21+ messages in thread From: Matt Cover @ 2019-09-23 0:28 UTC (permalink / raw) To: Michael S. Tsirkin Cc: davem, ast, daniel, kafai, songliubraving, yhs, Jason Wang, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 3:46 PM Matt Cover <werekraken@gmail.com> wrote: > > On Sun, Sep 22, 2019 at 3:30 PM Matt Cover <werekraken@gmail.com> wrote: > > > > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > > > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > > > > > > Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > > > > > > to fallback to tun_automq_select_queue() for tx queue selection. > > > > > > > > > > > > Compilation of this exact patch was tested. > > > > > > > > > > > > For functional testing 3 additional printk()s were added. 
> > > > > > > > > > > > Functional testing results (on 2 txq tap device): > > > > > > > > > > > > [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > > > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > > > > > > [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > > > > > > [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > > > > > > > > > > > > Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > > > > > > > > > > > > > > > Could you add a bit more motivation data here? > > > > > > > > Thank you for these questions Michael. > > > > > > > > I'll plan on adding the below information to the > > > > commit message and submitting a v2 of this patch > > > > when net-next reopens. In the meantime, it would > > > > be very helpful to know if these answers address > > > > some of your concerns. > > > > > > > > > 1. why is this a good idea > > > > > > > > This change allows TUNSETSTEERINGEBPF progs to > > > > do any of the following. > > > > 1. 
implement queue selection for a subset of > > > > traffic (e.g. special queue selection logic > > > > for ipv4, but return negative and use the > > > > default automq logic for ipv6) > > > > 2. determine there isn't sufficient information > > > > to do proper queue selection; return > > > > negative and use the default automq logic > > > > for the unknown > > > > 3. implement a noop prog (e.g. do > > > > bpf_trace_printk() then return negative and > > > > use the default automq logic for everything) > > > > > > > > > 2. how do we know existing userspace does not rely on existing behaviour > > > > > > > > Prior to this change a negative return from a > > > > TUNSETSTEERINGEBPF prog would have been cast > > > > into a u16 and traversed netdev_cap_txqueue(). > > > > > > > > In most cases netdev_cap_txqueue() would have > > > > found this value to exceed real_num_tx_queues > > > > and queue_index would be updated to 0. > > > > > > > > It is possible that a TUNSETSTEERINGEBPF prog > > > > return a negative value which when cast into a > > > > u16 results in a positive queue_index less than > > > > real_num_tx_queues. For example, on x86_64, a > > > > return value of -65535 results in a queue_index > > > > of 1; which is a valid queue for any multiqueue > > > > device. > > > > > > > > It seems unlikely, however as stated above is > > > > unfortunately possible, that existing > > > > TUNSETSTEERINGEBPF programs would choose to > > > > return a negative value rather than return the > > > > positive value which holds the same meaning. > > > > > > > > It seems more likely that future > > > > TUNSETSTEERINGEBPF programs would leverage a > > > > negative return and potentially be loaded into > > > > a kernel with the old behavior. > > > > > > OK if we are returning a special > > > value, shouldn't we limit it? How about a special > > > value with this meaning? > > > If we are changing an ABI let's at least make it > > > extensible. 
> > > > > > > A special value with this meaning sounds > > good to me. I'll plan on adding a define > > set to -1 to cause the fallback to automq. > > > > The way I was initially viewing the old > > behavior was that returning negative was > > undefined; it happened to have the > > outcomes I walked through, but not > > necessarily by design. > > > > In order to keep the new behavior > > extensible, how should we state that a > > negative return other than -1 is > > undefined and therefore subject to > > change? Is something like this > > sufficient? > > > > Documentation/networking/tc-actions-env-rules.txt > > > > Additionally, what should the new > > behavior implement when a negative other > > than -1 is returned? I would like to have > > it do the same thing as -1 for now, but > > with the understanding that this behavior > > is undefined. Does this sound reasonable? > > > > > > > 3. why doesn't userspace need a way to figure out whether it runs on a kernel with or > > > > > without this patch > > > > > > > > There may be some value in exposing this fact > > > > to the ebpf prog loader. What is the standard > > > > practice here, a define? > > > > > > > > > We'll need something at runtime - people move binaries between kernels > > > without rebuilding them. An ioctl is one option. > > > A sysfs attribute is another, an ethtool flag yet another. > > > A combination of these is possible. > > > > > > And if we are doing this anyway, maybe let userspace select > > > the new behaviour? This way we can stay compatible with old > > > userspace... > > > > > > > Understood. I'll look into adding an > > ioctl to activate the new behavior. And > > perhaps a method of checking which > > behavior is currently active (in case we > > ever want to change the default, say > > after some suitably long transition > > period). > > > > Unless of course we can simply state via > documentation that any negative return > for which a define doesn't exist is > undefined behavior.
In which case, > there is no old vs new behavior and > no need for an ioctl. Simply the > understanding provided by the > documentation. >

On second thought, this again doesn't solve for runtime determination. How does this sound as a complete solution for v2?

1. leave the changes to tun_ebpf_select_queue() as they are
2. update tun_select_queue() to only run tun_automq_select_queue() when ret == TUN_SSE_DO_AUTOMQ (this will also happen when !prog)
3. add an ioctl or sysfs endpoint which allows for runtime querying of the TUNSETSTEERINGEBPF "capabilities" (if I can keep this more generic than return value, I will; e.g. perhaps one day it could be used to indicate a hookpoint specific bpf helper function or similar as a capability)
4. add documentation on how to check "capabilities" and that any unspecified negative return value results in undefined behavior

> > > > > > > > > > > > > > > thanks, > > > > > MST > > > > > > > > > > > --- > > > > > > drivers/net/tun.c | 20 +++++++++++--------- > > > > > > 1 file changed, 11 insertions(+), 9 deletions(-) > > > > > > > > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > > > > > index aab0be4..173d159 100644 > > > > > > --- a/drivers/net/tun.c > > > > > > +++ b/drivers/net/tun.c > > > > > > @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > > > return txq; > > > > > > } > > > > > > > > > > > > -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > > > +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > > > > > > { > > > > > > struct tun_prog *prog; > > > > > > u32 numqueues; > > > > > > - u16 ret = 0; > > > > > > + int ret = -1; > > > > > > > > > > > > numqueues = READ_ONCE(tun->numqueues); > > > > > > if (!numqueues) > > > > > > return 0; > > > > > > > > > > > > + rcu_read_lock(); > > > > > > prog = rcu_dereference(tun->steering_prog); > > > > > > if (prog) > > > > > > ret =
bpf_prog_run_clear_cb(prog->prog, skb); > > > > > > + rcu_read_unlock(); > > > > > > > > > > > > - return ret % numqueues; > > > > > > + if (ret >= 0) > > > > > > + ret %= numqueues; > > > > > > + > > > > > > + return ret; > > > > > > } > > > > > > > > > > > > static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > > > > > > struct net_device *sb_dev) > > > > > > { > > > > > > struct tun_struct *tun = netdev_priv(dev); > > > > > > - u16 ret; > > > > > > + int ret; > > > > > > > > > > > > - rcu_read_lock(); > > > > > > - if (rcu_dereference(tun->steering_prog)) > > > > > > - ret = tun_ebpf_select_queue(tun, skb); > > > > > > - else > > > > > > + ret = tun_ebpf_select_queue(tun, skb); > > > > > > + if (ret < 0) > > > > > > ret = tun_automq_select_queue(tun, skb); > > > > > > - rcu_read_unlock(); > > > > > > > > > > > > return ret; > > > > > > } > > > > > > -- > > > > > > 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
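Step 2 of the v2 outline in the message above could look roughly like the following userspace sketch. Everything here is an assumption made for illustration: the thread only says the define would be set to -1, `automq_select_queue_stub` is a hypothetical stand-in for tun_automq_select_queue() (which works on the flow hash in the real driver), and `select_queue_v2` is not a kernel function.

```c
#include <stdint.h>

/* Assumed value: the thread proposes "a define set to -1" and does
 * not settle on a header or a final name. */
#define TUN_SSE_DO_AUTOMQ (-1)

/* Hypothetical stand-in for tun_automq_select_queue(). */
static uint16_t automq_select_queue_stub(uint32_t flow_hash, uint32_t numqueues)
{
    return (uint16_t)(flow_hash % numqueues);
}

/*
 * Model of the proposed tun_select_queue(): fall back to automq only
 * when the steering result is exactly the sentinel. ebpf_ret stands
 * for the value returned by tun_ebpf_select_queue(), which has already
 * been reduced modulo numqueues when it is non-negative.
 */
static uint16_t select_queue_v2(int ebpf_ret, uint32_t flow_hash, uint32_t numqueues)
{
    if (!numqueues)
        return 0;

    if (ebpf_ret == TUN_SSE_DO_AUTOMQ)
        return automq_select_queue_stub(flow_hash, numqueues);

    /* Any other negative value is left unspecified by the proposal;
     * this sketch simply truncates it like the old code did. */
    return (uint16_t)ebpf_ret;
}
```

Comparing against one named sentinel instead of testing `ret < 0` keeps the remaining negative values free for future meanings, which is the extensibility point raised earlier in the thread.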
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-22 22:46 ` Matt Cover 2019-09-23 0:28 ` Matt Cover @ 2019-09-25 10:33 ` Michael S. Tsirkin 1 sibling, 0 replies; 21+ messages in thread From: Michael S. Tsirkin @ 2019-09-25 10:33 UTC (permalink / raw) To: Matt Cover Cc: davem, ast, daniel, kafai, songliubraving, yhs, Jason Wang, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 03:46:19PM -0700, Matt Cover wrote: > Unless of course we can simply state via > documentation that any negative return > for which a define doesn't exist is > undefined behavior. In which case, > there is no old vs new behavior and > no need for an ioctl. Simply the > understanding provided by the > documentation. Unfortunately this isn't sufficient: software can easily return a wrong value by mistake, and become dependent on an undefined behaviour. -- MST ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-22 22:30 ` Matt Cover 2019-09-22 22:46 ` Matt Cover @ 2019-09-23 0:51 ` Jason Wang 2019-09-23 1:15 ` Matt Cover 1 sibling, 1 reply; 21+ messages in thread From: Jason Wang @ 2019-09-23 0:51 UTC (permalink / raw) To: Matt Cover, Michael S. Tsirkin Cc: davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On 2019/9/23 6:30 AM, Matt Cover wrote: > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: >> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: >>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: >>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: >>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal >>>>> to fallback to tun_automq_select_queue() for tx queue selection. >>>>> >>>>> Compilation of this exact patch was tested. >>>>> >>>>> For functional testing 3 additional printk()s were added.
>>>>> >>>>> Functional testing results (on 2 txq tap device): >>>>> >>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>> >>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> >>>> >>>> Could you add a bit more motivation data here? >>> Thank you for these questions Michael. >>> >>> I'll plan on adding the below information to the >>> commit message and submitting a v2 of this patch >>> when net-next reopens. In the meantime, it would >>> be very helpful to know if these answers address >>> some of your concerns. >>> >>>> 1. why is this a good idea >>> This change allows TUNSETSTEERINGEBPF progs to >>> do any of the following. >>> 1. implement queue selection for a subset of >>> traffic (e.g. special queue selection logic >>> for ipv4, but return negative and use the >>> default automq logic for ipv6) >>> 2. 
determine there isn't sufficient information >>> to do proper queue selection; return >>> negative and use the default automq logic >>> for the unknown >>> 3. implement a noop prog (e.g. do >>> bpf_trace_printk() then return negative and >>> use the default automq logic for everything) >>> >>>> 2. how do we know existing userspace does not rely on existing behaviour >>> Prior to this change a negative return from a >>> TUNSETSTEERINGEBPF prog would have been cast >>> into a u16 and traversed netdev_cap_txqueue(). >>> >>> In most cases netdev_cap_txqueue() would have >>> found this value to exceed real_num_tx_queues >>> and queue_index would be updated to 0. >>> >>> It is possible that a TUNSETSTEERINGEBPF prog >>> return a negative value which when cast into a >>> u16 results in a positive queue_index less than >>> real_num_tx_queues. For example, on x86_64, a >>> return value of -65535 results in a queue_index >>> of 1; which is a valid queue for any multiqueue >>> device. >>> >>> It seems unlikely, however as stated above is >>> unfortunately possible, that existing >>> TUNSETSTEERINGEBPF programs would choose to >>> return a negative value rather than return the >>> positive value which holds the same meaning. >>> >>> It seems more likely that future >>> TUNSETSTEERINGEBPF programs would leverage a >>> negative return and potentially be loaded into >>> a kernel with the old behavior. >> OK if we are returning a special >> value, shouldn't we limit it? How about a special >> value with this meaning? >> If we are changing an ABI let's at least make it >> extensible. >> > A special value with this meaning sounds > good to me. I'll plan on adding a define > set to -1 to cause the fallback to automq. Can it really return -1? I see: static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, struct sk_buff *skb) ... 
> > The way I was initially viewing the old > behavior was that returning negative was > undefined; it happened to have the > outcomes I walked through, but not > necessarily by design.

Having such a fallback may bring extra trouble: it requires the eBPF program to know about a behavior that is not actually part of the kernel ABI. Some eBPF programs may then start to rely on it, which is pretty dangerous. Note that one important consideration is macvtap support, and macvtap does not have anything like automq.

Thanks

> > In order to keep the new behavior > extensible, how should we state that a > negative return other than -1 is > undefined and therefore subject to > change? Is something like this > sufficient? > > Documentation/networking/tc-actions-env-rules.txt > > Additionally, what should the new > behavior implement when a negative other > than -1 is returned? I would like to have > it do the same thing as -1 for now, but > with the understanding that this behavior > is undefined. Does this sound reasonable? > >>>> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with or >>>> without this patch >>> There may be some value in exposing this fact >>> to the ebpf prog loader. What is the standard >>> practice here, a define? >> >> We'll need something at runtime - people move binaries between kernels >> without rebuilding them. An ioctl is one option. >> A sysfs attribute is another, an ethtool flag yet another. >> A combination of these is possible. >> >> And if we are doing this anyway, maybe let userspace select >> the new behaviour? This way we can stay compatible with old >> userspace... >> > Understood. I'll look into adding an > ioctl to activate the new behavior. And > perhaps a method of checking which > behavior is currently active (in case we > ever want to change the default, say > after some suitably long transition > period).
> >>>> >>>> thanks, >>>> MST >>>> >>>>> --- >>>>> drivers/net/tun.c | 20 +++++++++++--------- >>>>> 1 file changed, 11 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>>>> index aab0be4..173d159 100644 >>>>> --- a/drivers/net/tun.c >>>>> +++ b/drivers/net/tun.c >>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>> return txq; >>>>> } >>>>> >>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>> { >>>>> struct tun_prog *prog; >>>>> u32 numqueues; >>>>> - u16 ret = 0; >>>>> + int ret = -1; >>>>> >>>>> numqueues = READ_ONCE(tun->numqueues); >>>>> if (!numqueues) >>>>> return 0; >>>>> >>>>> + rcu_read_lock(); >>>>> prog = rcu_dereference(tun->steering_prog); >>>>> if (prog) >>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); >>>>> + rcu_read_unlock(); >>>>> >>>>> - return ret % numqueues; >>>>> + if (ret >= 0) >>>>> + ret %= numqueues; >>>>> + >>>>> + return ret; >>>>> } >>>>> >>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, >>>>> struct net_device *sb_dev) >>>>> { >>>>> struct tun_struct *tun = netdev_priv(dev); >>>>> - u16 ret; >>>>> + int ret; >>>>> >>>>> - rcu_read_lock(); >>>>> - if (rcu_dereference(tun->steering_prog)) >>>>> - ret = tun_ebpf_select_queue(tun, skb); >>>>> - else >>>>> + ret = tun_ebpf_select_queue(tun, skb); >>>>> + if (ret < 0) >>>>> ret = tun_automq_select_queue(tun, skb); >>>>> - rcu_read_unlock(); >>>>> >>>>> return ret; >>>>> } >>>>> -- >>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
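On the question of whether a program whose runner returns u32 can "really return -1": the patched tun_ebpf_select_queue() stores the result in an int, so an all-ones return value reads back as negative. The following is a minimal userspace sketch of that step; `new_steering_result` is a hypothetical name, and the sketch assumes a 32-bit int and the wraparound conversion that GCC and Clang document (the C standard leaves this conversion implementation-defined).

```c
#include <stdint.h>

/*
 * Userspace model (an assumption, not kernel code) of the patched
 * tun_ebpf_select_queue() when a program is attached: the u32 result
 * is stored in an int, and only non-negative values are reduced
 * modulo numqueues.
 */
static int new_steering_result(uint32_t prog_ret, uint32_t numqueues)
{
    int ret;

    if (!numqueues)
        return 0;

    /* Implementation-defined for values above INT_MAX; with the usual
     * wraparound conversion 0xffffffff becomes -1. */
    ret = (int)prog_ret;

    if (ret >= 0)
        ret = (int)((uint32_t)ret % numqueues);

    return ret;
}
```

With two queues this model returns -1, 0, 1 and 0 for program results of 0xffffffff, 0, 1 and 2, which lines up with the printk output quoted in the commit message.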
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 0:51 ` Jason Wang @ 2019-09-23 1:15 ` Matt Cover 2019-09-23 2:34 ` Jason Wang 0 siblings, 1 reply; 21+ messages in thread From: Matt Cover @ 2019-09-23 1:15 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 5:51 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2019/9/23 6:30 AM, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > >>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > >>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > >>>>> to fallback to tun_automq_select_queue() for tx queue selection. > >>>>> > >>>>> Compilation of this exact patch was tested. > >>>>> > >>>>> For functional testing 3 additional printk()s were added.
> >>>>> > >>>>> Functional testing results (on 2 txq tap device): > >>>>> > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>>> > >>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > >>>> > >>>> Could you add a bit more motivation data here? > >>> Thank you for these questions Michael. > >>> > >>> I'll plan on adding the below information to the > >>> commit message and submitting a v2 of this patch > >>> when net-next reopens. In the meantime, it would > >>> be very helpful to know if these answers address > >>> some of your concerns. > >>> > >>>> 1. why is this a good idea > >>> This change allows TUNSETSTEERINGEBPF progs to > >>> do any of the following. > >>> 1. implement queue selection for a subset of > >>> traffic (e.g. 
special queue selection logic > >>> for ipv4, but return negative and use the > >>> default automq logic for ipv6) > >>> 2. determine there isn't sufficient information > >>> to do proper queue selection; return > >>> negative and use the default automq logic > >>> for the unknown > >>> 3. implement a noop prog (e.g. do > >>> bpf_trace_printk() then return negative and > >>> use the default automq logic for everything) > >>> > >>>> 2. how do we know existing userspace does not rely on existing behaviour > >>> Prior to this change a negative return from a > >>> TUNSETSTEERINGEBPF prog would have been cast > >>> into a u16 and traversed netdev_cap_txqueue(). > >>> > >>> In most cases netdev_cap_txqueue() would have > >>> found this value to exceed real_num_tx_queues > >>> and queue_index would be updated to 0. > >>> > >>> It is possible that a TUNSETSTEERINGEBPF prog > >>> return a negative value which when cast into a > >>> u16 results in a positive queue_index less than > >>> real_num_tx_queues. For example, on x86_64, a > >>> return value of -65535 results in a queue_index > >>> of 1; which is a valid queue for any multiqueue > >>> device. > >>> > >>> It seems unlikely, however as stated above is > >>> unfortunately possible, that existing > >>> TUNSETSTEERINGEBPF programs would choose to > >>> return a negative value rather than return the > >>> positive value which holds the same meaning. > >>> > >>> It seems more likely that future > >>> TUNSETSTEERINGEBPF programs would leverage a > >>> negative return and potentially be loaded into > >>> a kernel with the old behavior. > >> OK if we are returning a special > >> value, shouldn't we limit it? How about a special > >> value with this meaning? > >> If we are changing an ABI let's at least make it > >> extensible. > >> > > A special value with this meaning sounds > > good to me. I'll plan on adding a define > > set to -1 to cause the fallback to automq. > > > Can it really return -1? 
> > I see: > > static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, > struct sk_buff *skb) > ... > > > > > > The way I was initially viewing the old > > behavior was that returning negative was > > undefined; it happened to have the > > outcomes I walked through, but not > > necessarily by design. > > > Having such fallback may bring extra troubles, it requires the eBPF > program know the existence of the behavior which is not a part of kernel > ABI actually. And then some eBPF program may start to rely on that which > is pretty dangerous. Note, one important consideration is to have > macvtap support where does not have any stuffs like automq. > > Thanks > How about we call this TUN_SSE_ABORT instead of TUN_SSE_DO_AUTOMQ? TUN_SSE_ABORT could be documented as falling back to the default queue selection method in either space (presumably macvtap has some queue selection method when there is no prog). > > > > > In order to keep the new behavior > > extensible, how should we state that a > > negative return other than -1 is > > undefined and therefore subject to > > change. Is something like this > > sufficient? > > > > Documentation/networking/tc-actions-env-rules.txt > > > > Additionally, what should the new > > behavior implement when a negative other > > than -1 is returned? I would like to have > > it do the same thing as -1 for now, but > > with the understanding that this behavior > > is undefined. Does this sound reasonable? > > > >>>> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and > >>>> without this patch > >>> There may be some value in exposing this fact > >>> to the ebpf prog loader. What is the standard > >>> practice here, a define? > >> > >> We'll need something at runtime - people move binaries between kernels > >> without rebuilding then. An ioctl is one option. > >> A sysfs attribute is another, an ethtool flag yet another. > >> A combination of these is possible. 
> >> > >> And if we are doing this anyway, maybe let userspace select > >> the new behaviour? This way we can stay compatible with old > >> userspace... > >> > > Understood. I'll look into adding an > > ioctl to activate the new behavior. And > > perhaps a method of checking which is > > behavior is currently active (in case we > > ever want to change the default, say > > after some suitably long transition > > period). > > > >>>> > >>>> thanks, > >>>> MST > >>>> > >>>>> --- > >>>>> drivers/net/tun.c | 20 +++++++++++--------- > >>>>> 1 file changed, 11 insertions(+), 9 deletions(-) > >>>>> > >>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c > >>>>> index aab0be4..173d159 100644 > >>>>> --- a/drivers/net/tun.c > >>>>> +++ b/drivers/net/tun.c > >>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>> return txq; > >>>>> } > >>>>> > >>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>> { > >>>>> struct tun_prog *prog; > >>>>> u32 numqueues; > >>>>> - u16 ret = 0; > >>>>> + int ret = -1; > >>>>> > >>>>> numqueues = READ_ONCE(tun->numqueues); > >>>>> if (!numqueues) > >>>>> return 0; > >>>>> > >>>>> + rcu_read_lock(); > >>>>> prog = rcu_dereference(tun->steering_prog); > >>>>> if (prog) > >>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); > >>>>> + rcu_read_unlock(); > >>>>> > >>>>> - return ret % numqueues; > >>>>> + if (ret >= 0) > >>>>> + ret %= numqueues; > >>>>> + > >>>>> + return ret; > >>>>> } > >>>>> > >>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > >>>>> struct net_device *sb_dev) > >>>>> { > >>>>> struct tun_struct *tun = netdev_priv(dev); > >>>>> - u16 ret; > >>>>> + int ret; > >>>>> > >>>>> - rcu_read_lock(); > >>>>> - if (rcu_dereference(tun->steering_prog)) > >>>>> - ret = tun_ebpf_select_queue(tun, skb); > >>>>> - else > >>>>> + ret 
= tun_ebpf_select_queue(tun, skb); > >>>>> + if (ret < 0) > >>>>> ret = tun_automq_select_queue(tun, skb); > >>>>> - rcu_read_unlock(); > >>>>> > >>>>> return ret; > >>>>> } > >>>>> -- > >>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
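A minimal userspace sketch of the pre-patch arithmetic discussed above, assuming only what the removed lines of the diff show: the program's return value is stored in a u16 and then reduced modulo numqueues. The names old_truncate() and old_select() are made up for illustration and do not exist in the kernel; netdev_cap_txqueue() is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models "u16 ret = bpf_prog_run_clear_cb(...)".
 * The u32 program return is truncated to its low 16 bits. */
static uint16_t old_truncate(uint32_t prog_ret)
{
    return (uint16_t)prog_ret;
}

/* Hypothetical helper: models the removed "return ret % numqueues;".
 * numqueues must be non-zero, as in the kernel function, which returns
 * early when it is zero. */
static uint16_t old_select(uint32_t prog_ret, uint32_t numqueues)
{
    assert(numqueues != 0);
    return (uint16_t)(old_truncate(prog_ret) % numqueues);
}
```

Under this model, old_truncate((uint32_t)-65535) yields 1, which matches the example given in the thread, and old_truncate((uint32_t)-1) yields 65535. A "negative" return therefore still ends up as some queue index in the old code rather than as an error.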
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 1:15 ` Matt Cover @ 2019-09-23 2:34 ` Jason Wang 2019-09-23 3:18 ` Matt Cover 0 siblings, 1 reply; 21+ messages in thread From: Jason Wang @ 2019-09-23 2:34 UTC (permalink / raw) To: Matt Cover Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On 2019/9/23 上午9:15, Matt Cover wrote: > On Sun, Sep 22, 2019 at 5:51 PM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2019/9/23 上午6:30, Matt Cover wrote: >>> On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>>> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: >>>>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: >>>>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal >>>>>>> to fallback to tun_automq_select_queue() for tx queue selection. >>>>>>> >>>>>>> Compilation of this exact patch was tested. >>>>>>> >>>>>>> For functional testing 3 additional printk()s were added. 
>>>>>>> >>>>>>> Functional testing results (on 2 txq tap device): >>>>>>> >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>>>> >>>>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> >>>>>> Could you add a bit more motivation data here? >>>>> Thank you for these questions Michael. >>>>> >>>>> I'll plan on adding the below information to the >>>>> commit message and submitting a v2 of this patch >>>>> when net-next reopens. In the meantime, it would >>>>> be very helpful to know if these answers address >>>>> some of your concerns. >>>>> >>>>>> 1. why is this a good idea >>>>> This change allows TUNSETSTEERINGEBPF progs to >>>>> do any of the following. >>>>> 1. implement queue selection for a subset of >>>>> traffic (e.g. 
special queue selection logic >>>>> for ipv4, but return negative and use the >>>>> default automq logic for ipv6) >>>>> 2. determine there isn't sufficient information >>>>> to do proper queue selection; return >>>>> negative and use the default automq logic >>>>> for the unknown >>>>> 3. implement a noop prog (e.g. do >>>>> bpf_trace_printk() then return negative and >>>>> use the default automq logic for everything) >>>>> >>>>>> 2. how do we know existing userspace does not rely on existing behaviour >>>>> Prior to this change a negative return from a >>>>> TUNSETSTEERINGEBPF prog would have been cast >>>>> into a u16 and traversed netdev_cap_txqueue(). >>>>> >>>>> In most cases netdev_cap_txqueue() would have >>>>> found this value to exceed real_num_tx_queues >>>>> and queue_index would be updated to 0. >>>>> >>>>> It is possible that a TUNSETSTEERINGEBPF prog >>>>> return a negative value which when cast into a >>>>> u16 results in a positive queue_index less than >>>>> real_num_tx_queues. For example, on x86_64, a >>>>> return value of -65535 results in a queue_index >>>>> of 1; which is a valid queue for any multiqueue >>>>> device. >>>>> >>>>> It seems unlikely, however as stated above is >>>>> unfortunately possible, that existing >>>>> TUNSETSTEERINGEBPF programs would choose to >>>>> return a negative value rather than return the >>>>> positive value which holds the same meaning. >>>>> >>>>> It seems more likely that future >>>>> TUNSETSTEERINGEBPF programs would leverage a >>>>> negative return and potentially be loaded into >>>>> a kernel with the old behavior. >>>> OK if we are returning a special >>>> value, shouldn't we limit it? How about a special >>>> value with this meaning? >>>> If we are changing an ABI let's at least make it >>>> extensible. >>>> >>> A special value with this meaning sounds >>> good to me. I'll plan on adding a define >>> set to -1 to cause the fallback to automq. >> >> Can it really return -1? 
>> >> I see: >> >> static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, >> struct sk_buff *skb) >> ... >> >> >>> The way I was initially viewing the old >>> behavior was that returning negative was >>> undefined; it happened to have the >>> outcomes I walked through, but not >>> necessarily by design. >> >> Having such fallback may bring extra troubles, it requires the eBPF >> program know the existence of the behavior which is not a part of kernel >> ABI actually. And then some eBPF program may start to rely on that which >> is pretty dangerous. Note, one important consideration is to have >> macvtap support where does not have any stuffs like automq. >> >> Thanks >> > How about we call this TUN_SSE_ABORT > instead of TUN_SSE_DO_AUTOMQ? > > TUN_SSE_ABORT could be documented as > falling back to the default queue > selection method in either space > (presumably macvtap has some queue > selection method when there is no prog).

This looks like a more complex API; we don't want userspace to have to differentiate macvtap from tap too much.

Thanks

> >>> In order to keep the new behavior >>> extensible, how should we state that a >>> negative return other than -1 is >>> undefined and therefore subject to >>> change. Is something like this >>> sufficient? >>> >>> Documentation/networking/tc-actions-env-rules.txt >>> >>> Additionally, what should the new >>> behavior implement when a negative other >>> than -1 is returned? I would like to have >>> it do the same thing as -1 for now, but >>> with the understanding that this behavior >>> is undefined. Does this sound reasonable? >>> >>>>>> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and >>>>>> without this patch >>>>> There may be some value in exposing this fact >>>>> to the ebpf prog loader. What is the standard >>>>> practice here, a define? >>>> We'll need something at runtime - people move binaries between kernels >>>> without rebuilding then. An ioctl is one option.
>>>> A sysfs attribute is another, an ethtool flag yet another. >>>> A combination of these is possible. >>>> >>>> And if we are doing this anyway, maybe let userspace select >>>> the new behaviour? This way we can stay compatible with old >>>> userspace... >>>> >>> Understood. I'll look into adding an >>> ioctl to activate the new behavior. And >>> perhaps a method of checking which is >>> behavior is currently active (in case we >>> ever want to change the default, say >>> after some suitably long transition >>> period). >>> >>>>>> thanks, >>>>>> MST >>>>>> >>>>>>> --- >>>>>>> drivers/net/tun.c | 20 +++++++++++--------- >>>>>>> 1 file changed, 11 insertions(+), 9 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>>>>>> index aab0be4..173d159 100644 >>>>>>> --- a/drivers/net/tun.c >>>>>>> +++ b/drivers/net/tun.c >>>>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>> return txq; >>>>>>> } >>>>>>> >>>>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>> { >>>>>>> struct tun_prog *prog; >>>>>>> u32 numqueues; >>>>>>> - u16 ret = 0; >>>>>>> + int ret = -1; >>>>>>> >>>>>>> numqueues = READ_ONCE(tun->numqueues); >>>>>>> if (!numqueues) >>>>>>> return 0; >>>>>>> >>>>>>> + rcu_read_lock(); >>>>>>> prog = rcu_dereference(tun->steering_prog); >>>>>>> if (prog) >>>>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); >>>>>>> + rcu_read_unlock(); >>>>>>> >>>>>>> - return ret % numqueues; >>>>>>> + if (ret >= 0) >>>>>>> + ret %= numqueues; >>>>>>> + >>>>>>> + return ret; >>>>>>> } >>>>>>> >>>>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, >>>>>>> struct net_device *sb_dev) >>>>>>> { >>>>>>> struct tun_struct *tun = netdev_priv(dev); >>>>>>> - u16 ret; >>>>>>> + int ret; >>>>>>> >>>>>>> - rcu_read_lock(); >>>>>>> - if 
(rcu_dereference(tun->steering_prog)) >>>>>>> - ret = tun_ebpf_select_queue(tun, skb); >>>>>>> - else >>>>>>> + ret = tun_ebpf_select_queue(tun, skb); >>>>>>> + if (ret < 0) >>>>>>> ret = tun_automq_select_queue(tun, skb); >>>>>>> - rcu_read_unlock(); >>>>>>> >>>>>>> return ret; >>>>>>> } >>>>>>> -- >>>>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
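On the question of whether the program can "really return -1": a rough userspace model of the patched tun_ebpf_select_queue() from the diff, assuming the u32 result is simply assigned to an int as the patch does. The function name model_ebpf_select_queue() and the has_prog flag are illustrative stand-ins, not kernel code, and RCU is left out entirely.

```c
#include <stdint.h>

/* Illustrative model of the patched function. has_prog stands in for a
 * non-NULL tun->steering_prog; prog_ret stands in for the u32 returned by
 * bpf_prog_run_clear_cb(). */
static int model_ebpf_select_queue(int has_prog, uint32_t prog_ret,
                                   uint32_t numqueues)
{
    int ret = -1;

    if (!numqueues)
        return 0;

    if (has_prog) {
        /* Converting a u32 above INT_MAX to int is implementation-defined
         * in C; GCC and Clang wrap, so 0xffffffff becomes -1. */
        ret = (int)prog_ret;
    }

    /* Only non-negative results are mapped onto a queue; negative ones
     * are passed back so the caller can fall back to automq. */
    if (ret >= 0)
        ret = (int)((uint32_t)ret % numqueues);

    return ret;
}
```

With numqueues set to 2, this model gives the same values as the tun_ebpf_select_queue() lines in the functional test log of the commit message: -1 with no prog, -1 for a prog returning -1, and 0, 1, 0 for progs returning 0, 1 and 2.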
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 2:34 ` Jason Wang @ 2019-09-23 3:18 ` Matt Cover 2019-09-23 5:15 ` Jason Wang 0 siblings, 1 reply; 21+ messages in thread From: Matt Cover @ 2019-09-23 3:18 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 7:34 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2019/9/23 上午9:15, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:51 PM Jason Wang <jasowang@redhat.com> wrote: > >> > >> On 2019/9/23 上午6:30, Matt Cover wrote: > >>> On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > >>>>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > >>>>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > >>>>>>> to fallback to tun_automq_select_queue() for tx queue selection. > >>>>>>> > >>>>>>> Compilation of this exact patch was tested. > >>>>>>> > >>>>>>> For functional testing 3 additional printk()s were added. 
> >>>>>>> > >>>>>>> Functional testing results (on 2 txq tap device): > >>>>>>> > >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>>>>> > >>>>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > >>>>>> Could you add a bit more motivation data here? > >>>>> Thank you for these questions Michael. > >>>>> > >>>>> I'll plan on adding the below information to the > >>>>> commit message and submitting a v2 of this patch > >>>>> when net-next reopens. In the meantime, it would > >>>>> be very helpful to know if these answers address > >>>>> some of your concerns. > >>>>> > >>>>>> 1. why is this a good idea > >>>>> This change allows TUNSETSTEERINGEBPF progs to > >>>>> do any of the following. > >>>>> 1. implement queue selection for a subset of > >>>>> traffic (e.g. 
special queue selection logic > >>>>> for ipv4, but return negative and use the > >>>>> default automq logic for ipv6) > >>>>> 2. determine there isn't sufficient information > >>>>> to do proper queue selection; return > >>>>> negative and use the default automq logic > >>>>> for the unknown > >>>>> 3. implement a noop prog (e.g. do > >>>>> bpf_trace_printk() then return negative and > >>>>> use the default automq logic for everything) > >>>>> > >>>>>> 2. how do we know existing userspace does not rely on existing behaviour > >>>>> Prior to this change a negative return from a > >>>>> TUNSETSTEERINGEBPF prog would have been cast > >>>>> into a u16 and traversed netdev_cap_txqueue(). > >>>>> > >>>>> In most cases netdev_cap_txqueue() would have > >>>>> found this value to exceed real_num_tx_queues > >>>>> and queue_index would be updated to 0. > >>>>> > >>>>> It is possible that a TUNSETSTEERINGEBPF prog > >>>>> return a negative value which when cast into a > >>>>> u16 results in a positive queue_index less than > >>>>> real_num_tx_queues. For example, on x86_64, a > >>>>> return value of -65535 results in a queue_index > >>>>> of 1; which is a valid queue for any multiqueue > >>>>> device. > >>>>> > >>>>> It seems unlikely, however as stated above is > >>>>> unfortunately possible, that existing > >>>>> TUNSETSTEERINGEBPF programs would choose to > >>>>> return a negative value rather than return the > >>>>> positive value which holds the same meaning. > >>>>> > >>>>> It seems more likely that future > >>>>> TUNSETSTEERINGEBPF programs would leverage a > >>>>> negative return and potentially be loaded into > >>>>> a kernel with the old behavior. > >>>> OK if we are returning a special > >>>> value, shouldn't we limit it? How about a special > >>>> value with this meaning? > >>>> If we are changing an ABI let's at least make it > >>>> extensible. > >>>> > >>> A special value with this meaning sounds > >>> good to me. 
I'll plan on adding a define > >>> set to -1 to cause the fallback to automq. > >> > >> Can it really return -1? > >> > >> I see: > >> > >> static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, > >> struct sk_buff *skb) > >> ... > >> > >> > >>> The way I was initially viewing the old > >>> behavior was that returning negative was > >>> undefined; it happened to have the > >>> outcomes I walked through, but not > >>> necessarily by design. > >> > >> Having such fallback may bring extra troubles, it requires the eBPF > >> program know the existence of the behavior which is not a part of kernel > >> ABI actually. And then some eBPF program may start to rely on that which > >> is pretty dangerous. Note, one important consideration is to have > >> macvtap support where does not have any stuffs like automq. > >> > >> Thanks > >> > > How about we call this TUN_SSE_ABORT > > instead of TUN_SSE_DO_AUTOMQ? > > > > TUN_SSE_ABORT could be documented as > > falling back to the default queue > > selection method in either space > > (presumably macvtap has some queue > > selection method when there is no prog). > > > This looks like a more complex API, we don't want userspace to differ > macvtap from tap too much. > > Thanks >

This is barely more complex and provides something similar to what is done in many places. For XDP, XDP_PASS enacts what the kernel would do if there was no bpf prog. For tc cls in da mode, TC_ACT_OK enacts what the kernel would do if there was no bpf prog. For xt_bpf, false enacts what the kernel would do if there was no bpf prog (as long as negation isn't in play in the rule, I believe).

I know that this is somewhat of an oversimplification and that each of these also means something else in the respective hookpoint, but I stand by seeing value in this change.

macvtap must have some default (i.e. the action it takes when no prog is loaded), even if that is just to use queue 0.
We can provide the same TUN_SSE_ABORT in userspace which does the same thing; enacts the default when returned. Any differences left between tap and macvtap would be in what the default is, not in these changes. And that difference already exists today. > > > > >>> In order to keep the new behavior > >>> extensible, how should we state that a > >>> negative return other than -1 is > >>> undefined and therefore subject to > >>> change. Is something like this > >>> sufficient? > >>> > >>> Documentation/networking/tc-actions-env-rules.txt > >>> > >>> Additionally, what should the new > >>> behavior implement when a negative other > >>> than -1 is returned? I would like to have > >>> it do the same thing as -1 for now, but > >>> with the understanding that this behavior > >>> is undefined. Does this sound reasonable? > >>> > >>>>>> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and > >>>>>> without this patch > >>>>> There may be some value in exposing this fact > >>>>> to the ebpf prog loader. What is the standard > >>>>> practice here, a define? > >>>> We'll need something at runtime - people move binaries between kernels > >>>> without rebuilding then. An ioctl is one option. > >>>> A sysfs attribute is another, an ethtool flag yet another. > >>>> A combination of these is possible. > >>>> > >>>> And if we are doing this anyway, maybe let userspace select > >>>> the new behaviour? This way we can stay compatible with old > >>>> userspace... > >>>> > >>> Understood. I'll look into adding an > >>> ioctl to activate the new behavior. And > >>> perhaps a method of checking which is > >>> behavior is currently active (in case we > >>> ever want to change the default, say > >>> after some suitably long transition > >>> period). 
> >>> > >>>>>> thanks, > >>>>>> MST > >>>>>> > >>>>>>> --- > >>>>>>> drivers/net/tun.c | 20 +++++++++++--------- > >>>>>>> 1 file changed, 11 insertions(+), 9 deletions(-) > >>>>>>> > >>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c > >>>>>>> index aab0be4..173d159 100644 > >>>>>>> --- a/drivers/net/tun.c > >>>>>>> +++ b/drivers/net/tun.c > >>>>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>>>> return txq; > >>>>>>> } > >>>>>>> > >>>>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>>>> { > >>>>>>> struct tun_prog *prog; > >>>>>>> u32 numqueues; > >>>>>>> - u16 ret = 0; > >>>>>>> + int ret = -1; > >>>>>>> > >>>>>>> numqueues = READ_ONCE(tun->numqueues); > >>>>>>> if (!numqueues) > >>>>>>> return 0; > >>>>>>> > >>>>>>> + rcu_read_lock(); > >>>>>>> prog = rcu_dereference(tun->steering_prog); > >>>>>>> if (prog) > >>>>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); > >>>>>>> + rcu_read_unlock(); > >>>>>>> > >>>>>>> - return ret % numqueues; > >>>>>>> + if (ret >= 0) > >>>>>>> + ret %= numqueues; > >>>>>>> + > >>>>>>> + return ret; > >>>>>>> } > >>>>>>> > >>>>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > >>>>>>> struct net_device *sb_dev) > >>>>>>> { > >>>>>>> struct tun_struct *tun = netdev_priv(dev); > >>>>>>> - u16 ret; > >>>>>>> + int ret; > >>>>>>> > >>>>>>> - rcu_read_lock(); > >>>>>>> - if (rcu_dereference(tun->steering_prog)) > >>>>>>> - ret = tun_ebpf_select_queue(tun, skb); > >>>>>>> - else > >>>>>>> + ret = tun_ebpf_select_queue(tun, skb); > >>>>>>> + if (ret < 0) > >>>>>>> ret = tun_automq_select_queue(tun, skb); > >>>>>>> - rcu_read_unlock(); > >>>>>>> > >>>>>>> return ret; > >>>>>>> } > >>>>>>> -- > >>>>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
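To make the TUN_SSE_ABORT idea above concrete, here is a hedged sketch of a sentinel that defers to whatever a device does when no prog is loaded. Everything in it is an assumption for illustration: the value -1 comes from the proposal in this thread, tap_default() is a stand-in hash spread and not the real automq algorithm, and queue0_default() models the "just use queue 0" case mentioned for macvtap.

```c
#include <stdint.h>

/* Hypothetical sentinel; name and value follow the proposal in the thread. */
#define TUN_SSE_ABORT (-1)

/* Per-device default used when no prog is loaded (illustrative type). */
typedef uint16_t (*default_select_fn)(uint32_t flow_hash, uint16_t numqueues);

/* Stand-in for tap's default; NOT the kernel's automq logic. */
static uint16_t tap_default(uint32_t flow_hash, uint16_t numqueues)
{
    return (uint16_t)(flow_hash % numqueues);
}

/* Stand-in for a device whose default is always queue 0. */
static uint16_t queue0_default(uint32_t flow_hash, uint16_t numqueues)
{
    (void)flow_hash;
    (void)numqueues;
    return 0;
}

/* Any negative return defers to the device default. The thread leaves
 * negatives other than -1 undefined; treating them like -1 "for now" is
 * the behaviour the patch author says he would prefer. */
static uint16_t model_select_queue(int prog_ret, uint32_t flow_hash,
                                   uint16_t numqueues, default_select_fn dflt)
{
    if (!numqueues)
        return 0;
    if (prog_ret < 0)
        return dflt(flow_hash, numqueues);
    return (uint16_t)((uint32_t)prog_ret % numqueues);
}
```

In this sketch the steering logic is identical for both device types; the only difference is which default function is passed in, which is the point being argued above.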
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 3:18 ` Matt Cover @ 2019-09-23 5:15 ` Jason Wang 2019-09-23 16:31 ` Matt Cover 0 siblings, 1 reply; 21+ messages in thread From: Jason Wang @ 2019-09-23 5:15 UTC (permalink / raw) To: Matt Cover Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On 2019/9/23 上午11:18, Matt Cover wrote: > On Sun, Sep 22, 2019 at 7:34 PM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2019/9/23 上午9:15, Matt Cover wrote: >>> On Sun, Sep 22, 2019 at 5:51 PM Jason Wang <jasowang@redhat.com> wrote: >>>> On 2019/9/23 上午6:30, Matt Cover wrote: >>>>> On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: >>>>>>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: >>>>>>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal >>>>>>>>> to fallback to tun_automq_select_queue() for tx queue selection. >>>>>>>>> >>>>>>>>> Compilation of this exact patch was tested. >>>>>>>>> >>>>>>>>> For functional testing 3 additional printk()s were added. 
>>>>>>>>> >>>>>>>>> Functional testing results (on 2 txq tap device): >>>>>>>>> >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>>>>>> >>>>>>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> >>>>>>>> Could you add a bit more motivation data here? >>>>>>> Thank you for these questions Michael. >>>>>>> >>>>>>> I'll plan on adding the below information to the >>>>>>> commit message and submitting a v2 of this patch >>>>>>> when net-next reopens. In the meantime, it would >>>>>>> be very helpful to know if these answers address >>>>>>> some of your concerns. >>>>>>> >>>>>>>> 1. why is this a good idea >>>>>>> This change allows TUNSETSTEERINGEBPF progs to >>>>>>> do any of the following. >>>>>>> 1. implement queue selection for a subset of >>>>>>> traffic (e.g. 
special queue selection logic >>>>>>> for ipv4, but return negative and use the >>>>>>> default automq logic for ipv6) >>>>>>> 2. determine there isn't sufficient information >>>>>>> to do proper queue selection; return >>>>>>> negative and use the default automq logic >>>>>>> for the unknown >>>>>>> 3. implement a noop prog (e.g. do >>>>>>> bpf_trace_printk() then return negative and >>>>>>> use the default automq logic for everything) >>>>>>> >>>>>>>> 2. how do we know existing userspace does not rely on existing behaviour >>>>>>> Prior to this change a negative return from a >>>>>>> TUNSETSTEERINGEBPF prog would have been cast >>>>>>> into a u16 and traversed netdev_cap_txqueue(). >>>>>>> >>>>>>> In most cases netdev_cap_txqueue() would have >>>>>>> found this value to exceed real_num_tx_queues >>>>>>> and queue_index would be updated to 0. >>>>>>> >>>>>>> It is possible that a TUNSETSTEERINGEBPF prog >>>>>>> return a negative value which when cast into a >>>>>>> u16 results in a positive queue_index less than >>>>>>> real_num_tx_queues. For example, on x86_64, a >>>>>>> return value of -65535 results in a queue_index >>>>>>> of 1; which is a valid queue for any multiqueue >>>>>>> device. >>>>>>> >>>>>>> It seems unlikely, however as stated above is >>>>>>> unfortunately possible, that existing >>>>>>> TUNSETSTEERINGEBPF programs would choose to >>>>>>> return a negative value rather than return the >>>>>>> positive value which holds the same meaning. >>>>>>> >>>>>>> It seems more likely that future >>>>>>> TUNSETSTEERINGEBPF programs would leverage a >>>>>>> negative return and potentially be loaded into >>>>>>> a kernel with the old behavior. >>>>>> OK if we are returning a special >>>>>> value, shouldn't we limit it? How about a special >>>>>> value with this meaning? >>>>>> If we are changing an ABI let's at least make it >>>>>> extensible. >>>>>> >>>>> A special value with this meaning sounds >>>>> good to me. 
I'll plan on adding a define >>>>> set to -1 to cause the fallback to automq. >>>> Can it really return -1? >>>> >>>> I see: >>>> >>>> static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, >>>> struct sk_buff *skb) >>>> ... >>>> >>>> >>>>> The way I was initially viewing the old >>>>> behavior was that returning negative was >>>>> undefined; it happened to have the >>>>> outcomes I walked through, but not >>>>> necessarily by design. >>>> Having such fallback may bring extra troubles, it requires the eBPF >>>> program know the existence of the behavior which is not a part of kernel >>>> ABI actually. And then some eBPF program may start to rely on that which >>>> is pretty dangerous. Note, one important consideration is to have >>>> macvtap support where does not have any stuffs like automq. >>>> >>>> Thanks >>>> >>> How about we call this TUN_SSE_ABORT >>> instead of TUN_SSE_DO_AUTOMQ? >>> >>> TUN_SSE_ABORT could be documented as >>> falling back to the default queue >>> selection method in either space >>> (presumably macvtap has some queue >>> selection method when there is no prog). >> >> This looks like a more complex API, we don't want userspace to differ >> macvtap from tap too much. >> >> Thanks >> > This is barely more complex and provides > similar to what is done in many places. > For xdp, an XDP_PASS enacts what the > kernel would do if there was no bpf prog. > For tc cls in da mode, TC_ACT_OK enacts > what the kernel would do if there was > no bpf prog. For xt_bpf, false enacts > what the kernel would do if there was > no bpf prog (as long as negation > isn't in play in the rule, I believe).

I think this is simply because you can't implement e.g. XDP_PASS/TC_ACT_OK through eBPF itself, which is not the case for the steering prog here.

> > I know that this is somewhat of an > oversimplification and that each of > these also means something else in > the respective hookpoint, but I standby > seeing value in this change.
> > macvtap must have some default (i.e the > action which it takes when no prog is > loaded), even if that is just use queue > 0. We can provide the same TUN_SSE_ABORT > in userspace which does the same thing; > enacts the default when returned. Any > differences left between tap and macvtap > would be in what the default is, not in > these changes. And that difference already > exists today.

I think it's safer to just drop the packet instead of trying to work around it.

Thanks

> >>>>> In order to keep the new behavior >>>>> extensible, how should we state that a >>>>> negative return other than -1 is >>>>> undefined and therefore subject to >>>>> change. Is something like this >>>>> sufficient? >>>>> >>>>> Documentation/networking/tc-actions-env-rules.txt >>>>> >>>>> Additionally, what should the new >>>>> behavior implement when a negative other >>>>> than -1 is returned? I would like to have >>>>> it do the same thing as -1 for now, but >>>>> with the understanding that this behavior >>>>> is undefined. Does this sound reasonable? >>>>> >>>>>>>> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and >>>>>>>> without this patch >>>>>>> There may be some value in exposing this fact >>>>>>> to the ebpf prog loader. What is the standard >>>>>>> practice here, a define? >>>>>> We'll need something at runtime - people move binaries between kernels >>>>>> without rebuilding then. An ioctl is one option. >>>>>> A sysfs attribute is another, an ethtool flag yet another. >>>>>> A combination of these is possible. >>>>>> >>>>>> And if we are doing this anyway, maybe let userspace select >>>>>> the new behaviour? This way we can stay compatible with old >>>>>> userspace... >>>>>> >>>>> Understood. I'll look into adding an >>>>> ioctl to activate the new behavior.
And >>>>> perhaps a method of checking which is >>>>> behavior is currently active (in case we >>>>> ever want to change the default, say >>>>> after some suitably long transition >>>>> period). >>>>> >>>>>>>> thanks, >>>>>>>> MST >>>>>>>> >>>>>>>>> --- >>>>>>>>> drivers/net/tun.c | 20 +++++++++++--------- >>>>>>>>> 1 file changed, 11 insertions(+), 9 deletions(-) >>>>>>>>> >>>>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>>>>>>>> index aab0be4..173d159 100644 >>>>>>>>> --- a/drivers/net/tun.c >>>>>>>>> +++ b/drivers/net/tun.c >>>>>>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>>>> return txq; >>>>>>>>> } >>>>>>>>> >>>>>>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>>>> { >>>>>>>>> struct tun_prog *prog; >>>>>>>>> u32 numqueues; >>>>>>>>> - u16 ret = 0; >>>>>>>>> + int ret = -1; >>>>>>>>> >>>>>>>>> numqueues = READ_ONCE(tun->numqueues); >>>>>>>>> if (!numqueues) >>>>>>>>> return 0; >>>>>>>>> >>>>>>>>> + rcu_read_lock(); >>>>>>>>> prog = rcu_dereference(tun->steering_prog); >>>>>>>>> if (prog) >>>>>>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); >>>>>>>>> + rcu_read_unlock(); >>>>>>>>> >>>>>>>>> - return ret % numqueues; >>>>>>>>> + if (ret >= 0) >>>>>>>>> + ret %= numqueues; >>>>>>>>> + >>>>>>>>> + return ret; >>>>>>>>> } >>>>>>>>> >>>>>>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, >>>>>>>>> struct net_device *sb_dev) >>>>>>>>> { >>>>>>>>> struct tun_struct *tun = netdev_priv(dev); >>>>>>>>> - u16 ret; >>>>>>>>> + int ret; >>>>>>>>> >>>>>>>>> - rcu_read_lock(); >>>>>>>>> - if (rcu_dereference(tun->steering_prog)) >>>>>>>>> - ret = tun_ebpf_select_queue(tun, skb); >>>>>>>>> - else >>>>>>>>> + ret = tun_ebpf_select_queue(tun, skb); >>>>>>>>> + if (ret < 0) >>>>>>>>> ret = tun_automq_select_queue(tun, skb); >>>>>>>>> - 
rcu_read_unlock(); >>>>>>>>> >>>>>>>>> return ret; >>>>>>>>> } >>>>>>>>> -- >>>>>>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 5:15 ` Jason Wang @ 2019-09-23 16:31 ` Matt Cover 2019-09-25 4:08 ` Jason Wang 0 siblings, 1 reply; 21+ messages in thread From: Matt Cover @ 2019-09-23 16:31 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 10:16 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2019/9/23 上午11:18, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 7:34 PM Jason Wang <jasowang@redhat.com> wrote: > >> > >> On 2019/9/23 上午9:15, Matt Cover wrote: > >>> On Sun, Sep 22, 2019 at 5:51 PM Jason Wang <jasowang@redhat.com> wrote: > >>>> On 2019/9/23 上午6:30, Matt Cover wrote: > >>>>> On Sun, Sep 22, 2019 at 1:36 PM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>>>> On Sun, Sep 22, 2019 at 10:43:19AM -0700, Matt Cover wrote: > >>>>>>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>>>>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > >>>>>>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > >>>>>>>>> to fallback to tun_automq_select_queue() for tx queue selection. > >>>>>>>>> > >>>>>>>>> Compilation of this exact patch was tested. > >>>>>>>>> > >>>>>>>>> For functional testing 3 additional printk()s were added. 
> >>>>>>>>> > >>>>>>>>> Functional testing results (on 2 txq tap device): > >>>>>>>>> > >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > >>>>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > >>>>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>>>>>>> > >>>>>>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > >>>>>>>> Could you add a bit more motivation data here? > >>>>>>> Thank you for these questions Michael. > >>>>>>> > >>>>>>> I'll plan on adding the below information to the > >>>>>>> commit message and submitting a v2 of this patch > >>>>>>> when net-next reopens. In the meantime, it would > >>>>>>> be very helpful to know if these answers address > >>>>>>> some of your concerns. > >>>>>>> > >>>>>>>> 1. why is this a good idea > >>>>>>> This change allows TUNSETSTEERINGEBPF progs to > >>>>>>> do any of the following. > >>>>>>> 1. 
implement queue selection for a subset of > >>>>>>> traffic (e.g. special queue selection logic > >>>>>>> for ipv4, but return negative and use the > >>>>>>> default automq logic for ipv6) > >>>>>>> 2. determine there isn't sufficient information > >>>>>>> to do proper queue selection; return > >>>>>>> negative and use the default automq logic > >>>>>>> for the unknown > >>>>>>> 3. implement a noop prog (e.g. do > >>>>>>> bpf_trace_printk() then return negative and > >>>>>>> use the default automq logic for everything) > >>>>>>> > >>>>>>>> 2. how do we know existing userspace does not rely on existing behaviour > >>>>>>> Prior to this change a negative return from a > >>>>>>> TUNSETSTEERINGEBPF prog would have been cast > >>>>>>> into a u16 and traversed netdev_cap_txqueue(). > >>>>>>> > >>>>>>> In most cases netdev_cap_txqueue() would have > >>>>>>> found this value to exceed real_num_tx_queues > >>>>>>> and queue_index would be updated to 0. > >>>>>>> > >>>>>>> It is possible that a TUNSETSTEERINGEBPF prog > >>>>>>> return a negative value which when cast into a > >>>>>>> u16 results in a positive queue_index less than > >>>>>>> real_num_tx_queues. For example, on x86_64, a > >>>>>>> return value of -65535 results in a queue_index > >>>>>>> of 1; which is a valid queue for any multiqueue > >>>>>>> device. > >>>>>>> > >>>>>>> It seems unlikely, however as stated above is > >>>>>>> unfortunately possible, that existing > >>>>>>> TUNSETSTEERINGEBPF programs would choose to > >>>>>>> return a negative value rather than return the > >>>>>>> positive value which holds the same meaning. > >>>>>>> > >>>>>>> It seems more likely that future > >>>>>>> TUNSETSTEERINGEBPF programs would leverage a > >>>>>>> negative return and potentially be loaded into > >>>>>>> a kernel with the old behavior. > >>>>>> OK if we are returning a special > >>>>>> value, shouldn't we limit it? How about a special > >>>>>> value with this meaning? 
> >>>>>> If we are changing an ABI let's at least make it > >>>>>> extensible. > >>>>>> > >>>>> A special value with this meaning sounds > >>>>> good to me. I'll plan on adding a define > >>>>> set to -1 to cause the fallback to automq. > >>>> Can it really return -1? > >>>> > >>>> I see: > >>>> > >>>> static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog, > >>>> struct sk_buff *skb) > >>>> ... > >>>> > >>>> > >>>>> The way I was initially viewing the old > >>>>> behavior was that returning negative was > >>>>> undefined; it happened to have the > >>>>> outcomes I walked through, but not > >>>>> necessarily by design. > >>>> Having such fallback may bring extra troubles, it requires the eBPF > >>>> program know the existence of the behavior which is not a part of kernel > >>>> ABI actually. And then some eBPF program may start to rely on that which > >>>> is pretty dangerous. Note, one important consideration is to have > >>>> macvtap support where does not have any stuffs like automq. > >>>> > >>>> Thanks > >>>> > >>> How about we call this TUN_SSE_ABORT > >>> instead of TUN_SSE_DO_AUTOMQ? > >>> > >>> TUN_SSE_ABORT could be documented as > >>> falling back to the default queue > >>> selection method in either space > >>> (presumably macvtap has some queue > >>> selection method when there is no prog). > >> > >> This looks like a more complex API, we don't want userspace to differ > >> macvtap from tap too much. > >> > >> Thanks > >> > > This is barely more complex and provides > > similar to what is done in many places. > > For xdp, an XDP_PASS enacts what the > > kernel would do if there was no bpf prog. > > For tc cls in da mode, TC_ACT_OK enacts > > what the kernel would do if there was > > no bpf prog. For xt_bpf, false enacts > > what the kernel would do if there was > > no bpf prog (as long as negation > > isn't in play in the rule, I believe). 
> > > I think this is simply because you can't implement e.g > XDP_PASS/TC_ACT_OK through eBPF itself which is not the case of steering > prog here. > > > > > > I know that this is somewhat of an > > oversimplification and that each of > > these also means something else in > > the respective hookpoint, but I standby > > seeing value in this change. > > > > macvtap must have some default (i.e the > > action which it takes when no prog is > > loaded), even if that is just use queue > > 0. We can provide the same TUN_SSE_ABORT > > in userspace which does the same thing; > > enacts the default when returned. Any > > differences left between tap and macvtap > > would be in what the default is, not in > > these changes. And that difference already > > exists today. > > > I think it's better to safe to just drop the packet instead of trying to > workaround it. > This patch aside, dropping the packet here seems like the wrong choice. Loading a prog at this hookpoint "configures" steering. The action of configuring steering should not result in dropped packets. Suboptimal delivery is generally preferable to no delivery. Leaving the behavior as-is (i.e. relying on netdev_cap_txqueue()) or making any return which doesn't fit in a u16 simply use queue 0 would be highly preferable to dropping the packet. > Thanks > > > > > >>>>> In order to keep the new behavior > >>>>> extensible, how should we state that a > >>>>> negative return other than -1 is > >>>>> undefined and therefore subject to > >>>>> change. Is something like this > >>>>> sufficient? > >>>>> > >>>>> Documentation/networking/tc-actions-env-rules.txt > >>>>> > >>>>> Additionally, what should the new > >>>>> behavior implement when a negative other > >>>>> than -1 is returned? I would like to have > >>>>> it do the same thing as -1 for now, but > >>>>> with the understanding that this behavior > >>>>> is undefined. Does this sound reasonable? > >>>>> > >>>>>>>> 3. 
why doesn't userspace need a way to figure out whether it runs on a kernel with and > >>>>>>>> without this patch > >>>>>>> There may be some value in exposing this fact > >>>>>>> to the ebpf prog loader. What is the standard > >>>>>>> practice here, a define? > >>>>>> We'll need something at runtime - people move binaries between kernels > >>>>>> without rebuilding then. An ioctl is one option. > >>>>>> A sysfs attribute is another, an ethtool flag yet another. > >>>>>> A combination of these is possible. > >>>>>> > >>>>>> And if we are doing this anyway, maybe let userspace select > >>>>>> the new behaviour? This way we can stay compatible with old > >>>>>> userspace... > >>>>>> > >>>>> Understood. I'll look into adding an > >>>>> ioctl to activate the new behavior. And > >>>>> perhaps a method of checking which is > >>>>> behavior is currently active (in case we > >>>>> ever want to change the default, say > >>>>> after some suitably long transition > >>>>> period). > >>>>> > >>>>>>>> thanks, > >>>>>>>> MST > >>>>>>>> > >>>>>>>>> --- > >>>>>>>>> drivers/net/tun.c | 20 +++++++++++--------- > >>>>>>>>> 1 file changed, 11 insertions(+), 9 deletions(-) > >>>>>>>>> > >>>>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c > >>>>>>>>> index aab0be4..173d159 100644 > >>>>>>>>> --- a/drivers/net/tun.c > >>>>>>>>> +++ b/drivers/net/tun.c > >>>>>>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>>>>>> return txq; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>>>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>>>>>> { > >>>>>>>>> struct tun_prog *prog; > >>>>>>>>> u32 numqueues; > >>>>>>>>> - u16 ret = 0; > >>>>>>>>> + int ret = -1; > >>>>>>>>> > >>>>>>>>> numqueues = READ_ONCE(tun->numqueues); > >>>>>>>>> if (!numqueues) > >>>>>>>>> return 0; > >>>>>>>>> > >>>>>>>>> + rcu_read_lock(); > 
>>>>>>>>> prog = rcu_dereference(tun->steering_prog); > >>>>>>>>> if (prog) > >>>>>>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); > >>>>>>>>> + rcu_read_unlock(); > >>>>>>>>> > >>>>>>>>> - return ret % numqueues; > >>>>>>>>> + if (ret >= 0) > >>>>>>>>> + ret %= numqueues; > >>>>>>>>> + > >>>>>>>>> + return ret; > >>>>>>>>> } > >>>>>>>>> > >>>>>>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > >>>>>>>>> struct net_device *sb_dev) > >>>>>>>>> { > >>>>>>>>> struct tun_struct *tun = netdev_priv(dev); > >>>>>>>>> - u16 ret; > >>>>>>>>> + int ret; > >>>>>>>>> > >>>>>>>>> - rcu_read_lock(); > >>>>>>>>> - if (rcu_dereference(tun->steering_prog)) > >>>>>>>>> - ret = tun_ebpf_select_queue(tun, skb); > >>>>>>>>> - else > >>>>>>>>> + ret = tun_ebpf_select_queue(tun, skb); > >>>>>>>>> + if (ret < 0) > >>>>>>>>> ret = tun_automq_select_queue(tun, skb); > >>>>>>>>> - rcu_read_unlock(); > >>>>>>>>> > >>>>>>>>> return ret; > >>>>>>>>> } > >>>>>>>>> -- > >>>>>>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
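The control flow of the patch quoted above can be modelled in user space. The following is only a sketch under stated assumptions: `automq_stub()`, `ebpf_select_queue_model()` and `select_queue_model()` are hypothetical names, the steering program is reduced to a flag plus its u32 return value, and the stub does not reproduce the flow hashing done by the real tun_automq_select_queue().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for tun_automq_select_queue(). The real function
 * hashes the flow; this stub just picks the last queue. */
static uint16_t automq_stub(uint32_t numqueues)
{
	return (uint16_t)(numqueues ? numqueues - 1 : 0);
}

/* Models the patched tun_ebpf_select_queue(). has_prog and prog_ret stand
 * in for the attached steering program and its u32 return value. */
static int ebpf_select_queue_model(int has_prog, uint32_t prog_ret,
				   uint32_t numqueues)
{
	int ret = -1;	/* "no program" looks like a negative return */

	if (!numqueues)
		return 0;
	if (has_prog)
		ret = (int)prog_ret;	/* assumes 0xffffffff converts to -1 */
	if (ret >= 0)
		ret %= numqueues;	/* only non-negative values are wrapped */
	return ret;
}

/* Models the patched tun_select_queue(): a negative result falls back. */
static uint16_t select_queue_model(int has_prog, uint32_t prog_ret,
				   uint32_t numqueues)
{
	int ret = ebpf_select_queue_model(has_prog, prog_ret, numqueues);

	if (ret < 0)
		ret = automq_stub(numqueues);
	assert(ret >= 0);
	return (uint16_t)ret;
}
```

With two queues, this model follows the same paths as the printk() log in the commit message: no program and a return of -1 both reach the fallback, while 0, 1 and 2 map to queues 0, 1 and 0.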
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return
  2019-09-23 16:31 ` Matt Cover
@ 2019-09-25  4:08 ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2019-09-25 4:08 UTC (permalink / raw)
  To: Matt Cover
  Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving,
	yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail,
	pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev,
	linux-kernel, bpf

On 2019/9/24 12:31 AM, Matt Cover wrote:
>> I think it's better to safe to just drop the packet instead of trying to
>> workaround it.
>>
> This patch aside, dropping the packet here
> seems like the wrong choice. Loading a
> prog at this hookpoint "configures"
> steering. The action of configuring
> steering should not result in dropped
> packets.
>
> Suboptimal delivery is generally preferable
> to no delivery. Leaving the behavior as-is
> (i.e. relying on netdev_cap_txqueue()) or
> making any return which doesn't fit in a
> u16 simply use queue 0 would be highly
> preferable to dropping the packet.
>
>> Thanks

It leaves the steering eBPF program a choice to drop the packet that it
can't classify. But considering we already have a socket filter, it's
probably not a big problem, since we can drop packets there.

Thanks

^ permalink raw reply	[flat|nested] 21+ messages in thread
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-22 17:43 ` Matt Cover 2019-09-22 20:35 ` Michael S. Tsirkin @ 2019-09-23 0:46 ` Jason Wang 2019-09-23 1:20 ` Matt Cover 1 sibling, 1 reply; 21+ messages in thread From: Jason Wang @ 2019-09-23 0:46 UTC (permalink / raw) To: Matt Cover, Michael S. Tsirkin Cc: davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On 2019/9/23 上午1:43, Matt Cover wrote: > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: >> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: >>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal >>> to fallback to tun_automq_select_queue() for tx queue selection. >>> >>> Compilation of this exact patch was tested. >>> >>> For functional testing 3 additional printk()s were added. >>> >>> Functional testing results (on 2 txq tap device): >>> >>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' >>> [Fri Sep 20 
18:33:27 2019] ========== tun prog 2 ========== >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>> >>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> >> >> Could you add a bit more motivation data here? > Thank you for these questions Michael. > > I'll plan on adding the below information to the > commit message and submitting a v2 of this patch > when net-next reopens. In the meantime, it would > be very helpful to know if these answers address > some of your concerns. > >> 1. why is this a good idea > This change allows TUNSETSTEERINGEBPF progs to > do any of the following. > 1. implement queue selection for a subset of > traffic (e.g. special queue selection logic > for ipv4, but return negative and use the > default automq logic for ipv6) Well, using eBPF means it needs to take care of all the cases. E.g. you can easily implement the fallback through eBPF as well. > 2. determine there isn't sufficient information > to do proper queue selection; return > negative and use the default automq logic > for the unknown Same as above. > 3. implement a noop prog (e.g. do > bpf_trace_printk() then return negative and > use the default automq logic for everything) ditto. > >> 2. how do we know existing userspace does not rely on existing behaviour > Prior to this change a negative return from a > TUNSETSTEERINGEBPF prog would have been cast > into a u16 and traversed netdev_cap_txqueue(). > > In most cases netdev_cap_txqueue() would have > found this value to exceed real_num_tx_queues > and queue_index would be updated to 0. > > It is possible that a TUNSETSTEERINGEBPF prog > return a negative value which when cast into a > u16 results in a positive queue_index less than > real_num_tx_queues. For example, on x86_64, a > return value of -65535 results in a queue_index > of 1; which is a valid queue for any multiqueue > device.
> > It seems unlikely, however as stated above is > unfortunately possible, that existing > TUNSETSTEERINGEBPF programs would choose to > return a negative value rather than return the > positive value which holds the same meaning. > > It seems more likely that future > TUNSETSTEERINGEBPF programs would leverage a > negative return and potentially be loaded into > a kernel with the old behavior. Yes, eBPF can return a possibly wrong value, but what the kernel does is just make sure it doesn't harm anything. I would rather just drop the packet in this case. Thanks > >> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and >> without this patch > There may be some value in exposing this fact > to the ebpf prog loader. What is the standard > practice here, a define? > >> >> thanks, >> MST >> >>> --- >>> drivers/net/tun.c | 20 +++++++++++--------- >>> 1 file changed, 11 insertions(+), 9 deletions(-) >>> >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>> index aab0be4..173d159 100644 >>> --- a/drivers/net/tun.c >>> +++ b/drivers/net/tun.c >>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>> return txq; >>> } >>> >>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>> { >>> struct tun_prog *prog; >>> u32 numqueues; >>> - u16 ret = 0; >>> + int ret = -1; >>> >>> numqueues = READ_ONCE(tun->numqueues); >>> if (!numqueues) >>> return 0; >>> >>> + rcu_read_lock(); >>> prog = rcu_dereference(tun->steering_prog); >>> if (prog) >>> ret = bpf_prog_run_clear_cb(prog->prog, skb); >>> + rcu_read_unlock(); >>> >>> - return ret % numqueues; >>> + if (ret >= 0) >>> + ret %= numqueues; >>> + >>> + return ret; >>> } >>> >>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, >>> struct net_device *sb_dev) >>> { >>> struct tun_struct *tun = netdev_priv(dev); >>> -
u16 ret; >>> + int ret; >>> >>> - rcu_read_lock(); >>> - if (rcu_dereference(tun->steering_prog)) >>> - ret = tun_ebpf_select_queue(tun, skb); >>> - else >>> + ret = tun_ebpf_select_queue(tun, skb); >>> + if (ret < 0) >>> ret = tun_automq_select_queue(tun, skb); >>> - rcu_read_unlock(); >>> >>> return ret; >>> } >>> -- >>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
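The u16 truncation discussed in the quoted text can be checked with a small user-space model of the pre-patch lines removed by the diff (`u16 ret` followed by `return ret % numqueues`). This is a sketch, not the kernel code: `old_ebpf_select_queue_model()` is a hypothetical name, and the caller is assumed to pass at least one queue, where the kernel code returns 0 early instead.

```c
#include <assert.h>
#include <stdint.h>

/* Models the pre-patch tun_ebpf_select_queue(): the program's u32 return
 * is stored in a u16 and then reduced modulo numqueues. */
static uint16_t old_ebpf_select_queue_model(uint32_t prog_ret,
					    uint32_t numqueues)
{
	uint16_t ret = (uint16_t)prog_ret;	/* keeps only the low 16 bits */

	assert(numqueues > 0);	/* assumption of this sketch */
	return (uint16_t)(ret % numqueues);
}
```

In this model a return of -65535 (0xFFFF0001 as a u32) truncates to 1, as described in the quoted text. A return of -1 truncates to 65535 and is then wrapped by the modulo, so the result is already below numqueues before any later capping step is applied.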
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 0:46 ` Jason Wang @ 2019-09-23 1:20 ` Matt Cover 2019-09-23 2:32 ` Jason Wang 0 siblings, 1 reply; 21+ messages in thread From: Matt Cover @ 2019-09-23 1:20 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 5:46 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2019/9/23 上午1:43, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > >> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > >>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > >>> to fallback to tun_automq_select_queue() for tx queue selection. > >>> > >>> Compilation of this exact patch was tested. > >>> > >>> For functional testing 3 additional printk()s were added. 
> >>> > >>> Functional testing results (on 2 txq tap device): > >>> > >>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > >>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > >>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > >>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>> > >>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > >> > >> Could you add a bit more motivation data here? > > Thank you for these questions Michael. > > > > I'll plan on adding the below information to the > > commit message and submitting a v2 of this patch > > when net-next reopens. In the meantime, it would > > be very helpful to know if these answers address > > some of your concerns. > > > >> 1. why is this a good idea > > This change allows TUNSETSTEERINGEBPF progs to > > do any of the following. > > 1. implement queue selection for a subset of > > traffic (e.g. 
special queue selection logic > > for ipv4, but return negative and use the > > default automq logic for ipv6) > > > Well, using ebpf means it need to take care of all the cases. E.g you > can easily implement the fallback through eBPF as well. > I really think there is value in being able to implement a scoped special case while leaving the rest of the packets in the kernel's hands. Having to reimplement automq makes this hookpoint less accessible to beginners and experienced alike. > > > 2. determine there isn't sufficient information > > to do proper queue selection; return > > negative and use the default automq logic > > for the unknown > > > Same as above. > > > > 3. implement a noop prog (e.g. do > > bpf_trace_printk() then return negative and > > use the default automq logic for everything) > > > ditto. > > > > > >> 2. how do we know existing userspace does not rely on existing behaviour > > Prior to this change a negative return from a > > TUNSETSTEERINGEBPF prog would have been cast > > into a u16 and traversed netdev_cap_txqueue(). > > > > In most cases netdev_cap_txqueue() would have > > found this value to exceed real_num_tx_queues > > and queue_index would be updated to 0. > > > > It is possible that a TUNSETSTEERINGEBPF prog > > return a negative value which when cast into a > > u16 results in a positive queue_index less than > > real_num_tx_queues. For example, on x86_64, a > > return value of -65535 results in a queue_index > > of 1; which is a valid queue for any multiqueue > > device. > > > > It seems unlikely, however as stated above is > > unfortunately possible, that existing > > TUNSETSTEERINGEBPF programs would choose to > > return a negative value rather than return the > > positive value which holds the same meaning. > > > > It seems more likely that future > > TUNSETSTEERINGEBPF programs would leverage a > > negative return and potentially be loaded into > > a kernel with the old behavior. 
> > > Yes, eBPF can return probably wrong value, but what kernel did is just > to make sure it doesn't harm anything. > > I would rather just drop the packet in this case. > In addition to TUN_SSE_ABORT, we can add TUN_SSE_DROP. That could be made the default for any undefined negative return as well. > Thanks > > > > > >> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and > >> without this patch > > There may be some value in exposing this fact > > to the ebpf prog loader. What is the standard > > practice here, a define? > > > >> > >> thanks, > >> MST > >> > >>> --- > >>> drivers/net/tun.c | 20 +++++++++++--------- > >>> 1 file changed, 11 insertions(+), 9 deletions(-) > >>> > >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c > >>> index aab0be4..173d159 100644 > >>> --- a/drivers/net/tun.c > >>> +++ b/drivers/net/tun.c > >>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>> return txq; > >>> } > >>> > >>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>> { > >>> struct tun_prog *prog; > >>> u32 numqueues; > >>> - u16 ret = 0; > >>> + int ret = -1; > >>> > >>> numqueues = READ_ONCE(tun->numqueues); > >>> if (!numqueues) > >>> return 0; > >>> > >>> + rcu_read_lock(); > >>> prog = rcu_dereference(tun->steering_prog); > >>> if (prog) > >>> ret = bpf_prog_run_clear_cb(prog->prog, skb); > >>> + rcu_read_unlock(); > >>> > >>> - return ret % numqueues; > >>> + if (ret >= 0) > >>> + ret %= numqueues; > >>> + > >>> + return ret; > >>> } > >>> > >>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > >>> struct net_device *sb_dev) > >>> { > >>> struct tun_struct *tun = netdev_priv(dev); > >>> - u16 ret; > >>> + int ret; > >>> > >>> - rcu_read_lock(); > >>> - if (rcu_dereference(tun->steering_prog)) > >>> - ret = 
tun_ebpf_select_queue(tun, skb); > >>> - else > >>> + ret = tun_ebpf_select_queue(tun, skb); > >>> + if (ret < 0) > >>> ret = tun_automq_select_queue(tun, skb); > >>> - rcu_read_unlock(); > >>> > >>> return ret; > >>> } > >>> -- > >>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
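The TUN_SSE_ABORT / TUN_SSE_DROP proposal can be sketched as a decoder for the steering program's return value. Everything here is an assumption for illustration: the thread only mentions -1 for the fallback value, so the value of TUN_SSE_DROP, the `steer_action` enum, the `steer_result` struct and `interpret_steering_return()` are all hypothetical, and the sketch assumes at least one queue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values: only "-1 for the fallback" appears in the thread. */
#define TUN_SSE_ABORT	(-1)
#define TUN_SSE_DROP	(-2)

enum steer_action { STEER_QUEUE, STEER_DEFAULT, STEER_DROP };

struct steer_result {
	enum steer_action action;
	uint16_t queue;		/* meaningful only for STEER_QUEUE */
};

/* Decodes a steering return value under the proposed convention. */
static struct steer_result interpret_steering_return(int32_t ret,
						     uint32_t numqueues)
{
	/* TUN_SSE_DROP and any undefined negative value end up here. */
	struct steer_result r = { STEER_DROP, 0 };

	assert(numqueues > 0);	/* assumption of this sketch */
	if (ret >= 0) {
		r.action = STEER_QUEUE;
		r.queue = (uint16_t)((uint32_t)ret % numqueues);
	} else if (ret == TUN_SSE_ABORT) {
		r.action = STEER_DEFAULT;	/* use the default selection */
	}
	return r;
}
```

Making drop the default for unlisted negative values keeps the remaining negative range free for later definitions, which is the extensibility concern raised earlier in the thread.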
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 1:20 ` Matt Cover @ 2019-09-23 2:32 ` Jason Wang 2019-09-23 3:00 ` Matt Cover 0 siblings, 1 reply; 21+ messages in thread From: Jason Wang @ 2019-09-23 2:32 UTC (permalink / raw) To: Matt Cover Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On 2019/9/23 上午9:20, Matt Cover wrote: > On Sun, Sep 22, 2019 at 5:46 PM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2019/9/23 上午1:43, Matt Cover wrote: >>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: >>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: >>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal >>>>> to fallback to tun_automq_select_queue() for tx queue selection. >>>>> >>>>> Compilation of this exact patch was tested. >>>>> >>>>> For functional testing 3 additional printk()s were added. 
>>>>> >>>>> Functional testing results (on 2 txq tap device): >>>>> >>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>> >>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> >>>> Could you add a bit more motivation data here? >>> Thank you for these questions Michael. >>> >>> I'll plan on adding the below information to the >>> commit message and submitting a v2 of this patch >>> when net-next reopens. In the meantime, it would >>> be very helpful to know if these answers address >>> some of your concerns. >>> >>>> 1. why is this a good idea >>> This change allows TUNSETSTEERINGEBPF progs to >>> do any of the following. >>> 1. implement queue selection for a subset of >>> traffic (e.g. 
special queue selection logic
>>> for ipv4, but return negative and use the
>>> default automq logic for ipv6)
>>
>> Well, using ebpf means it need to take care of all the cases. E.g you
>> can easily implement the fallback through eBPF as well.
>>
> I really think there is value in being
> able to implement a scoped special
> case while leaving the rest of the
> packets in the kernel's hands.

This only works when some function cannot be done by eBPF itself,
and then we can provide the function through eBPF helpers. But this is
not the case here.

>
> Having to reimplement automq makes
> this hookpoint less accessible to
> beginners and experienced alike.

Note that automq itself is kind of complicated; it is a best-effort
mechanism that is hard to document accurately. It has several limitations
(e.g. flow caches) that may not work well in some conditions.

It's not hard to implement a user-programmable steering policy through
maps, which could have much more deterministic behavior than automq. The
goal of steering eBPF is to get rid of automq completely, not to partially
rely on it.

And I don't see how relying on automq can simplify anything.

Thanks

>
>>> 2. determine there isn't sufficient information
>>> to do proper queue selection; return
>>> negative and use the default automq logic
>>> for the unknown
>>
>> Same as above.
>>
>>
>>> 3. implement a noop prog (e.g. do
>>> bpf_trace_printk() then return negative and
>>> use the default automq logic for everything)
>>
>> ditto.
>>
>>
>>>> 2. how do we know existing userspace does not rely on existing behaviour
>>> Prior to this change a negative return from a
>>> TUNSETSTEERINGEBPF prog would have been cast
>>> into a u16 and traversed netdev_cap_txqueue().
>>>
>>> In most cases netdev_cap_txqueue() would have
>>> found this value to exceed real_num_tx_queues
>>> and queue_index would be updated to 0.
>>> >>> It is possible that a TUNSETSTEERINGEBPF prog >>> return a negative value which when cast into a >>> u16 results in a positive queue_index less than >>> real_num_tx_queues. For example, on x86_64, a >>> return value of -65535 results in a queue_index >>> of 1; which is a valid queue for any multiqueue >>> device. >>> >>> It seems unlikely, however as stated above is >>> unfortunately possible, that existing >>> TUNSETSTEERINGEBPF programs would choose to >>> return a negative value rather than return the >>> positive value which holds the same meaning. >>> >>> It seems more likely that future >>> TUNSETSTEERINGEBPF programs would leverage a >>> negative return and potentially be loaded into >>> a kernel with the old behavior. >> >> Yes, eBPF can return probably wrong value, but what kernel did is just >> to make sure it doesn't harm anything. >> >> I would rather just drop the packet in this case. >> > In addition to TUN_SSE_ABORT, we can > add TUN_SSE_DROP. That could be made the > default for any undefined negative > return as well. > >> Thanks >> >> >>>> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and >>>> without this patch >>> There may be some value in exposing this fact >>> to the ebpf prog loader. What is the standard >>> practice here, a define? 
>>> >>>> thanks, >>>> MST >>>> >>>>> --- >>>>> drivers/net/tun.c | 20 +++++++++++--------- >>>>> 1 file changed, 11 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>>>> index aab0be4..173d159 100644 >>>>> --- a/drivers/net/tun.c >>>>> +++ b/drivers/net/tun.c >>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>> return txq; >>>>> } >>>>> >>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>> { >>>>> struct tun_prog *prog; >>>>> u32 numqueues; >>>>> - u16 ret = 0; >>>>> + int ret = -1; >>>>> >>>>> numqueues = READ_ONCE(tun->numqueues); >>>>> if (!numqueues) >>>>> return 0; >>>>> >>>>> + rcu_read_lock(); >>>>> prog = rcu_dereference(tun->steering_prog); >>>>> if (prog) >>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); >>>>> + rcu_read_unlock(); >>>>> >>>>> - return ret % numqueues; >>>>> + if (ret >= 0) >>>>> + ret %= numqueues; >>>>> + >>>>> + return ret; >>>>> } >>>>> >>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, >>>>> struct net_device *sb_dev) >>>>> { >>>>> struct tun_struct *tun = netdev_priv(dev); >>>>> - u16 ret; >>>>> + int ret; >>>>> >>>>> - rcu_read_lock(); >>>>> - if (rcu_dereference(tun->steering_prog)) >>>>> - ret = tun_ebpf_select_queue(tun, skb); >>>>> - else >>>>> + ret = tun_ebpf_select_queue(tun, skb); >>>>> + if (ret < 0) >>>>> ret = tun_automq_select_queue(tun, skb); >>>>> - rcu_read_unlock(); >>>>> >>>>> return ret; >>>>> } >>>>> -- >>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
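As a rough illustration of the argument in the message above, that a steering policy driven by maps can carry its own fallback and behave deterministically, here is a userspace C sketch. It is not an eBPF program and is not taken from tun.c: the table stands in for a BPF map, and the names `steer_entry` and `steer_queue`, the 32-bit flow key, and the multiplicative hash used on a table miss are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; in a real steering program this role would be
 * played by a BPF map filled in by userspace. */
struct steer_entry {
	uint32_t flow_key;	/* assumed flow identifier, e.g. a hash of the 4-tuple */
	uint16_t queue;		/* tx queue chosen by the policy */
};

/* Pick a tx queue: an explicit table hit wins, otherwise fall back to a
 * fixed hash of the flow key so the same flow always maps to the same queue. */
uint16_t steer_queue(const struct steer_entry *tbl, size_t n,
		     uint32_t flow_key, uint32_t numqueues)
{
	uint32_t h;
	size_t i;

	assert(numqueues > 0);

	for (i = 0; i < n; i++) {
		if (tbl[i].flow_key == flow_key)
			return (uint16_t)(tbl[i].queue % numqueues);
	}

	/* Fallback implemented inside the policy itself (illustrative hash). */
	h = flow_key * 2654435761u;
	return (uint16_t)((h >> 16) % numqueues);
}
```

Because a table miss is resolved by a pure function of the flow key, repeated packets of one flow land on the same queue without any flow cache, which is the kind of deterministic behavior argued for above.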
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 2:32 ` Jason Wang @ 2019-09-23 3:00 ` Matt Cover 2019-09-23 5:08 ` Jason Wang 0 siblings, 1 reply; 21+ messages in thread From: Matt Cover @ 2019-09-23 3:00 UTC (permalink / raw) To: Jason Wang Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On Sun, Sep 22, 2019 at 7:32 PM Jason Wang <jasowang@redhat.com> wrote: > > > On 2019/9/23 上午9:20, Matt Cover wrote: > > On Sun, Sep 22, 2019 at 5:46 PM Jason Wang <jasowang@redhat.com> wrote: > >> > >> On 2019/9/23 上午1:43, Matt Cover wrote: > >>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: > >>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: > >>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal > >>>>> to fallback to tun_automq_select_queue() for tx queue selection. > >>>>> > >>>>> Compilation of this exact patch was tested. > >>>>> > >>>>> For functional testing 3 additional printk()s were added. 
> >>>>> > >>>>> Functional testing results (on 2 txq tap device): > >>>>> > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' > >>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' > >>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' > >>>>> > >>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> > >>>> Could you add a bit more motivation data here? > >>> Thank you for these questions Michael. > >>> > >>> I'll plan on adding the below information to the > >>> commit message and submitting a v2 of this patch > >>> when net-next reopens. In the meantime, it would > >>> be very helpful to know if these answers address > >>> some of your concerns. > >>> > >>>> 1. why is this a good idea > >>> This change allows TUNSETSTEERINGEBPF progs to > >>> do any of the following. > >>> 1. implement queue selection for a subset of > >>> traffic (e.g. 
special queue selection logic
> >>> for ipv4, but return negative and use the
> >>> default automq logic for ipv6)
> >>
> >> Well, using ebpf means it need to take care of all the cases. E.g you
> >> can easily implement the fallback through eBPF as well.
> >>
> > I really think there is value in being
> > able to implement a scoped special
> > case while leaving the rest of the
> > packets in the kernel's hands.
>
>
> This only works when some function cannot be done by eBPF itself,
> and then we can provide the function through eBPF helpers. But this is
> not the case here.
>
>
> >
> > Having to reimplement automq makes
> > this hookpoint less accessible to
> > beginners and experienced alike.
>
>
> Note that automq itself is kind of complicated; it is a best-effort
> mechanism that is hard to document accurately. It has several limitations
> (e.g. flow caches) that may not work well in some conditions.
>
> It's not hard to implement a user-programmable steering policy through
> maps, which could have much more deterministic behavior than automq. The
> goal of steering eBPF is to get rid of automq completely, not to partially
> rely on it.
>
> And I don't see how relying on automq can simplify anything.
>
> Thanks
>

I'm not suggesting that we document automq.

I'm suggesting that we add a return value
which is documented as signaling the
kernel to use whatever queue
selection method is used when there is no
eBPF prog attached. That behavior today is
automq.

There is nothing about this return value
which would make it harder to change the default
queue selection later. The default already
exists today when there is no program
loaded.

> >
> >>> 2. determine there isn't sufficient information
> >>> to do proper queue selection; return
> >>> negative and use the default automq logic
> >>> for the unknown
> >>
> >> Same as above.
> >>
> >>
> >>> 3. implement a noop prog (e.g.
do > >>> bpf_trace_printk() then return negative and > >>> use the default automq logic for everything) > >> > >> ditto. > >> > >> > >>>> 2. how do we know existing userspace does not rely on existing behaviour > >>> Prior to this change a negative return from a > >>> TUNSETSTEERINGEBPF prog would have been cast > >>> into a u16 and traversed netdev_cap_txqueue(). > >>> > >>> In most cases netdev_cap_txqueue() would have > >>> found this value to exceed real_num_tx_queues > >>> and queue_index would be updated to 0. > >>> > >>> It is possible that a TUNSETSTEERINGEBPF prog > >>> return a negative value which when cast into a > >>> u16 results in a positive queue_index less than > >>> real_num_tx_queues. For example, on x86_64, a > >>> return value of -65535 results in a queue_index > >>> of 1; which is a valid queue for any multiqueue > >>> device. > >>> > >>> It seems unlikely, however as stated above is > >>> unfortunately possible, that existing > >>> TUNSETSTEERINGEBPF programs would choose to > >>> return a negative value rather than return the > >>> positive value which holds the same meaning. > >>> > >>> It seems more likely that future > >>> TUNSETSTEERINGEBPF programs would leverage a > >>> negative return and potentially be loaded into > >>> a kernel with the old behavior. > >> > >> Yes, eBPF can return probably wrong value, but what kernel did is just > >> to make sure it doesn't harm anything. > >> > >> I would rather just drop the packet in this case. > >> > > In addition to TUN_SSE_ABORT, we can > > add TUN_SSE_DROP. That could be made the > > default for any undefined negative > > return as well. > > > >> Thanks > >> > >> > >>>> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and > >>>> without this patch > >>> There may be some value in exposing this fact > >>> to the ebpf prog loader. What is the standard > >>> practice here, a define? 
> >>> > >>>> thanks, > >>>> MST > >>>> > >>>>> --- > >>>>> drivers/net/tun.c | 20 +++++++++++--------- > >>>>> 1 file changed, 11 insertions(+), 9 deletions(-) > >>>>> > >>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c > >>>>> index aab0be4..173d159 100644 > >>>>> --- a/drivers/net/tun.c > >>>>> +++ b/drivers/net/tun.c > >>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>> return txq; > >>>>> } > >>>>> > >>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) > >>>>> { > >>>>> struct tun_prog *prog; > >>>>> u32 numqueues; > >>>>> - u16 ret = 0; > >>>>> + int ret = -1; > >>>>> > >>>>> numqueues = READ_ONCE(tun->numqueues); > >>>>> if (!numqueues) > >>>>> return 0; > >>>>> > >>>>> + rcu_read_lock(); > >>>>> prog = rcu_dereference(tun->steering_prog); > >>>>> if (prog) > >>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); > >>>>> + rcu_read_unlock(); > >>>>> > >>>>> - return ret % numqueues; > >>>>> + if (ret >= 0) > >>>>> + ret %= numqueues; > >>>>> + > >>>>> + return ret; > >>>>> } > >>>>> > >>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, > >>>>> struct net_device *sb_dev) > >>>>> { > >>>>> struct tun_struct *tun = netdev_priv(dev); > >>>>> - u16 ret; > >>>>> + int ret; > >>>>> > >>>>> - rcu_read_lock(); > >>>>> - if (rcu_dereference(tun->steering_prog)) > >>>>> - ret = tun_ebpf_select_queue(tun, skb); > >>>>> - else > >>>>> + ret = tun_ebpf_select_queue(tun, skb); > >>>>> + if (ret < 0) > >>>>> ret = tun_automq_select_queue(tun, skb); > >>>>> - rcu_read_unlock(); > >>>>> > >>>>> return ret; > >>>>> } > >>>>> -- > >>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
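The return-value semantics argued for in the message above can be modeled in userspace roughly as follows. This is only a sketch that mirrors the control flow of the quoted diff: the function names `model_ebpf_select` and `model_select_queue` are made up for this example, the steering program is reduced to a plain integer result, and automq is replaced by a stub that always picks queue 0 and sets a flag, which is an assumption and not how tun_automq_select_queue() behaves.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the patched tun_ebpf_select_queue(): -1 means "no decision". */
int model_ebpf_select(int has_prog, int prog_ret, uint32_t numqueues)
{
	int ret = -1;

	if (!numqueues)
		return 0;

	if (has_prog)
		ret = prog_ret;		/* stands in for running the steering prog */

	if (ret >= 0)
		ret = (int)((uint32_t)ret % numqueues);

	return ret;
}

/* Model of the patched tun_select_queue(): any negative result falls back
 * to the default selector (stubbed here as "queue 0, flag set"). */
uint16_t model_select_queue(int has_prog, int prog_ret, uint32_t numqueues,
			    int *used_automq)
{
	int ret = model_ebpf_select(has_prog, prog_ret, numqueues);

	*used_automq = ret < 0;
	if (ret < 0)
		ret = 0;		/* hypothetical stand-in for automq */

	return (uint16_t)ret;
}
```

With two queues this reproduces the cases in the test log from the patch description: no program and a program returning -1 both take the fallback, while program results 0, 1 and 2 select queues 0, 1 and 0 without it.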
* Re: [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return 2019-09-23 3:00 ` Matt Cover @ 2019-09-23 5:08 ` Jason Wang 0 siblings, 0 replies; 21+ messages in thread From: Jason Wang @ 2019-09-23 5:08 UTC (permalink / raw) To: Matt Cover Cc: Michael S. Tsirkin, davem, ast, daniel, kafai, songliubraving, yhs, Eric Dumazet, Stanislav Fomichev, Matthew Cover, mail, pabeni, Nicolas Dichtel, wangli39, lifei.shirley, tglx, netdev, linux-kernel, bpf On 2019/9/23 上午11:00, Matt Cover wrote: > On Sun, Sep 22, 2019 at 7:32 PM Jason Wang <jasowang@redhat.com> wrote: >> >> On 2019/9/23 上午9:20, Matt Cover wrote: >>> On Sun, Sep 22, 2019 at 5:46 PM Jason Wang <jasowang@redhat.com> wrote: >>>> On 2019/9/23 上午1:43, Matt Cover wrote: >>>>> On Sun, Sep 22, 2019 at 5:37 AM Michael S. Tsirkin <mst@redhat.com> wrote: >>>>>> On Fri, Sep 20, 2019 at 11:58:43AM -0700, Matthew Cover wrote: >>>>>>> Treat a negative return from a TUNSETSTEERINGEBPF bpf prog as a signal >>>>>>> to fallback to tun_automq_select_queue() for tx queue selection. >>>>>>> >>>>>>> Compilation of this exact patch was tested. >>>>>>> >>>>>>> For functional testing 3 additional printk()s were added. 
>>>>>>> >>>>>>> Functional testing results (on 2 txq tap device): >>>>>>> >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun no prog ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog -1 ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '-1' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '-1' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_automq_select_queue() ran >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 0 ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '0' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 1 ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '1' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '1' >>>>>>> [Fri Sep 20 18:33:27 2019] ========== tun prog 2 ========== >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: bpf_prog_run_clear_cb() returned '2' >>>>>>> [Fri Sep 20 18:33:27 2019] tuntap: tun_ebpf_select_queue() returned '0' >>>>>>> >>>>>>> Signed-off-by: Matthew Cover <matthew.cover@stackpath.com> >>>>>> Could you add a bit more motivation data here? >>>>> Thank you for these questions Michael. >>>>> >>>>> I'll plan on adding the below information to the >>>>> commit message and submitting a v2 of this patch >>>>> when net-next reopens. In the meantime, it would >>>>> be very helpful to know if these answers address >>>>> some of your concerns. >>>>> >>>>>> 1. why is this a good idea >>>>> This change allows TUNSETSTEERINGEBPF progs to >>>>> do any of the following. >>>>> 1. implement queue selection for a subset of >>>>> traffic (e.g. 
special queue selection logic
>>>>> for ipv4, but return negative and use the
>>>>> default automq logic for ipv6)
>>>> Well, using ebpf means it need to take care of all the cases. E.g you
>>>> can easily implement the fallback through eBPF as well.
>>>>
>>> I really think there is value in being
>>> able to implement a scoped special
>>> case while leaving the rest of the
>>> packets in the kernel's hands.
>>
>> This only works when some function cannot be done by eBPF itself,
>> and then we can provide the function through eBPF helpers. But this is
>> not the case here.
>>
>>
>>> Having to reimplement automq makes
>>> this hookpoint less accessible to
>>> beginners and experienced alike.
>>
>> Note that automq itself is kind of complicated; it is a best-effort
>> mechanism that is hard to document accurately. It has several limitations
>> (e.g. flow caches) that may not work well in some conditions.
>>
>> It's not hard to implement a user-programmable steering policy through
>> maps, which could have much more deterministic behavior than automq. The
>> goal of steering eBPF is to get rid of automq completely, not to partially
>> rely on it.
>>
>> And I don't see how relying on automq can simplify anything.
>>
>> Thanks
>>
> I'm not suggesting that we document automq.
>
> I'm suggesting that we add a return value
> which is documented as signaling the
> kernel to use whatever queue
> selection method is used when there is no
> eBPF prog attached.

Again, this only works when there is something that cannot be done
through eBPF. And then we can provide an eBPF helper there.

> That behavior today is
> automq.

Automq is not good. E.g. tun_ebpf_select_queue() already provides a
fallback; is there anything that automq can do better than that?

>
> There is nothing about this return value
> which would make it harder to change the default
> queue selection later. The default already
> exists today when there is no program
> loaded.

The patch depends on an incorrect behavior of tuntap (updating flow caches
when a steering prog is set). I think it's wrong to update flow caches even
when a steering program is set, which leads to extra overhead. I will
probably submit a patch to disable that behavior.

Thanks

>
>>>>> 2. determine there isn't sufficient information
>>>>> to do proper queue selection; return
>>>>> negative and use the default automq logic
>>>>> for the unknown
>>>> Same as above.
>>>>
>>>>
>>>>> 3. implement a noop prog (e.g. do
>>>>> bpf_trace_printk() then return negative and
>>>>> use the default automq logic for everything)
>>>> ditto.
>>>>
>>>>
>>>>>> 2. how do we know existing userspace does not rely on existing behaviour
>>>>> Prior to this change a negative return from a
>>>>> TUNSETSTEERINGEBPF prog would have been cast
>>>>> into a u16 and traversed netdev_cap_txqueue().
>>>>>
>>>>> In most cases netdev_cap_txqueue() would have
>>>>> found this value to exceed real_num_tx_queues
>>>>> and queue_index would be updated to 0.
>>>>>
>>>>> It is possible that a TUNSETSTEERINGEBPF prog
>>>>> return a negative value which when cast into a
>>>>> u16 results in a positive queue_index less than
>>>>> real_num_tx_queues. For example, on x86_64, a
>>>>> return value of -65535 results in a queue_index
>>>>> of 1; which is a valid queue for any multiqueue
>>>>> device.
>>>>>
>>>>> It seems unlikely, however as stated above is
>>>>> unfortunately possible, that existing
>>>>> TUNSETSTEERINGEBPF programs would choose to
>>>>> return a negative value rather than return the
>>>>> positive value which holds the same meaning.
>>>>>
>>>>> It seems more likely that future
>>>>> TUNSETSTEERINGEBPF programs would leverage a
>>>>> negative return and potentially be loaded into
>>>>> a kernel with the old behavior.
>>>> Yes, eBPF can return probably wrong value, but what kernel did is just
>>>> to make sure it doesn't harm anything.
>>>>
>>>> I would rather just drop the packet in this case.
>>>> >>> In addition to TUN_SSE_ABORT, we can >>> add TUN_SSE_DROP. That could be made the >>> default for any undefined negative >>> return as well. >>> >>>> Thanks >>>> >>>> >>>>>> 3. why doesn't userspace need a way to figure out whether it runs on a kernel with and >>>>>> without this patch >>>>> There may be some value in exposing this fact >>>>> to the ebpf prog loader. What is the standard >>>>> practice here, a define? >>>>> >>>>>> thanks, >>>>>> MST >>>>>> >>>>>>> --- >>>>>>> drivers/net/tun.c | 20 +++++++++++--------- >>>>>>> 1 file changed, 11 insertions(+), 9 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>>>>>> index aab0be4..173d159 100644 >>>>>>> --- a/drivers/net/tun.c >>>>>>> +++ b/drivers/net/tun.c >>>>>>> @@ -583,35 +583,37 @@ static u16 tun_automq_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>> return txq; >>>>>>> } >>>>>>> >>>>>>> -static u16 tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>> +static int tun_ebpf_select_queue(struct tun_struct *tun, struct sk_buff *skb) >>>>>>> { >>>>>>> struct tun_prog *prog; >>>>>>> u32 numqueues; >>>>>>> - u16 ret = 0; >>>>>>> + int ret = -1; >>>>>>> >>>>>>> numqueues = READ_ONCE(tun->numqueues); >>>>>>> if (!numqueues) >>>>>>> return 0; >>>>>>> >>>>>>> + rcu_read_lock(); >>>>>>> prog = rcu_dereference(tun->steering_prog); >>>>>>> if (prog) >>>>>>> ret = bpf_prog_run_clear_cb(prog->prog, skb); >>>>>>> + rcu_read_unlock(); >>>>>>> >>>>>>> - return ret % numqueues; >>>>>>> + if (ret >= 0) >>>>>>> + ret %= numqueues; >>>>>>> + >>>>>>> + return ret; >>>>>>> } >>>>>>> >>>>>>> static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb, >>>>>>> struct net_device *sb_dev) >>>>>>> { >>>>>>> struct tun_struct *tun = netdev_priv(dev); >>>>>>> - u16 ret; >>>>>>> + int ret; >>>>>>> >>>>>>> - rcu_read_lock(); >>>>>>> - if (rcu_dereference(tun->steering_prog)) >>>>>>> - ret = tun_ebpf_select_queue(tun, skb); >>>>>>> - else >>>>>>> + 
ret = tun_ebpf_select_queue(tun, skb); >>>>>>> + if (ret < 0) >>>>>>> ret = tun_automq_select_queue(tun, skb); >>>>>>> - rcu_read_unlock(); >>>>>>> >>>>>>> return ret; >>>>>>> } >>>>>>> -- >>>>>>> 1.8.3.1 ^ permalink raw reply [flat|nested] 21+ messages in thread
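The backward-compatibility question quoted in this thread turns on how a negative program result behaves once it is stored in a u16. The userspace sketch below makes that arithmetic concrete. It models only the two lines removed by the quoted diff (`u16 ret` followed by `ret % numqueues`); the function names `truncate_to_u16` and `old_select_queue` are made up for this example, and later stages such as netdev_cap_txqueue() are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Converting to a 16-bit unsigned type keeps the value modulo 65536. */
uint16_t truncate_to_u16(int32_t prog_ret)
{
	return (uint16_t)prog_ret;
}

/* Model of the pre-patch tun_ebpf_select_queue() arithmetic: truncate the
 * program result to u16, then reduce it modulo the number of queues. */
uint16_t old_select_queue(int32_t prog_ret, uint32_t numqueues)
{
	uint16_t ret = truncate_to_u16(prog_ret);

	assert(numqueues > 0);
	return (uint16_t)(ret % numqueues);
}
```

Under this model -65535 truncates to 1, matching the example in the quoted text, and -1 truncates to 65535, so the sign of the program result is not visible to anything that consumes the u16. That is why the patch widens the return type to int before testing for a negative value.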
end of thread, other threads:[~2019-09-25 10:33 UTC | newest]

Thread overview: 21+ messages
2019-09-20 18:58 [PATCH net-next] tuntap: Fallback to automq on TUNSETSTEERINGEBPF prog negative return Matthew Cover
2019-09-20 19:45 ` Matt Cover
2019-09-22 12:37 ` Michael S. Tsirkin
2019-09-22 17:43 ` Matt Cover
2019-09-22 20:35 ` Michael S. Tsirkin
2019-09-22 22:30 ` Matt Cover
2019-09-22 22:46 ` Matt Cover
2019-09-23 0:28 ` Matt Cover
2019-09-25 10:33 ` Michael S. Tsirkin
2019-09-23 0:51 ` Jason Wang
2019-09-23 1:15 ` Matt Cover
2019-09-23 2:34 ` Jason Wang
2019-09-23 3:18 ` Matt Cover
2019-09-23 5:15 ` Jason Wang
2019-09-23 16:31 ` Matt Cover
2019-09-25 4:08 ` Jason Wang
2019-09-23 0:46 ` Jason Wang
2019-09-23 1:20 ` Matt Cover
2019-09-23 2:32 ` Jason Wang
2019-09-23 3:00 ` Matt Cover
2019-09-23 5:08 ` Jason Wang