* Bypass qdiscs?
@ 2023-11-03 23:55 John Ousterhout
  2023-11-04  9:24 ` Ferenc Fejes
  2023-11-04 15:08 ` Andrew Lunn
  0 siblings, 2 replies; 12+ messages in thread
From: John Ousterhout @ 2023-11-03 23:55 UTC (permalink / raw)
  To: netdev; +Cc: John Ousterhout

Is there a way to mark an skb (or its socket) before invoking
ip_queue_xmit/ip6_xmit so that the packet will bypass the qdiscs and
be transmitted immediately? Is doing such a thing considered bad
practice?

(Homa has its own packet scheduling mechanism so the qdiscs are just
getting in the way and adding delays)

-John-

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Bypass qdiscs?
  2023-11-03 23:55 Bypass qdiscs? John Ousterhout
@ 2023-11-04  9:24 ` Ferenc Fejes
  2023-11-04 15:08 ` Andrew Lunn
  1 sibling, 0 replies; 12+ messages in thread
From: Ferenc Fejes @ 2023-11-04  9:24 UTC (permalink / raw)
  To: John Ousterhout, netdev

Hi!

On Fri, 2023-11-03 at 16:55 -0700, John Ousterhout wrote:
> Is there a way to mark an skb (or its socket) before invoking
> ip_queue_xmit/ip6_xmit so that the packet will bypass the qdiscs and
> be transmitted immediately? Is doing such a thing considered bad
> practice?

I'm not aware of such a thing aside from AF_PACKET's flag
PACKET_QDISC_BYPASS [1,2]. I think the function packet_xmit [3],
which honors that flag, could be reused for your needs as well.

> 
> (Homa has its own packet scheduling mechanism so the qdiscs are just
> getting in the way and adding delays)
> 
> -John-

Best,
Ferenc

[1] https://man7.org/linux/man-pages/man7/packet.7.html
[2]
https://elixir.bootlin.com/linux/v6.6/source/net/packet/af_packet.c#L4026
[3]
https://elixir.bootlin.com/linux/v6.6/source/net/packet/af_packet.c#L273


* Re: Bypass qdiscs?
  2023-11-03 23:55 Bypass qdiscs? John Ousterhout
  2023-11-04  9:24 ` Ferenc Fejes
@ 2023-11-04 15:08 ` Andrew Lunn
  2023-11-05  2:47   ` John Ousterhout
  1 sibling, 1 reply; 12+ messages in thread
From: Andrew Lunn @ 2023-11-04 15:08 UTC (permalink / raw)
  To: John Ousterhout; +Cc: netdev

On Fri, Nov 03, 2023 at 04:55:35PM -0700, John Ousterhout wrote:
> Is there a way to mark an skb (or its socket) before invoking
> ip_queue_xmit/ip6_xmit so that the packet will bypass the qdiscs and
> be transmitted immediately? Is doing such a thing considered bad
> practice?
> 
> (Homa has its own packet scheduling mechanism so the qdiscs are just
> getting in the way and adding delays)

Hi John

One thing to think about is what happens when hardware starts
supporting Homa. Can the packet scheduling be moved into the hardware?
Ideally you want to make use of the existing mechanisms to offload
scheduling to the hardware, rather than add a Homa specific one.

Did you try adding a Homa specific qdisc implementing the scheduling
algorithm? Did it kill performance? We prefer to try to fix problems,
rather than bypass them.

       Andrew


* Re: Bypass qdiscs?
  2023-11-04 15:08 ` Andrew Lunn
@ 2023-11-05  2:47   ` John Ousterhout
  2023-11-06  3:23     ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: John Ousterhout @ 2023-11-05  2:47 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev

I haven't tried creating a "pass through" qdisc, but that seems like a
reasonable approach if (as it seems) there isn't something already
built-in that provides equivalent functionality.

-John-

P.S. If hardware starts supporting Homa, I hope that it will be
possible to move the entire transport to the NIC, so that applications
can bypass the kernel entirely, as with RDMA.

On Sat, Nov 4, 2023 at 8:08 AM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Fri, Nov 03, 2023 at 04:55:35PM -0700, John Ousterhout wrote:
> > Is there a way to mark an skb (or its socket) before invoking
> > ip_queue_xmit/ip6_xmit so that the packet will bypass the qdiscs and
> > be transmitted immediately? Is doing such a thing considered bad
> > practice?
> >
> > (Homa has its own packet scheduling mechanism so the qdiscs are just
> > getting in the way and adding delays)
>
> Hi John
>
> One thing to think about is what happens when hardware starts
> supporting Homa. Can the packet scheduling be moved into the hardware?
> Ideally you want to make use of the existing mechanisms to offload
> scheduling to the hardware, rather than add a Homa specific one.
>
> Did you try adding a Homa specific qdisc implementing the scheduling
> algorithm? Did it kill performance? We prefer to try to fix problems,
> rather than bypass them.
>
>        Andrew


* Re: Bypass qdiscs?
  2023-11-05  2:47   ` John Ousterhout
@ 2023-11-06  3:23     ` Stephen Hemminger
  2023-11-06  4:27       ` David Ahern
  0 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2023-11-06  3:23 UTC (permalink / raw)
  To: John Ousterhout; +Cc: Andrew Lunn, netdev

On Sat, 4 Nov 2023 19:47:30 -0700
John Ousterhout <ouster@cs.stanford.edu> wrote:

> I haven't tried creating a "pass through" qdisc, but that seems like a
> reasonable approach if (as it seems) there isn't something already
> built-in that provides equivalent functionality.
> 
> -John-
> 
> P.S. If hardware starts supporting Homa, I hope that it will be
> possible to move the entire transport to the NIC, so that applications
> can bypass the kernel entirely, as with RDMA.

One old trick was setting the netdev queue length to 0 to avoid the qdisc.
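[The queue-length trick spelled out; `eth0` is a hypothetical interface name, the commands need root, and whether this still bypasses the qdisc on modern kernels is doubtful (it is called an old trick above) — shown for reference only.]

```shell
# Old trick: zero the device transmit queue length (eth0 is hypothetical; needs root)
ip link set dev eth0 txqueuelen 0

# Check the result
ip link show dev eth0
```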


* Re: Bypass qdiscs?
  2023-11-06  3:23     ` Stephen Hemminger
@ 2023-11-06  4:27       ` David Ahern
  2023-11-06 16:12         ` Jamal Hadi Salim
  2023-11-08 16:50         ` John Ousterhout
  0 siblings, 2 replies; 12+ messages in thread
From: David Ahern @ 2023-11-06  4:27 UTC (permalink / raw)
  To: Stephen Hemminger, John Ousterhout; +Cc: Andrew Lunn, netdev

On 11/5/23 8:23 PM, Stephen Hemminger wrote:
> On Sat, 4 Nov 2023 19:47:30 -0700
> John Ousterhout <ouster@cs.stanford.edu> wrote:
> 
>> I haven't tried creating a "pass through" qdisc, but that seems like a
>> reasonable approach if (as it seems) there isn't something already
>> built-in that provides equivalent functionality.
>>
>> -John-
>>
>> P.S. If hardware starts supporting Homa, I hope that it will be
>> possible to move the entire transport to the NIC, so that applications
>> can bypass the kernel entirely, as with RDMA.
> 
> One old trick was setting netdev queue length to 0 to avoid qdisc.
> 

tc qdisc replace dev <name> root noqueue

should work
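[With a concrete, hypothetical interface name substituted for `<name>`, and a way to revert afterwards. Note this is device-wide: it removes queueing for all protocols on that interface.]

```shell
# Replace the root qdisc with noqueue (eth0 is a hypothetical name; needs root)
tc qdisc replace dev eth0 root noqueue

# Confirm
tc qdisc show dev eth0

# Revert to the system default qdisc when done
tc qdisc del dev eth0 root
```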


* Re: Bypass qdiscs?
  2023-11-06  4:27       ` David Ahern
@ 2023-11-06 16:12         ` Jamal Hadi Salim
  2023-11-06 16:17           ` Jamal Hadi Salim
  2023-11-08 16:50         ` John Ousterhout
  1 sibling, 1 reply; 12+ messages in thread
From: Jamal Hadi Salim @ 2023-11-06 16:12 UTC (permalink / raw)
  To: David Ahern; +Cc: Stephen Hemminger, John Ousterhout, Andrew Lunn, netdev

On Sun, Nov 5, 2023 at 11:27 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 11/5/23 8:23 PM, Stephen Hemminger wrote:
> > On Sat, 4 Nov 2023 19:47:30 -0700
> > John Ousterhout <ouster@cs.stanford.edu> wrote:
> >
> >> I haven't tried creating a "pass through" qdisc, but that seems like a
> >> reasonable approach if (as it seems) there isn't something already
> >> built-in that provides equivalent functionality.
> >>
> >> -John-
> >>
> >> P.S. If hardware starts supporting Homa, I hope that it will be
> >> possible to move the entire transport to the NIC, so that applications
> >> can bypass the kernel entirely, as with RDMA.
> >
> > One old trick was setting netdev queue length to 0 to avoid qdisc.
> >
>
> tc qdisc replace dev <name> root noqueue
>
> should work

John,
IIUC, Homa transmit is done by a pacer that ensures packets are
scheduled without forming queues in the NIC, so the setup David
suggested above should be sufficient.

cheers,
jamal
>


* Re: Bypass qdiscs?
  2023-11-06 16:12         ` Jamal Hadi Salim
@ 2023-11-06 16:17           ` Jamal Hadi Salim
  2023-11-06 16:51             ` David Ahern
  0 siblings, 1 reply; 12+ messages in thread
From: Jamal Hadi Salim @ 2023-11-06 16:17 UTC (permalink / raw)
  To: David Ahern; +Cc: Stephen Hemminger, John Ousterhout, Andrew Lunn, netdev

On Mon, Nov 6, 2023 at 11:12 AM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On Sun, Nov 5, 2023 at 11:27 PM David Ahern <dsahern@kernel.org> wrote:
> >
> > On 11/5/23 8:23 PM, Stephen Hemminger wrote:
> > > On Sat, 4 Nov 2023 19:47:30 -0700
> > > John Ousterhout <ouster@cs.stanford.edu> wrote:
> > >
> > >> I haven't tried creating a "pass through" qdisc, but that seems like a
> > >> reasonable approach if (as it seems) there isn't something already
> > >> built-in that provides equivalent functionality.
> > >>
> > >> -John-
> > >>
> > >> P.S. If hardware starts supporting Homa, I hope that it will be
> > >> possible to move the entire transport to the NIC, so that applications
> > >> can bypass the kernel entirely, as with RDMA.
> > >
> > > One old trick was setting netdev queue length to 0 to avoid qdisc.
> > >
> >
> > tc qdisc replace dev <name> root noqueue
> >
> > should work
>
> John,
> IIUC,  Homa transmit is done by  a pacer that ensures the packets are
> scheduled without forming the queues in the NIC. So what David said
> above should be sufficient setup.


BTW, in-kernel Homa rather than kernel bypass is a better approach
because you get the advantages of all the other infra the kernel offers.

cheers,
jamal

> cheers,
> jamal
> >


* Re: Bypass qdiscs?
  2023-11-06 16:17           ` Jamal Hadi Salim
@ 2023-11-06 16:51             ` David Ahern
  0 siblings, 0 replies; 12+ messages in thread
From: David Ahern @ 2023-11-06 16:51 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: Stephen Hemminger, John Ousterhout, Andrew Lunn, netdev

On 11/6/23 9:17 AM, Jamal Hadi Salim wrote:
> BTW, Homa in-kernel instead of bypass is a better approach because you
> get the advantages of all other infra that the kernel offers..

Yes. The memcpy that Homa currently needs (based on the netdevconf talk)
should be avoidable using the recent page pool work (RFCs).


* Re: Bypass qdiscs?
  2023-11-06  4:27       ` David Ahern
  2023-11-06 16:12         ` Jamal Hadi Salim
@ 2023-11-08 16:50         ` John Ousterhout
  2023-11-08 17:17           ` David Ahern
  1 sibling, 1 reply; 12+ messages in thread
From: John Ousterhout @ 2023-11-08 16:50 UTC (permalink / raw)
  To: David Ahern; +Cc: Stephen Hemminger, Andrew Lunn, netdev

Hi David,

Thanks for the suggestion, but if I understand this correctly, this
will disable qdiscs for TCP as well as Homa; I suspect I shouldn't do
that?

-John-


On Sun, Nov 5, 2023 at 8:27 PM David Ahern <dsahern@kernel.org> wrote:
>
> On 11/5/23 8:23 PM, Stephen Hemminger wrote:
> > On Sat, 4 Nov 2023 19:47:30 -0700
> > John Ousterhout <ouster@cs.stanford.edu> wrote:
> >
> >> I haven't tried creating a "pass through" qdisc, but that seems like a
> >> reasonable approach if (as it seems) there isn't something already
> >> built-in that provides equivalent functionality.
> >>
> >> -John-
> >>
> >> P.S. If hardware starts supporting Homa, I hope that it will be
> >> possible to move the entire transport to the NIC, so that applications
> >> can bypass the kernel entirely, as with RDMA.
> >
> > One old trick was setting netdev queue length to 0 to avoid qdisc.
> >
>
> tc qdisc replace dev <name> root noqueue
>
> should work


* Re: Bypass qdiscs?
  2023-11-08 16:50         ` John Ousterhout
@ 2023-11-08 17:17           ` David Ahern
  2023-11-09 17:50             ` David Laight
  0 siblings, 1 reply; 12+ messages in thread
From: David Ahern @ 2023-11-08 17:17 UTC (permalink / raw)
  To: John Ousterhout; +Cc: Stephen Hemminger, Andrew Lunn, netdev

On 11/8/23 9:50 AM, John Ousterhout wrote:
> Hi David,
> 
> Thanks for the suggestion, but if I understand this correctly, this
> will disable qdiscs for TCP as well as Homa; I suspect I shouldn't do
> that?
> 

It's a means to separate issues, i.e., run Homa tests without qdisc
overhead or delays. You can worry about how to handle that if/when you
start upstreaming the code.



* RE: Bypass qdiscs?
  2023-11-08 17:17           ` David Ahern
@ 2023-11-09 17:50             ` David Laight
  0 siblings, 0 replies; 12+ messages in thread
From: David Laight @ 2023-11-09 17:50 UTC (permalink / raw)
  To: 'David Ahern', John Ousterhout
  Cc: Stephen Hemminger, Andrew Lunn, netdev

From: David Ahern
> Sent: 08 November 2023 17:17
> 
> On 11/8/23 9:50 AM, John Ousterhout wrote:
> > Hi David,
> >
> > Thanks for the suggestion, but if I understand this correctly, this
> > will disable qdiscs for TCP as well as Homa; I suspect I shouldn't do
> > that?
> >
> 
> A means to separate issues - i.e., run Homa tests without qdisc overhead
> or delays. You can worry about how to handle if/when you start
> upstreaming the code.

Isn't the qdisc overhead pretty minimal most of the time anyway?
If I send a RAW_IP (and probably UDP) packet, the ethernet MAC
packet setup (etc.) is normally done by direct calls from the
process calling sendmsg().

If two threads call sendmsg (on different sockets) at the same
time, something has to give somewhere.
To avoid stalling the 2nd thread, the packet gets queued and is
picked up by the first thread before it returns.

To bypass the qdisc, wouldn't you need a MAC driver that can
process multiple transmit setup requests in parallel?
It can be done for a simple memory-ring-based interface: just
use a lock to grab the required slots in the transmit ring.
Then it doesn't matter which order setups complete in.
But I don't think Linux makes that easy to write.

Transmit flow control will also require queueing (or discard).
If Homa and TCP are sharing a physical network then surely the
TCP traffic can cause flow control issues for both?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


end of thread

Thread overview: 12+ messages
2023-11-03 23:55 Bypass qdiscs? John Ousterhout
2023-11-04  9:24 ` Ferenc Fejes
2023-11-04 15:08 ` Andrew Lunn
2023-11-05  2:47   ` John Ousterhout
2023-11-06  3:23     ` Stephen Hemminger
2023-11-06  4:27       ` David Ahern
2023-11-06 16:12         ` Jamal Hadi Salim
2023-11-06 16:17           ` Jamal Hadi Salim
2023-11-06 16:51             ` David Ahern
2023-11-08 16:50         ` John Ousterhout
2023-11-08 17:17           ` David Ahern
2023-11-09 17:50             ` David Laight
