* RFC - VXLAN port range facility
@ 2013-05-30 12:40 David Stevens
  2013-05-30 16:41 ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: David Stevens @ 2013-05-30 12:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Stephen,
        I think there are some issues with the port range facility in
VXLAN. Currently, it picks a random port from a wide range (nearly half
the port space) and uses that random value as a source port for a generated
UDP packet.
        There are no checks to see if the port is in use by something else.

        I can see the value of using a range of ports, but:

1) VXLAN should use its listen port by default
2) VXLAN should actually bind to any source ports it uses, because...
3) VXLAN should never use a port already exclusively in use by something else.

As is, VXLAN is not playing well with other UDP users because, for example,
it can trigger ICMP errors which will be delivered to some unwitting
application whose port it has hijacked.

I think port ranges may be useful in the context of a small number (say 10)
of ports that you are actually bound to, i.e. as part of multi-port binding
support. But then a default range of 32K-61K is too large. With a small,
bound range, VXLAN at least has the potential to manage the ICMP errors
it triggers.

I think if you want to use port ranges like a range of ephemeral ports, it's
less useful, but at a minimum it should be a port that you can legally bind
to at the time it's in use. Since actually binding/unbinding for each packet
would probably be too expensive, I think it'd be better to:

1) use smaller ranges by default
2) actually bind to the entire range on start-up, to prevent other apps
   from using them
3) fail if any in the range is already bound
4) then, with a range of bound ports, select as currently on sends
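
The four steps above can be sketched in user space roughly as follows.
This is only an illustration under stated assumptions, not the kernel
VXLAN code: bind_port_range() and pick_source_socket() are hypothetical
names, CRC32 stands in for whatever flow hash the driver really uses,
and the loopback address is there only to keep the demo local.

```python
import socket
import zlib


def bind_port_range(first_port, count, host="127.0.0.1"):
    """Bind one UDP socket per port in the range (steps 2 and 3).

    Raises OSError, after releasing anything already bound, if any
    port in the range is in use by something else.
    """
    socks = []
    try:
        for port in range(first_port, first_port + count):
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            socks.append(s)
            s.bind((host, port))  # fails if another socket owns the port
    except OSError:
        for s in socks:
            s.close()
        raise
    return socks


def pick_source_socket(socks, inner_headers):
    """Step 4: choose among the pre-bound ports by hashing the inner headers."""
    return socks[zlib.crc32(inner_headers) % len(socks)]
```

With a small count (step 1) the cost is a handful of sockets; with a
30,000-port range the same loop would hold 30,000 of them.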

                                                +-DLS


* Re: RFC - VXLAN port range facility
  2013-05-30 12:40 RFC - VXLAN port range facility David Stevens
@ 2013-05-30 16:41 ` Stephen Hemminger
  2013-05-30 18:00   ` David Stevens
  2013-05-30 19:33   ` Ben Hutchings
  0 siblings, 2 replies; 14+ messages in thread
From: Stephen Hemminger @ 2013-05-30 16:41 UTC (permalink / raw)
  To: David Stevens; +Cc: netdev

On Thu, 30 May 2013 08:40:56 -0400
David Stevens <dlstevens@us.ibm.com> wrote:

> Stephen,
>         I think there are some issues with the port range facility in
> VXLAN. Currently, it picks a random port from a wide range (nearly half 
> the
> port space) and uses that random value as a source port for a generated
> UDP packet.
>         There are no checks to see if the port is in use by something 
> else.
> 
>         I can see the value of using a range of ports, but::
> 
> 1) VXLAN should use its listen port by default
> 2) VXLAN should actually bind to any source ports it uses, because...
> 3) VXLAN should never use a port already exclusively in use by something 
> else.

1. The receiver has to match based on src/dst port tuple anyway.
   So it doesn't matter a whole lot.

2. Current behaviour is in the RFC.

3. The choice of source port should follow the same rules as any other UDP
   send when the port is not bound. I am worried that binding up lots of
   ephemeral ports will easily clog the space when there are lots of
   destinations.

4. The problem with binding for each destination used would be that
   it would mean having a socket for each destination, which would get
   resource intensive.


> As is, VXLAN is not playing well with other UDP users because, for 
> example,
> it can trigger ICMP errors which will be delivered to some unwitting 
> application
> whose port it has hijacked.

The source port is not related to what some application receives.
An RFC-conforming VXLAN endpoint will never send traffic back to the sender's
source port. If VXLAN traffic got an ICMP response from a router, like
DESTINATION_UNREACHABLE, there should be a match on destination port as well.

> I think a port ranges may be useful in the context of a small number (say 
> 10)
> of ports that you are actually bound to, so then as part of multi-port 
> binding
> support. But then a default range of 32K-61K is too large. It then, at 
> least,
> has the potential to manage ICMP errors triggered by VXLAN.

Port ranges are critical to retaining scaling in multi-path infrastructure.
Otherwise all traffic will arrive on a single queue in NIC.

> 
> I think if you want to use port ranges like a range of empheremal ports, 
> it's
> less useful, but at a minimum it should be a port that you can legally 
> bind
> to at the time it's in use. Since actually binding/unbinding for each 
> packet
> would probably be too expensive, I think it'd be better to:

The normal source port range (if not overridden) should be
the UDP ephemeral port range.

> 1) use smaller ranges by default
> 2) actually bind to the entire range on start-up, to prevent other apps 
> from using them
> 3) fail if any in the range is already bound
> 4) then, with a range of bound ports, select as currently on sends


* Re: RFC - VXLAN port range facility
  2013-05-30 16:41 ` Stephen Hemminger
@ 2013-05-30 18:00   ` David Stevens
  2013-05-31  6:09     ` Jesse Gross
  2013-05-31 16:13     ` Stephen Hemminger
  2013-05-30 19:33   ` Ben Hutchings
  1 sibling, 2 replies; 14+ messages in thread
From: David Stevens @ 2013-05-30 18:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, netdev-owner

netdev-owner@vger.kernel.org wrote on 05/30/2013 12:41:41 PM:

> From: Stephen Hemminger <stephen@networkplumber.org>
> 
> The source port is not related to what some application receives.
> A RFC conforming VXLAN endpoint will never send traffic back tot the
> senders source
> port. If VXLAN traffic got an ICMP response form an router like
> DESTINATION_UNREACHABLE there should be a match on destination port as 
> well.

It'd be sent to the source port of the sender and it need not be bound to
a remote port -- for example, a netfilter rule blocking the destination
would send an administratively-prohibited ICMP error to a UDP
application that did not trigger the traffic that caused the error.

Quite simply, VXLAN uses UDP so it needs to follow the rules of UDP.

But I don't think there's particular advantage in splitting it up 30,000
ways when 10 ways would be both practical, for binding, and spread
traffic to 10 flows potentially.

> 
> Port ranges are critical to retaining scaling in multi-path 
> infrastructure.
> Otherwise all traffic will arrive on a single queue in NIC.

I agree, but I don't think there need to be so many, and whatever
the VXLAN draft says, it can't start causing problems for existing
UDP applications. VXLAN simply shouldn't be using ports allocated
to other applications; it can only do this because the implementation
is not using the real UDP stack to send the packets, but it still
should follow the rules just as all other UDP servers must do. How
that's enforced is an implementation issue, but I think it must be
enforced.

I think it should be binding to all ports it uses as sources, and
the default range should have its min and max both set to the bind port. People
who need to spread flows have the capability, but that doesn't mean
they need it on by default, or that it has to be so many, or that
it has carte blanche to interfere with existing UDP applications.

                                                                +-DLS


* Re: RFC - VXLAN port range facility
  2013-05-30 16:41 ` Stephen Hemminger
  2013-05-30 18:00   ` David Stevens
@ 2013-05-30 19:33   ` Ben Hutchings
  1 sibling, 0 replies; 14+ messages in thread
From: Ben Hutchings @ 2013-05-30 19:33 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Stevens, netdev

On Thu, 2013-05-30 at 09:41 -0700, Stephen Hemminger wrote:
> On Thu, 30 May 2013 08:40:56 -0400
> David Stevens <dlstevens@us.ibm.com> wrote:
[...]
> > I think a port ranges may be useful in the context of a small number (say 
> > 10)
> > of ports that you are actually bound to, so then as part of multi-port 
> > binding
> > support. But then a default range of 32K-61K is too large. It then, at 
> > least,
> > has the potential to manage ICMP errors triggered by VXLAN.
> 
> Port ranges are critical to retaining scaling in multi-path infrastructure.
> Otherwise all traffic will arrive on a single queue in NIC.
[...]

If you include UDP port numbers in the flow hash, a UDP flow with a
mixture of fragmented and unfragmented datagrams is likely to be split
between two queues.  Most multiqueue NICs follow the Microsoft RSS spec
which says to include the port numbers in the flow hash for TCP only.
Some hardware and drivers provide the option to override the hashing
behaviour for UDP or to steer by port number, but perhaps it would be
more portable to support the use of a range of IP addresses?
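
To make the fragmentation point concrete, here is a toy model of
receive-queue selection. It is a sketch under assumptions, not the RSS
algorithm: rx_queue() is a hypothetical name, CRC32 stands in for the
Toeplitz hash, and fragments are always hashed on addresses only.

```python
import zlib


def rx_queue(src_ip, dst_ip, src_port, dst_port, is_fragment,
             n_queues, hash_udp_ports):
    """Pick a receive queue from a toy flow hash (CRC32 stand-in)."""
    key = f"{src_ip}>{dst_ip}"
    # Fragments are hashed on addresses only, since later fragments
    # carry no UDP header; ports are mixed in only for whole datagrams.
    if hash_udp_ports and not is_fragment:
        key += f":{src_port}>{dst_port}"
    return zlib.crc32(key.encode()) % n_queues
```

With hash_udp_ports=False every datagram of a flow lands on the same
queue. With it set to True, whole datagrams follow the port-based hash
while fragments follow the address-only hash, so one flow can end up on
two queues.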

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: RFC - VXLAN port range facility
  2013-05-30 18:00   ` David Stevens
@ 2013-05-31  6:09     ` Jesse Gross
  2013-05-31 12:26       ` David Stevens
  2013-05-31 16:13     ` Stephen Hemminger
  1 sibling, 1 reply; 14+ messages in thread
From: Jesse Gross @ 2013-05-31  6:09 UTC (permalink / raw)
  To: David Stevens; +Cc: Stephen Hemminger, netdev

On Fri, May 31, 2013 at 3:00 AM, David Stevens <dlstevens@us.ibm.com> wrote:
> But I don't think there's particular advantage in splitting it up 30,000
> ways when 10 ways would be both practical, for binding, and spread
> traffic to 10 flows potentially.

Most people that run large data centers think that 16 bits of entropy
is barely sufficient. The issue is not CPUs or link aggregation but
Clos fabrics built using ECMP.
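
A toy model of why the source port matters for ECMP: between two
tunnel endpoints the addresses, protocol and destination port are
fixed, so the source port is the only field left to vary. Everything
here is an assumption for illustration: ecmp_path() and paths_used()
are hypothetical names, CRC32 stands in for a switch's hash, the
addresses are made up, and 8472 is just a placeholder destination port.

```python
import zlib


def ecmp_path(src_ip, dst_ip, src_port, dst_port, n_paths):
    """Pick one of n_paths equal-cost next hops from a 5-tuple hash."""
    key = f"{src_ip}|{dst_ip}|17|{src_port}|{dst_port}".encode()  # 17 = UDP
    return zlib.crc32(key) % n_paths


def paths_used(src_ports, n_paths=16, dst_port=8472):
    """Set of paths exercised between two fixed endpoints."""
    return {
        ecmp_path("10.0.0.1", "10.0.0.2", port, dst_port, n_paths)
        for port in src_ports
    }
```

paths_used([40000]) can only ever contain one path, and a 10-port range
at most 10, however many inner flows are being tunnelled.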


* Re: RFC - VXLAN port range facility
  2013-05-31  6:09     ` Jesse Gross
@ 2013-05-31 12:26       ` David Stevens
  2013-06-01  6:39         ` Jesse Gross
  0 siblings, 1 reply; 14+ messages in thread
From: David Stevens @ 2013-05-31 12:26 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev, Stephen Hemminger

Jesse Gross <jesse@nicira.com> wrote on 05/31/2013 02:09:34 AM:

> On Fri, May 31, 2013 at 3:00 AM, David Stevens <dlstevens@us.ibm.com> 
> wrote:
> > But I don't think there's particular advantage in splitting it up 
> > 30,000
> > ways when 10 ways would be both practical, for binding, and spread
> > traffic to 10 flows potentially.
> 
> Most people that run large data centers think that 16 bits of entropy
> is barely sufficient. The issue is not CPUs or link aggregation but
> Clos fabrics built using ECMP.
> 

And most people running embedded systems wouldn't want to bind to
30,000 sockets by default, which is the proper way for VXLAN to
interact with UDP.

A casual user of VXLAN between a couple of small machines on ordinary
Ethernet generally won't require multiple ports at all.

I think the default case should lean towards the low end, and the
mechanisms are there to tune the high end.

                                                        +-DLS


* Re: RFC - VXLAN port range facility
  2013-05-30 18:00   ` David Stevens
  2013-05-31  6:09     ` Jesse Gross
@ 2013-05-31 16:13     ` Stephen Hemminger
  2013-05-31 17:08       ` David Stevens
  1 sibling, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2013-05-31 16:13 UTC (permalink / raw)
  To: David Stevens; +Cc: netdev, netdev-owner

On Thu, 30 May 2013 14:00:51 -0400
David Stevens <dlstevens@us.ibm.com> wrote:

> netdev-owner@vger.kernel.org wrote on 05/30/2013 12:41:41 PM:
> 
> > From: Stephen Hemminger <stephen@networkplumber.org>
> > 
> > The source port is not related to what some application receives.
> > A RFC conforming VXLAN endpoint will never send traffic back tot the
> > senders source
> > port. If VXLAN traffic got an ICMP response form an router like
> > DESTINATION_UNREACHABLE there should be a match on destination port as 
> well.
> 
> It'd be sent to the source port of the sender and it need not be bound to
> a remote port -- for example, a netfilter rule blocking the destination
> would send an administratively-prohibited ICMP error to a UDP
> application that did not trigger the traffic that caused the error.
> 
> Quite simply, VXLAN uses UDP so it needs to follow the rules of UDP.
> 
> But I don't think there's particular advantage in splitting it up 30,000
> ways when 10 ways would be both practical, for binding, and spread
> traffic to 10 flows potentially.
>

RFC text:
 Outer UDP Header:  This is the outer UDP header with a source
        port provided by the VTEP and the destination port being a well
        known UDP port to be obtained by IANA assignment. It is recommended
        that the source port be a hash of the inner Ethernet frame's headers
        to obtain a level of entropy for ECMP/load balancing of the VM to VM
        traffic across the VXLAN overlay.


You can restrict to a smaller range if that is a requirement of your infrastructure.

Normal UDP applications assign their source port from the ephemeral port range,
so that is what VXLAN does.
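
As a rough sketch of the recommendation in the quoted text, a hash of
the inner Ethernet header can be folded into a configurable port range.
This is not the driver's code: vxlan_source_port() is a hypothetical
name, CRC32 is a stand-in hash, and 32768-61000 is assumed here as the
"32K-61K" default discussed in this thread.

```python
import zlib


def vxlan_source_port(inner_frame, port_min=32768, port_max=61000):
    """Map a hash of the inner Ethernet header onto [port_min, port_max]."""
    # First 14 bytes of the inner frame: dst MAC, src MAC, EtherType.
    flow_hash = zlib.crc32(inner_frame[:14])
    span = port_max - port_min + 1
    return port_min + flow_hash % span
```

The same inner header always yields the same outer source port, and
setting port_min equal to port_max restricts the range to one port.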


* Re: RFC - VXLAN port range facility
  2013-05-31 16:13     ` Stephen Hemminger
@ 2013-05-31 17:08       ` David Stevens
  2013-05-31 17:22         ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: David Stevens @ 2013-05-31 17:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, netdev-owner

Stephen Hemminger <stephen@networkplumber.org> wrote on 05/31/2013 12:13:38 PM:

> 
> RFC text:
>  Outer UDP Header:  This is the outer UDP header with a source
>         port provided by the VTEP and the destination port being a well
>         known UDP port to be obtained by IANA assignment. It is 
> recommended
>         that the source port be a hash of the inner Ethernet frame's 
> headers
>         to obtain a level of entropy for ECMP/load balancing of the VM 
> to VM
>         traffic across the VXLAN overlay.
> 
> 
> You can restrict to a smaller range if that is a requirement of your
> infrastructure.

        I'm suggesting the smaller range, because the fix for the part
that is broken would become a resource issue for the current, larger
default range.
        [and a "recommended" in a draft doesn't trump 35 years of UDP
                usage, even if it did say not to bind the ports...]
 
> Normal UDP applications assign their source port from the ephemeral 
> port range,
> so that is what VXLAN does.

        Normal UDP applications bind to the source port. If they are
unbound, they bind just for the send and then unbind after. They
cannot use a port already bound _because_the_bind_prohibits_it.
        That is, in fact, the entire issue I'm raising. (!) If I have
a UDP application that binds to port 35000, no other UDP application
will ever use that port until I release it, and any ICMP errors delivered
to my socket are triggered by my application.
        That became no longer true with the addition of VXLAN port ranges,
because VXLAN does not use UDP bind, or any of the UDP code, to enforce
this. It simply generates a random number in the range, which _can_be_
35000 or any other bound port, and then sends its own, constructed UDP
header using that port.

        The proper way to fix this would be to actually bind to a port in
the range, and retry another port if the binding fails, until the binding
succeeds. But as VXLAN picks a randomized source port _for_each_packet_,
I'm not suggesting we do that.
        I'm suggesting, instead, that we bind on all the source ports we
will use at start-up, which then reserves those ports for VXLAN and
prevents anyone else from binding on them.
        That solves the issue of binding and unbinding on each packet,
but I am not then suggesting that VXLAN should bind on 30,000 ports on
start-up. That would be silly, especially on a system whose primary
function is not VXLAN.
        So, the logical next question is: does VXLAN really need a range
of 30,000 ports as the "normal" circumstance? I think the answer to that
is definitely "no." In fact, just one port would work fine a lot of the
time, and when multiple ports are needed, the capability is still there.
That suggests changing the *default* range (I suggest to 1 port).

        My conclusions from that reasoning:

1) VXLAN use of UDP source ports is broken; it cannot use ports that are
        already bound, and right now it does
2) while a bind/unbind would work, doing that on every packet is slow

so,

3) the default port range should be much smaller and VXLAN should bind
        in advance to the set of ports it wants to use.

Now, maybe it wouldn't kill performance, and so doing a bind/unbind per
packet is still an option, but that would definitely hurt performance
for people who don't actually care about port entropy.

Whether solved by a bind/unbind, pre-binding to a smaller default port
range, or a switch between the two, I think VXLAN *must* follow the
rules in its use of UDP and ensure that it doesn't send using source
ports in use by something else. It can't just generate a random one
and use it without checking it, as it does now.

                                                                +-DLS


* Re: RFC - VXLAN port range facility
  2013-05-31 17:08       ` David Stevens
@ 2013-05-31 17:22         ` Stephen Hemminger
  2013-05-31 18:19           ` David Stevens
  0 siblings, 1 reply; 14+ messages in thread
From: Stephen Hemminger @ 2013-05-31 17:22 UTC (permalink / raw)
  To: David Stevens; +Cc: netdev, netdev-owner

On Fri, 31 May 2013 13:08:56 -0400
David Stevens <dlstevens@us.ibm.com> wrote:

> Stephen Hemminger <stephen@networkplumber.org> wrote on 05/31/2013 
> 12:13:38 PM:
> 
> > 
> > RFC text:
> >  Outer UDP Header:  This is the outer UDP header with a source
> >         port provided by the VTEP and the destination port being a well
> >         known UDP port to be obtained by IANA assignment. It is 
> recommended
> >         that the source port be a hash of the inner Ethernet frame's 
> headers
> >         to obtain a level of entropy for ECMP/load balancing of the VM 
> to VM
> >         traffic across the VXLAN overlay.
> > 
> > 
> > You can restrict to a smaller range if that is a requirement of your
> > infrastructure.
> 
>         I'm suggesting the smaller range, because the fix for the part
> that is broken would become a resource issue for the current, larger
> default range.
>         [and a "recommended" in a draft doesn't trump 35 years of UDP
>                 usage, even if it did say not to bind the ports...]
>  
> > Normal UDP applications assign their source port from the ephemeral 
> > port range,
> > so that is what VXLAN does.
> 
>         Normal UDP applications bind to the source port. If they are
> unbound, they bind just for the send and then unbind after. They
> cannot use a port already bound _because_the_bind_prohibits_it.
>         That is, in fact, the entire issue I'm raising. (!) If I have
> a UDP application that binds to port 35000, no other UDP application
> will ever use that port until I release it, and any ICMP errors delivered
> to my socket are triggered by my application.
>         That became no longer true with the addition of VXLAN port ranges,
> because VXLAN does not use UDP bind, or any of the UDP code, to enforce
> this. It simply generates a random number in the range, which _can_be_
> 35000 or any other bound port, and then sends its own, constructed UDP
> header using that port.
> 
>         The proper way to fix this would be to actually bind to a port in
> the range, and retry another port if the binding fails, until the binding
> succeeds. But as VXLAN picks a randomized source port _for_each_packet_,
> I'm not suggesting we do that.
>         I'm suggesting, instead, that we bind on all the source ports we
> will use at start-up, which then reserves those ports for VXLAN and
> prevents anyone else from binding on them.
>         That solves the issue of binding and unbinding on each packet,
> but I am not then suggesting that VXLAN should bind on 30,000 ports on
> start-up. That would be silly, especially on a system whose primary 
> function
> is not VXLAN.
>         So, the logical next question is: does VXLAN really need a range
> of 30,000 ports as the "normal" circumstance? I think the answer to that
> is definitely "no." In fact, just one port would work fine a lot of the
> time, and when multiple ports are needed, the capability is still there.
> That suggests changing the *default* range (I suggest to 1 port).

The range could be smaller, yes, but that means you are restricting
hashing.

>         My conclusions from that reasoning:
> 
> 1) VXLAN use of UDP source ports is broken; it cannot use ports that are
>         already bound, and right now it does
> 2) while a bind/unbind would work, doing that on every packet is slow

The problem is that bind/unbind is a flow-state operation, and
keeping flow state wouldn't scale.
 

> 
> so,
> 
> 3) the default port range should be much smaller and VXLAN should bind
>         in advance to the set of ports it wants to use.

The range probably should not overlap the ephemeral port range used by applications.


> 
> Now, maybe it wouldn't kill performance, and so doing a bind/unbind per
> packet is still an option, but that would definitely hurt performance
> for people who don't actually care about port entropy.

What about a peek operation that just avoids existing ports?

> Whether solved by a bind/unbind, pre-binding to a smaller default port
> range, or a switch between the two, I think VXLAN *must* follow the
> rules in its use of UDP and ensure that it doesn't send using source
> ports in use by something else. It can't just generate a random one
> and use it without checking it, as it does now.
> 
>                                                                 +-DLS
> 


* Re: RFC - VXLAN port range facility
  2013-05-31 17:22         ` Stephen Hemminger
@ 2013-05-31 18:19           ` David Stevens
  2013-06-01  1:43             ` Stephen Hemminger
  0 siblings, 1 reply; 14+ messages in thread
From: David Stevens @ 2013-05-31 18:19 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, netdev-owner

Stephen Hemminger <stephen@networkplumber.org> wrote on 05/31/2013 01:22:33 PM:

> > Now, maybe it wouldn't kill performance, and so doing a bind/unbind 
> > per
> > packet is still an option, but that would definitely hurt performance
> > for people who don't actually care about port entropy.
> 
> What about a peek operation that just avoids existing ports.

        That sounds like an excellent idea to me. But anything per-packet
interacting with other parts of the kernel has potential to be slow.
My concern there, if it is noticeably slower, is that someone who
doesn't need the entropy should not pay the penalty for it. So, if
whatever we do per-packet to ensure we're using unbound UDP ports
slows it down, I think we'd want a knob of some sort to allow just
using a pre-bound port or (smaller) set of ports, since those don't
require any per-packet checks.
        But absolutely, if we just do the port lookup and rehash if
the port is in use, even without actually binding, I think that would
work well; we wouldn't cause any troubles for long-term UDP bound
sockets and other ephemeral ports can get stray traffic from prior
use already.
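
The "look up and rehash if the port is in use" idea can be sketched as
below. It is a user-space illustration only: pick_unbound_port() is a
hypothetical name, the in_use set stands in for whatever lookup the UDP
code would provide, CRC32 is a stand-in hash, and the 32768-61000 range
is assumed.

```python
import zlib


def pick_unbound_port(inner_headers, in_use, port_min=32768,
                      port_max=61000, max_tries=8):
    """Hash to a port in [port_min, port_max]; rehash while it is in use.

    Returns None if no free port is found within max_tries attempts.
    """
    span = port_max - port_min + 1
    flow_hash = zlib.crc32(inner_headers)
    for attempt in range(max_tries):
        port = port_min + flow_hash % span
        if port not in in_use:  # the "peek": check only, no bind
            return port
        # Port is owned by someone else: derive a new hash and retry.
        flow_hash = zlib.crc32(flow_hash.to_bytes(4, "big") + bytes([attempt]))
    return None
```

Nothing is reserved here, so the check costs one lookup per packet in
the common case and a few more only on a collision.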

                                                                +-DLS


* Re: RFC - VXLAN port range facility
  2013-05-31 18:19           ` David Stevens
@ 2013-06-01  1:43             ` Stephen Hemminger
  2013-06-01 13:28               ` David Stevens
  2013-06-03  8:21               ` David Laight
  0 siblings, 2 replies; 14+ messages in thread
From: Stephen Hemminger @ 2013-06-01  1:43 UTC (permalink / raw)
  To: David Stevens; +Cc: netdev, netdev-owner

On Fri, 31 May 2013 14:19:47 -0400
David Stevens <dlstevens@us.ibm.com> wrote:

> Stephen Hemminger <stephen@networkplumber.org> wrote on 05/31/2013 
> 01:22:33 PM:
> 
> > > Now, maybe it wouldn't kill performance, and so doing a bind/unbind 
> per
> > > packet is still an option, but that would definitely hurt performance
> > > for people who don't actually care about port entropy.
> > 
> > What about a peek operation that just avoids existing ports.
> 
>         That sounds like an excellent idea to me. But anything per-packet
> interacting with other parts of the kernel has potential to be slow.
> My concern there, if it is noticeably slower, is that someone who
> doesn't need the entropy should not pay the penalty for it. So, if
> whatever we do per-packet to ensure we're using unbound UDP ports
> slows it down, I think we'd want a knob of some sort to allow just
> using a pre-bound port or (smaller) set of ports, since those don't
> require any per-packet checks.
>         But absolutely, if we just do the port lookup and rehash if
> the port is in use, even without actually binding, I think that would
> work well; we wouldn't cause any troubles for long-term UDP bound
> sockets and other emphemeral ports can get stray traffic from prior
> use already.
> 
>                                                                 +-DLS
> 

I am wondering whether this is just a theoretical problem or whether it
occurs in real life. My simple tests are not causing ICMP to be delivered to
a UDP application (over the VXLAN), but that may be because of the compare
scoring in ICMP, or because of the use of multicast versus unicast destinations.


* Re: RFC - VXLAN port range facility
  2013-05-31 12:26       ` David Stevens
@ 2013-06-01  6:39         ` Jesse Gross
  0 siblings, 0 replies; 14+ messages in thread
From: Jesse Gross @ 2013-06-01  6:39 UTC (permalink / raw)
  To: David Stevens; +Cc: netdev, Stephen Hemminger

On Fri, May 31, 2013 at 9:26 PM, David Stevens <dlstevens@us.ibm.com> wrote:
> Jesse Gross <jesse@nicira.com> wrote on 05/31/2013 02:09:34 AM:
>
>> On Fri, May 31, 2013 at 3:00 AM, David Stevens <dlstevens@us.ibm.com>
> wrote:
>> > But I don't think there's particular advantage in splitting it up
> 30,000
>> > ways when 10 ways would be both practical, for binding, and spread
>> > traffic to 10 flows potentially.
>>
>> Most people that run large data centers think that 16 bits of entropy
>> is barely sufficient. The issue is not CPUs or link aggregation but
>> Clos fabrics built using ECMP.
>>
>
> And most people running embedded systems wouldn't want to bind to
> 30,000 sockets by default, which is the proper way for VXLAN to
> interact with UDP.
>
> A casual user of VXLAN between a couple of small machines on ordinary
> Ethernet generally won't require multiple ports at all.
>
> I think the default case should lean towards the low end, and the
> mechanisms are there to tune the high end.

This line of argument doesn't make a lot of sense because scalability
and ECMP are the two reasons that VXLAN was introduced in the first
place. Without the entropy in the source port, it's basically the same
as GRE.

Your solution needs to work reasonably across the entire range of use
cases and based on your arguments, it clearly doesn't. This doesn't
mean that those use cases don't exist or aren't important, it means
that you need to find another solution.


* Re: RFC - VXLAN port range facility
  2013-06-01  1:43             ` Stephen Hemminger
@ 2013-06-01 13:28               ` David Stevens
  2013-06-03  8:21               ` David Laight
  1 sibling, 0 replies; 14+ messages in thread
From: David Stevens @ 2013-06-01 13:28 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, netdev-owner

Stephen Hemminger <stephen@networkplumber.org> wrote on 05/31/2013 09:43:10 PM:

> 
> I am wondering if this is just a theoretical problem, or does it occur
> in real life. My simple tests are not causing ICMP to be delivered to
> UDP application (over the VXLAN) but it maybe because of the compare 
> scoring
> in ICMP, or because of use of multicast versus unicast destinations.
> 

ICMP won't send errors triggered by multicasts.

                                                        +-DLS


* RE: RFC - VXLAN port range facility
  2013-06-01  1:43             ` Stephen Hemminger
  2013-06-01 13:28               ` David Stevens
@ 2013-06-03  8:21               ` David Laight
  1 sibling, 0 replies; 14+ messages in thread
From: David Laight @ 2013-06-03  8:21 UTC (permalink / raw)
  To: Stephen Hemminger, David Stevens; +Cc: netdev, netdev-owner

> I am wondering if this is just a theoretical problem, or does it occur
> in real life. My simple tests are not causing ICMP to be delivered to
> UDP application (over the VXLAN) but it maybe because of the compare scoring
> in ICMP, or because of use of multicast versus unicast destinations.

Perhaps modify the routing of received ICMP error messages
so that ones referring to the VXLAN destination port get
routed to VXLAN rather than the socket bound to the source
port (this could be made generic).
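
A minimal sketch of that demultiplexing rule, assuming IPv4 and the
usual ICMP error layout (8-byte ICMP header followed by the offending
IP header and the first 8 bytes of its payload). The names
embedded_udp_ports() and deliver_to() are hypothetical, as is the
VXLAN_PORT value; none of this is existing kernel code.

```python
import struct

VXLAN_PORT = 8472  # assumed VXLAN destination port for this sketch


def embedded_udp_ports(icmp_msg):
    """Return (src_port, dst_port) of the UDP datagram quoted in an
    ICMP error message, or None if it does not quote UDP over IPv4."""
    inner = icmp_msg[8:]  # skip the 8-byte ICMP header
    if len(inner) < 20:
        return None
    ihl = (inner[0] & 0x0F) * 4  # quoted IP header length in bytes
    if ihl < 20 or inner[9] != 17 or len(inner) < ihl + 4:
        return None  # malformed, or protocol is not UDP (17)
    return struct.unpack("!HH", inner[ihl:ihl + 4])


def deliver_to(icmp_msg):
    """Route errors quoting the VXLAN destination port to VXLAN,
    everything else to the socket bound to the quoted source port."""
    ports = embedded_udp_ports(icmp_msg)
    if ports is None:
        return "drop"
    src_port, dst_port = ports
    return "vxlan" if dst_port == VXLAN_PORT else f"socket:{src_port}"
```

An error triggered by a VXLAN packet sent from port 35000 would then
return "vxlan" instead of reaching whichever application is bound to
35000.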

Someone seems to have invented a protocol that abuses the
normal port number rules.

	David

