Linux-NVME Archive on lore.kernel.org
From: Hannes Reinecke <hare@suse.de>
To: "Belanger, Martin" <Martin.Belanger@dell.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	Martin Belanger <nitram_67@hotmail.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Cc: "kbusch@kernel.org" <kbusch@kernel.org>,
	"axboe@fb.com" <axboe@fb.com>, "hch@lst.de" <hch@lst.de>
Subject: Re: [PATCH 1/1] Add 'Transport Interface' (triface) option. This can be used to specify the IP interface to use for the connection. The driver uses that to set SO_BINDTODEVICE on the socket before connecting.
Date: Thu, 6 May 2021 09:00:20 +0200
Message-ID: <07fa3404-ed37-052a-c2d7-0e21c119f5c5@suse.de>
In-Reply-To: <dbb70739-e636-7f4c-7332-1fbc07332444@suse.de>

On 5/6/21 8:05 AM, Hannes Reinecke wrote:
> On 5/5/21 4:31 PM, Belanger, Martin wrote:
[ .. ]
>> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
>> group default qlen 1000
>>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>>      inet 100.0.0.100/24 scope global lo
>>         valid_lft forever preferred_lft forever
>> 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
>> state UP group default qlen 1000
>>      link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
>>      inet 100.0.0.100/24 scope global enp0s3
>>         valid_lft forever preferred_lft forever
>> 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
>> state UP group default qlen 1000
>>      link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
>>      inet 100.0.0.100/24 scope global enp0s8
>>         valid_lft forever preferred_lft forever
>>
>> The above is a VM that I configured with the same IP address
>> (100.0.0.100) on all interfaces. Doing a reverse
>> lookup to identify the unique interface associated with 100.0.0.100
>> would simply not work here. And this is why
>> the option host_iface is required. I understand that the above config
>> does not represent a standard host system,
>> but I'm using this to prove a point: "we can never know how a user
>> will configure their system and the above
>> configuration is perfectly fine by Linux".
>>
> 
> ... and messing up any switch MAC address caching when doing so. I guess
> the network admin will come down hard on you if you try that on a
> production system.
> And I sincerely question whether this is a valid use-case; I'm already
> getting grief from our network admins if I dare to put two network
> interfaces from the same machine in the same network.
> 
>> The current TCP implementation for host_traddr uses
>> bind()-before-connect(). This is a common construct to set the
>> source IP address on the socket before connecting. This has no effect
>> on how Linux will select the interface for the
>> connection. That's because Linux uses the Weak End System model as
>> described in RFC1122 [2]. Setting the source address
>> on a connection is a common requirement that linux-nvme needs to
>> support. In fact, specifying the Source IP address
>> is a mandatory FedGov requirement (e.g. connection to a RADIUS/TACACS+
>> server). Consider the following configuration.
>>
>> $ ip addr list dev enp0s8
>> 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel
>> state UP group default qlen 1000
>>      link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
>>      inet 192.168.56.101/24 brd 192.168.56.255 scope global dynamic
>> noprefixroute enp0s8
>>         valid_lft 426sec preferred_lft 426sec
>>      inet 192.168.56.102/24 scope global secondary enp0s8
>>         valid_lft forever preferred_lft forever
>>      inet 192.168.56.103/24 scope global secondary enp0s8
>>         valid_lft forever preferred_lft forever
>>      inet 192.168.56.104/24 scope global secondary enp0s8
>>         valid_lft forever preferred_lft forever
>>
>> Here we can see that several addresses are associated with interface
>> enp0s8. By default, Linux will select the
>> default IP address, 192.168.56.101, as the source address when
>> connecting over interface enp0s8. Some users,
>> however, want the ability to specify a different address (e.g.,
>> 192.168.56.103) to be used as the source address.
>> The option host_traddr can be used as-is to perform this function (I
>> tested it).
>>
> 
> No disagreement here.
> 
>> In conclusion, I believe that for TCP we need 2 options. One that can
>> be used to specify an interface. And one
>> that can be used to set the source address. And users should be
>> allowed to use one or the other, or both, or none.
>> Of course, the documentation for host_traddr will need some
>> clarification. It should state that when used for TCP
>> connection, this option only sets the source address. And the
>> documentation for host_iface should say that this
>> option only applies to TCP connections.
>>
> 
> I'm with James Smart here. I do fail to see the need for 'host_iface'
> _without_ 'host_traddr'; especially for IPv6 where several addresses are
> standard just specifying 'host_iface' simply is not enough, and one has
> to specify 'host_traddr' additionally.
> 
> So 'host_iface' should be contingent on 'host_traddr', meaning we can
> just expand the syntax of 'host_traddr'.
> One easy possibility would be to add ',nobind' to the host_traddr syntax
> which would indicate that we should _not_ bind to the underlying
> interface; I do think that binding to the respective interface should be
> the default.
> 
A-ha. Just spoke to our network folks, and they clarified the difference
between binding to an IP address and binding to a network interface.
Apparently, binding to a source IP address does just that: it sets the
source IP address of the outgoing packet. That packet will _still_ be
subject to the normal routing table, as the route lookup is driven by
the _destination_ IP address.
So if we want to have it routed via a specific interface (and thereby
influence the routing decision) we need to bind the socket to that
interface.
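
A rough userspace sketch of that difference, in Python rather than the
kernel's C and not taken from the patch under discussion. make_socket()
is a hypothetical helper introduced only for illustration. bind() with
port 0 fixes just the source address, while SO_BINDTODEVICE ties the
socket to one interface:

```python
import socket

# Linux value of SO_BINDTODEVICE, used as a fallback if this Python
# build does not export the constant.
SO_BINDTODEVICE = getattr(socket, "SO_BINDTODEVICE", 25)


def make_socket(src_addr=None, iface=None):
    """Create a TCP socket, optionally pinning source address and/or interface.

    Hypothetical helper: src_addr plays the role of host_traddr and
    iface the role of the proposed host_iface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if iface is not None:
        # Restrict the socket to a single interface (Linux only).
        sock.setsockopt(socket.SOL_SOCKET, SO_BINDTODEVICE,
                        iface.encode() + b"\0")
    if src_addr is not None:
        # bind()-before-connect(): sets the source address only; the
        # egress interface is still chosen by the route lookup.
        sock.bind((src_addr, 0))
    return sock
```

Something like make_socket(src_addr="192.168.56.103", iface="enp0s8")
would then correspond to using both options together, with the address
and interface name borrowed from the example quoted above. Setting
SO_BINDTODEVICE may need CAP_NET_RAW on some kernels.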

The only valid scenario our network folks could come up with where we do
_not_ want to bind to an interface is asymmetric flows, i.e. cases
where the outgoing flow is routed via one interface and the incoming
flow arrives on another. But even they admitted that it's not a
common scenario, and it will probably be killed by anti-spoofing
software running on the core switches ...

But if we want to support _that_ then clearly binding to a specific
interface doesn't work.

So I would vote for making binding to the network interface holding the
IP address the default, and add an option ',nobind' to host_traddr to
skip it.
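
To make the proposed syntax concrete, here is a minimal sketch of how
such a value could be parsed. This is only an illustration in Python,
not existing nvme-cli or kernel code; parse_host_traddr() and the
HostTraddr tuple are hypothetical names, and ',nobind' is the only flag
assumed to exist:

```python
import ipaddress
from typing import NamedTuple


class HostTraddr(NamedTuple):
    addr: str          # source address to bind() to
    bind_iface: bool   # also bind to the interface holding addr?


def parse_host_traddr(value: str) -> HostTraddr:
    """Parse '<address>[,nobind]' (hypothetical syntax from this thread)."""
    addr, sep, flag = value.partition(",")
    if sep and flag != "nobind":
        raise ValueError(f"unknown host_traddr flag: {flag!r}")
    # Raises ValueError if addr is not a valid IPv4/IPv6 address.
    ipaddress.ip_address(addr)
    # Binding to the interface is the default; ',nobind' skips it.
    return HostTraddr(addr=addr, bind_iface=(flag != "nobind"))
```

With this, a plain address keeps the interface binding, and only an
explicit ',nobind' suffix turns it off, which would cover the
asymmetric-flow case.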

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		        Kernel Storage Architect
hare@suse.de			               +49 911 74053 688
SUSE Software Solutions Germany GmbH, 90409 Nürnberg
GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
