All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sagi Grimberg <sagi@grimberg.me>
To: "Belanger, Martin" <Martin.Belanger@dell.com>,
	Hannes Reinecke <hare@suse.de>,
	Martin Belanger <nitram_67@hotmail.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Cc: "kbusch@kernel.org" <kbusch@kernel.org>,
	"axboe@fb.com" <axboe@fb.com>, "hch@lst.de" <hch@lst.de>
Subject: Re: [PATCH 1/1] Add 'Transport Interface' (triface) option. This can be used to specify the IP interface to use for the connection. The driver uses that to set SO_BINDTODEVICE on the socket before connecting.
Date: Wed, 5 May 2021 13:32:27 -0700	[thread overview]
Message-ID: <4a5680c2-9a6c-377b-2666-cbc1fe105cb5@grimberg.me> (raw)
In-Reply-To: <SJ0PR19MB45446577ACEA270069ADB738F2599@SJ0PR19MB4544.namprd19.prod.outlook.com>


>>> Given that this was the original intent for host_traddr, why not have
>>> host_traddr resolve the iface from the address and set sockopt
>>> SO_BINDTODEVICE on it?
>>>
>> That was my question, too.
>>
>> I would vastly prefer to not have another option to deal with (as it raises the
>> question whether to add it eg during 'nvme connect-all') And one could
>> argue that this was the intention of _having_ the host_traddr argument in
>> the first place ...
>>
>> Cheers,
>>
>> Hannes
>> --
>> Dr. Hannes Reinecke		        Kernel Storage Architect
>> hare@suse.de			               +49 911 74053 688
>> SUSE Software Solutions Germany GmbH, 90409 Nürnberg
>> GF: F. Imendörffer, HRB 36809 (AG Nürnberg)
> 
> Hi Sagi and Hannes,
> 
> Correct me if I'm wrong, but it sounds like host_traddr was primarily added for FC (at least it wasn't tested for TCP since it does not work in its current state). I'm not an expert on FC and maybe specifying an address is the right (and only) way to specify and interface for FC. For TCP, however, it's not advisable. Specifying an interface by its associated IP address is less intuitive than specifying the actual interface name and, in some cases, it simply won't work. That's because the association between interfaces and IP addresses is not predictable. IP addresses can be changed or can change by themselves over time (e.g. DHCP). Interface names are predictable [1] and will persist over time. Consider the following configuration.
> 
> 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
>      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
>      inet 100.0.0.100/24 scope global lo
>         valid_lft forever preferred_lft forever
> 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>      link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
>      inet 100.0.0.100/24 scope global enp0s3
>         valid_lft forever preferred_lft forever
> 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>      link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
>      inet 100.0.0.100/24 scope global enp0s8
>         valid_lft forever preferred_lft forever
> 
> The above is a VM that I configured with the same IP address (100.0.0.100) on all interfaces. Doing a reverse lookup to identify the unique interface associated with 100.0.0.100 would simply not work here. And this is why the option host_iface is required. I understand that the above config does not represent a standard host system, but I'm using this to prove a point: "we can never know how a user will configure their system and the above configuration is perfectly fine by Linux".

Is this a common setting? I'd say that we should probably see a real
life need for it before adding a user interface for it.

What if we start with doing bind + BINDTODEVICE sockopt and if interface
resolution results in multiple devices we just skip the BINDTODEVICE
sockopt (which will not introduce a regression).

If this will cover 99% of the use-cases we are in good shape and we
didn't introduce yet another ABI that may be just confusing...

Having said that, if this setting is a real use-case we need to support
then there is no alternative to have the two options. So I'm a bit
on the fence here.

> The current TCP implementation for host_traddr uses bind()-before-connect(). This is a common construct to set the source IP address on the socket before connecting. This has no effect on how Linux will select the interface for the connection. That's because Linux uses the Weak End System model as described in RFC1122 [2]. Setting the source address on a connection is a common requirement that linux-nvme needs to support. In fact, specifying the Source IP address is a mandatory FedGov requirement (e.g. connection to a RADIUS/TACACS+ server). Consider the following configuration.
> 
> $ ip addr list dev enp0s8
> 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>      link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
>      inet 192.168.56.101/24 brd 192.168.56.255 scope global dynamic noprefixroute enp0s8
>         valid_lft 426sec preferred_lft 426sec
>      inet 192.168.56.102/24 scope global secondary enp0s8
>         valid_lft forever preferred_lft forever
>      inet 192.168.56.103/24 scope global secondary enp0s8
>         valid_lft forever preferred_lft forever
>      inet 192.168.56.104/24 scope global secondary enp0s8
>         valid_lft forever preferred_lft forever
> 
> Here we can see that several addresses are associated with interface enp0s8. By default, Linux will select the default IP address, 192.168.56.101, as the source address when connecting over interface enp0s8. Some users, however, want the ability to specify a different address (e.g., 192.168.56.103) to be used as the source address. The option host_traddr can be used as-is to perform this function (I tested it).
> 

I meant that we do both bind and sockopt for host_traddr.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

  parent reply	other threads:[~2021-05-05 20:32 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20210415192848.962891-1-nitram_67@hotmail.com>
2021-04-15 19:28 ` [PATCH 1/1] Add 'Transport Interface' (triface) option. This can be used to specify the IP interface to use for the connection. The driver uses that to set SO_BINDTODEVICE on the socket before connecting Martin Belanger
2021-05-01 11:34   ` Hannes Reinecke
2021-05-03 16:59     ` Belanger, Martin
2021-05-04 13:25       ` Hannes Reinecke
2021-05-04 19:56   ` Sagi Grimberg
2021-05-05  8:47     ` Hannes Reinecke
2021-05-05 14:31       ` Belanger, Martin
2021-05-05 18:33         ` James Smart
2021-05-05 20:32         ` Sagi Grimberg [this message]
2021-05-06 18:27           ` Michael Christie
2021-05-06  6:05         ` Hannes Reinecke
2021-05-06  7:00           ` Hannes Reinecke
2021-05-06 15:46             ` Belanger, Martin
2021-05-07 18:20               ` Sagi Grimberg
2021-05-10 13:49                 ` Belanger, Martin
2021-05-10 18:13                   ` Sagi Grimberg
2021-05-10 19:18                     ` Belanger, Martin
2021-05-11  0:28                       ` Sagi Grimberg
2021-05-11 13:41                         ` Belanger, Martin
2021-05-11 17:13                           ` Sagi Grimberg
2021-05-12  6:09                             ` Hannes Reinecke
2021-05-12 12:12                               ` Belanger, Martin
2021-05-12 22:12                                 ` Sagi Grimberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a5680c2-9a6c-377b-2666-cbc1fe105cb5@grimberg.me \
    --to=sagi@grimberg.me \
    --cc=Martin.Belanger@dell.com \
    --cc=axboe@fb.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=nitram_67@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.