Linux-NVME Archive on lore.kernel.org
From: "Belanger, Martin" <Martin.Belanger@dell.com>
To: Sagi Grimberg <sagi@grimberg.me>, Hannes Reinecke <hare@suse.de>,
	Martin Belanger <nitram_67@hotmail.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Cc: "kbusch@kernel.org" <kbusch@kernel.org>,
	"axboe@fb.com" <axboe@fb.com>, "hch@lst.de" <hch@lst.de>
Subject: RE: [PATCH 1/1] Add 'Transport Interface' (triface) option. This can be used to specify the IP interface to use for the connection. The driver uses that to set SO_BINDTODEVICE on the socket before connecting.
Date: Tue, 11 May 2021 13:41:32 +0000
Message-ID: <SJ0PR19MB45449B142BE6DC506FBABA72F2539@SJ0PR19MB4544.namprd19.prod.outlook.com>
In-Reply-To: <60aa6be0-ca88-bab8-c893-4c2b1d3f8baf@grimberg.me>

> >>>> We already support this for IPv6, we can do that also for IPv4, but
> >>>> this syntax may not be trivially expected for ipv4?
> >>>
> >>> I tried this for IPv6 and it doesn't work. Here's what I get:
> >>> $ sudo nvme discover -g -G -t tcp -s 8009 -a fe80::800:27ff:fe00:0
> >>> Failed to write to /dev/nvme-fabrics: Invalid argument
> >>> $ sudo nvme discover -g -G -t tcp -s 8009 -a fe80::800:27ff:fe00:0%enp0s8
> >>> Failed to write to /dev/nvme-fabrics: Invalid argument
> >>> $ sudo nvme discover -g -G -t tcp -s 8009 -a [fe80::800:27ff:fe00:0]
> >>> failed to resolve host [fe80::800:27ff:fe00:0] info
> >>> $ sudo nvme discover -g -G -t tcp -s 8009 -a [fe80::800:27ff:fe00:0%enp0s8]
> >>> failed to resolve host [fe80::800:27ff:fe00:0%enp0s8] info
> >>
> >> # nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b -w fe80::5054:ff:fe28:5edb%enp6s0
> >
> > Thanks for clarifying the syntax. However, that doesn't work for me.
> >
> > # nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w fe80::9266:4855:6cf2:f7e9%enp0s8
> > Failed to write to /dev/nvme-fabrics: Connection refused
> 
> Are you using the Linux target? Connection refused means that you don't
> have a listener on it; it's not a resolution error.
> 
> Did you have the target listen on fe80::800:27ff:fe00:0%<intf>?

Doh! You are correct. In my setup, I run the nvme-cli client on a VM and the target (nvmet) on the host computer. I had nvmet configured to listen on "0.0.0.0" (the IPv4 wildcard) instead of "::" (the IPv6 wildcard, i.e. listen on all interfaces).

After changing nvmet's configuration, I was able to query the discovery log pages, using this syntax:
nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w fe80::9266:4855:6cf2:f7ea%enp0s8

Note that it does not work when I append the interface to the destination IP address as per RFC4007 (as ping allows), as follows:
nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8 -w fe80::9266:4855:6cf2:f7ea
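For reference, here is a minimal C sketch of what handling the RFC4007 form on the destination address involves, assuming the zone after '%' is either an interface name or a numeric interface index. parse_scoped_ipv6() is a hypothetical helper, not nvme-cli or kernel code. The zone ends up in sin6_scope_id, which is what ties a link-local destination to one interface; no source address is needed for that.

```c
#define _GNU_SOURCE /* expose the full glibc networking declarations */
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: parse "addr" or "addr%zone" (RFC4007) into a
 * sockaddr_in6. The zone may be an interface name ("enp0s8") or a numeric
 * interface index ("3"). Returns 0 on success, -1 on a malformed address or
 * an unknown zone.
 */
static int parse_scoped_ipv6(const char *text, struct sockaddr_in6 *out)
{
    char buf[INET6_ADDRSTRLEN + IF_NAMESIZE + 1];
    char *zone;

    if (strlen(text) >= sizeof(buf))
        return -1;
    strcpy(buf, text);

    memset(out, 0, sizeof(*out));
    out->sin6_family = AF_INET6;

    /* Split at '%': buf keeps the address, zone points at the interface part. */
    zone = strchr(buf, '%');
    if (zone)
        *zone++ = '\0';

    if (inet_pton(AF_INET6, buf, &out->sin6_addr) != 1)
        return -1;

    if (zone) {
        char *end;
        unsigned long idx = strtoul(zone, &end, 10);

        if (*zone != '\0' && *end == '\0')
            out->sin6_scope_id = (uint32_t)idx;        /* numeric zone, not checked */
        else
            out->sin6_scope_id = if_nametoindex(zone); /* named zone, e.g. enp0s8 */
        if (out->sin6_scope_id == 0)
            return -1;                                 /* empty or unknown zone */
    }
    return 0;
}
```

With this, "fe80::800:27ff:fe00:0%enp0s8" yields the link-local address plus the index of enp0s8, while a plain "fe80::800:27ff:fe00:0" leaves sin6_scope_id at 0.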

> 
> >
> > Note that the above syntax does not comply with RFC4007. The '%'
> > delimiter is supposed to be appended to the Destination IP address and not
> > the Source Address. In other words, to be RFC4007-compliant, the syntax
> > should be (using your example):
> >
> > # nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b%enp6s0 -w fe80::5054:ff:fe28:5edb
> >
> > This tells nvme-cli to connect to a controller at address
> > fe80::5054:ff:fef1:9f3b using interface enp6s0 for the connection, and to
> > set the Source address to fe80::5054:ff:fe28:5edb.
> 
> This also seems to work, not sure that it does what we want though...
> nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b%enp6s0 -w fe80::5054:ff:fe28:5edb%enp6s0
> 
> Discovery Log Number of Records 1, Generation counter 5
> =====Discovery Log Entry 0======
> trtype:  tcp
> adrfam:  ipv6
> subtype: nvme subsystem
> treq:    not specified, sq flow control disable supported
> portid:  3
> trsvcid: 8009
> subnqn:  testnqn1
> traddr:  fe80::5054:ff:fef1:9f3b%enp6s0
> sectype: none
> 
> 
> >> The '%' may be confusing when it comes to other transports as well (e.g.
> >> rdma/fc would have to either reject or ignore it, but regardless of
> >> how we add it that would be the case). Having host-traddr accept
> >> either ip or interface seems the most desirable, however that won't
> >> work if there are 2 interfaces that share multiple ip addresses. So
> >> if this is a requirement we'll probably need to add --host-iface as another
> >> option...
> >
> > I don’t grok what you mean by "that won't work if there are 2 interfaces
> > that share multiple ip addresses". Why not? If one specifies the interface by
> > its name (e.g. enp0s8), there is no possible confusion even if multiple
> > interfaces share the same IP addresses.
> >
> > The following are some examples of how nvme-cli should work to comply
> > with RFC4007 and be consistent with the way ping operates.
> > Example 1 - IPv4, Specify Interface with -w and let Linux select Source
> > address:
> > nvme discover -t tcp -a 192.168.1.9 -w enp0s8
> >
> > Example 2 - IPv4, Specify Interface and Source address with repeated -w:
> > nvme discover -t tcp -a 192.168.1.9 -w enp0s8 -w 192.168.56.103
> 
> I meant without the repetitions, which you only need if you have 2 devices
> that share more than one address, which again, is not a clear use-case to
> me, but without repetitions we won't support that.
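To make the repetition point concrete, here is a hedged C sketch of how repeated -w values could be folded into one interface slot and one source-address slot. struct host_opts and add_host_arg() are hypothetical names, and the rule used (anything that parses as an IPv4/IPv6 literal is a source address, anything else is an interface name) is an assumption for illustration, not current nvme-cli behavior.

```c
#define _GNU_SOURCE /* expose the full glibc networking declarations */
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical result of parsing every -w value on the command line. */
struct host_opts {
    char iface[IF_NAMESIZE];     /* "" when no interface was given */
    char src[INET6_ADDRSTRLEN];  /* "" when no source address was given */
};

/*
 * Fold one -w value into *o: an IP literal fills the source slot, anything
 * else fills the interface slot. Scoped "addr%zone" values are not handled
 * here. Returns 0, or -1 if the value is too long or its slot is already set.
 */
static int add_host_arg(struct host_opts *o, const char *val)
{
    unsigned char tmp[sizeof(struct in6_addr)];
    int is_addr = inet_pton(AF_INET, val, tmp) == 1 ||
                  inet_pton(AF_INET6, val, tmp) == 1;
    char *slot = is_addr ? o->src : o->iface;
    size_t cap = is_addr ? sizeof(o->src) : sizeof(o->iface);

    if (slot[0] != '\0' || strlen(val) >= cap)
        return -1;
    strcpy(slot, val);
    return 0;
}
```

Under this rule, Example 2 (-w enp0s8 -w 192.168.56.103) fills both slots, while a single -w enp0s8 sets only the interface and leaves source selection to the kernel.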

I've been thinking about what you said regarding the need to repeat the -w option when two interfaces share the same IP address. I think we're looking at the problem from different points of view. The current implementation uses an IP address to identify an interface. I, on the other hand, believe that the best way to identify an interface is by its name or index. In previous emails, I provided examples of the problems that can occur when an IP address is used to identify an interface. For example, the same IP address can be assigned to different interfaces, making it impossible to distinguish those interfaces by IP address alone. Another example is that the low-level APIs (e.g. setsockopt(SO_BINDTODEVICE)) don't even require the source IP address; they only need the interface name/index. So why go through the trouble of performing a reverse address lookup to retrieve the interface name/index when the address is not used at all?
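The ambiguity described above can be checked directly: a reverse lookup from an address to an interface may match more than one interface. Below is a hedged sketch using getifaddrs(3); interfaces_with_address() is a hypothetical helper written for illustration, not code from nvme-cli or the kernel.

```c
#define _GNU_SOURCE /* expose the full glibc networking declarations */
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Count (and optionally print) every interface carrying the IPv4 or IPv6
 * address `addr`. A count above 1 means an address-to-interface lookup is
 * ambiguous. Returns the number of matches, or -1 on error.
 */
static int interfaces_with_address(const char *addr, int print)
{
    struct in_addr want4;
    struct in6_addr want6;
    int is_v4 = inet_pton(AF_INET, addr, &want4) == 1;
    int is_v6 = !is_v4 && inet_pton(AF_INET6, addr, &want6) == 1;
    struct ifaddrs *list, *ifa;
    int matches = 0;

    if (!is_v4 && !is_v6)
        return -1;                      /* not an IP literal */
    if (getifaddrs(&list) != 0)
        return -1;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        const struct sockaddr *sa = ifa->ifa_addr;
        int hit = 0;

        if (sa == NULL)
            continue;                   /* some entries carry no address */
        if (is_v4 && sa->sa_family == AF_INET)
            hit = memcmp(&((const struct sockaddr_in *)sa)->sin_addr,
                         &want4, sizeof(want4)) == 0;
        else if (is_v6 && sa->sa_family == AF_INET6)
            hit = memcmp(&((const struct sockaddr_in6 *)sa)->sin6_addr,
                         &want6, sizeof(want6)) == 0;
        if (hit) {
            matches++;
            if (print)
                printf("%s carries %s\n", ifa->ifa_name, addr);
        }
    }
    freeifaddrs(list);
    return matches;
}
```

If the same address is configured on, say, enp0s8 and enp0s9, this returns 2 and the caller has no principled way to choose, whereas a name such as enp0s8 maps to exactly one interface.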

By the way, if nvme-cli/linux-nvme allowed specifying interfaces by name/index, we would not need to repeat the -w option unless we also wanted to set the source address at the same time. Setting the source address is a completely different thing from setting the interface. It should be possible to set either one independently of the other, both, or neither.
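At the socket level the two settings are already independent. The following is a minimal sketch, assuming an IPv4 TCP socket that has not been connected yet; prepare_socket() is a hypothetical helper, while SO_BINDTODEVICE and bind() are the standard Linux calls. Depending on the kernel version, SO_BINDTODEVICE may require CAP_NET_RAW.

```c
#define _GNU_SOURCE /* for SO_BINDTODEVICE */
#include <arpa/inet.h>
#include <errno.h>
#include <net/if.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Apply an optional interface and an optional IPv4 source address to a
 * not-yet-connected socket. Either argument may be NULL, so callers can set
 * none, one, or both. Returns 0 on success or a negative errno value.
 */
static int prepare_socket(int fd, const char *iface, const char *src)
{
    if (iface != NULL) {
        /* SO_BINDTODEVICE takes only the interface name: no address lookup. */
        if (strlen(iface) >= IF_NAMESIZE)
            return -EINVAL;
        if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
                       iface, (socklen_t)strlen(iface)) != 0)
            return -errno;
    }
    if (src != NULL) {
        struct sockaddr_in sa;

        memset(&sa, 0, sizeof(sa));
        sa.sin_family = AF_INET;
        sa.sin_port = 0;                /* let the kernel pick the local port */
        if (inet_pton(AF_INET, src, &sa.sin_addr) != 1)
            return -EINVAL;
        if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) != 0)
            return -errno;
    }
    return 0;
}
```

Calling it as (fd, NULL, NULL), (fd, "enp0s8", NULL), (fd, NULL, "192.168.56.103") or with both arguments covers the four combinations, and none of them requires deriving the interface from the source address.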

If you look at how ping is implemented, it does not infer the interface from the IP address. To force ping to use a specific interface, one must provide the interface by name/index with the -I option. To change the source IP address (without forcing a specific interface), one provides the IP address to the -I option instead. It's simple and intuitive. ping also supports appending the interface to the destination IP address with the '%' delimiter, for IPv6 only, as per RFC4007.
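The -I convention described above can be sketched in C as follows. This illustrates the rule as stated here and is not ping's actual source; classify_i_arg() and enum iarg_kind are hypothetical, and a numeric argument is accepted as an interface index only if an interface with that index exists.

```c
#define _GNU_SOURCE /* expose the full glibc networking declarations */
#include <arpa/inet.h>
#include <limits.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdlib.h>

enum iarg_kind { IARG_INVALID, IARG_SOURCE, IARG_IFACE };

/*
 * Classify one ping-style -I argument:
 *   - an IPv4/IPv6 literal selects a source address (no interface forced);
 *   - a decimal number selects an interface by index;
 *   - anything else selects an interface by name.
 * For IARG_IFACE, *ifindex receives the interface index.
 */
static enum iarg_kind classify_i_arg(const char *arg, unsigned int *ifindex)
{
    unsigned char tmp[sizeof(struct in6_addr)];
    char name[IF_NAMESIZE];
    char *end;
    unsigned long idx;

    if (inet_pton(AF_INET, arg, tmp) == 1 || inet_pton(AF_INET6, arg, tmp) == 1)
        return IARG_SOURCE;

    idx = strtoul(arg, &end, 10);
    if (*arg != '\0' && *end == '\0') {
        /* Numeric: accept only an index that maps to an existing interface. */
        if (idx == 0 || idx > UINT_MAX ||
            if_indextoname((unsigned int)idx, name) == NULL)
            return IARG_INVALID;
        *ifindex = (unsigned int)idx;
        return IARG_IFACE;
    }

    /* Name: resolved directly, no IP address lookup involved. */
    *ifindex = if_nametoindex(arg);
    return *ifindex != 0 ? IARG_IFACE : IARG_INVALID;
}
```

The interface is never inferred from an address here: an address only ever selects the source, and an interface is only ever selected by name or index.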

I think that nvme-cli/linux-nvme should follow the ping approach. Interfaces should never be inferred from source IP addresses, but instead be clearly identified by their name or index. And setting the source address should be independent from setting the interface.

Regards,
Martin

> 
> > Example 3 - IPv6, Specify Interface with '%' delimiter and let Linux select
> > Source address:
> > nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8
> >
> > Example 4 - IPv6, Specify Interface with -w and let Linux select Source
> > address:
> > nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w enp0s8
> >
> > Example 5 - IPv6, Specify Interface with '%' delimiter and Source address
> > with -w:
> > nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8 -w fe80::9266:4855:6cf2:f7e9
> >
> > Example 6 - IPv6, Specify Interface and Source address with repeated -w:
> > nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w enp0s8 -w fe80::9266:4855:6cf2:f7e9
> >
> > Martin
> >
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


Thread overview: 23+ messages
     [not found] <20210415192848.962891-1-nitram_67@hotmail.com>
2021-04-15 19:28 ` Martin Belanger
2021-05-01 11:34   ` Hannes Reinecke
2021-05-03 16:59     ` Belanger, Martin
2021-05-04 13:25       ` Hannes Reinecke
2021-05-04 19:56   ` Sagi Grimberg
2021-05-05  8:47     ` Hannes Reinecke
2021-05-05 14:31       ` Belanger, Martin
2021-05-05 18:33         ` James Smart
2021-05-05 20:32         ` Sagi Grimberg
2021-05-06 18:27           ` Michael Christie
2021-05-06  6:05         ` Hannes Reinecke
2021-05-06  7:00           ` Hannes Reinecke
2021-05-06 15:46             ` Belanger, Martin
2021-05-07 18:20               ` Sagi Grimberg
2021-05-10 13:49                 ` Belanger, Martin
2021-05-10 18:13                   ` Sagi Grimberg
2021-05-10 19:18                     ` Belanger, Martin
2021-05-11  0:28                       ` Sagi Grimberg
2021-05-11 13:41                         ` Belanger, Martin [this message]
2021-05-11 17:13                           ` Sagi Grimberg
2021-05-12  6:09                             ` Hannes Reinecke
2021-05-12 12:12                               ` Belanger, Martin
2021-05-12 22:12                                 ` Sagi Grimberg
