From: "Belanger, Martin" <Martin.Belanger@dell.com>
To: Sagi Grimberg <sagi@grimberg.me>, Hannes Reinecke <hare@suse.de>,
	Martin Belanger <nitram_67@hotmail.com>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Cc: "kbusch@kernel.org" <kbusch@kernel.org>,
	"axboe@fb.com" <axboe@fb.com>, "hch@lst.de" <hch@lst.de>
Subject: RE: [PATCH 1/1] Add 'Transport Interface' (triface) option. This can be used to specify the IP interface to use for the connection. The driver uses that to set SO_BINDTODEVICE on the socket before connecting.
Date: Mon, 10 May 2021 19:18:12 +0000
Message-ID: <SJ0PR19MB45449EA9215D3EB116E1325CF2549@SJ0PR19MB4544.namprd19.prod.outlook.com>
In-Reply-To: <7b0d0915-961a-0af2-7ea3-77f59cd98ef1@grimberg.me>

> >>> ping <dest-ip-addr>%<interface>
> >>
> >> Ping only supports this syntax for IPv6 no?
> >>
> >>> Extending this approach to nvme-cli we arrive to something like this:
> >>>
> >>> nvme discover --traddr 100.64.29.2%enp0s8 --host-traddr
> >>> 192.168.56.102
> >> ....
> >>
> >> We already support this for IPv6, we can do that also for IPv4, but
> >> this syntax may not be trivially expected for ipv4?
> >
> > I tried this for IPv6 and it doesn't work. Here's what I get:
> > > $ sudo nvme discover -g -G -t tcp -s 8009 -a fe80::800:27ff:fe00:0
> > > Failed to write to /dev/nvme-fabrics: Invalid argument
> > > $ sudo nvme discover -g -G -t tcp -s 8009 -a fe80::800:27ff:fe00:0%enp0s8
> > > Failed to write to /dev/nvme-fabrics: Invalid argument
> > > $ sudo nvme discover -g -G -t tcp -s 8009 -a [fe80::800:27ff:fe00:0]
> > > failed to resolve host [fe80::800:27ff:fe00:0] info
> > > $ sudo nvme discover -g -G -t tcp -s 8009 -a [fe80::800:27ff:fe00:0%enp0s8]
> > > failed to resolve host [fe80::800:27ff:fe00:0%enp0s8] info
> 
> # nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b -w fe80::5054:ff:fe28:5edb%enp6s0

Thanks for clarifying the syntax. However, that doesn't work for me. 

# nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w fe80::9266:4855:6cf2:f7e9%enp0s8
Failed to write to /dev/nvme-fabrics: Connection refused

Note that the above syntax does not comply with RFC4007. The '%' delimiter is supposed to be appended to the Destination IP address and not the Source Address. In other words, to be RFC4007-compliant, the syntax should be (using your example):

# nvme discover -t tcp -a fe80::5054:ff:fef1:9f3b%enp6s0 -w fe80::5054:ff:fe28:5edb

This tells nvme-cli to connect to a controller at address fe80::5054:ff:fef1:9f3b using interface enp6s0 for the connection, and to set the Source address to fe80::5054:ff:fe28:5edb.
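
To make the '%' convention concrete, here is a minimal Python sketch of how a destination address written as <address>%<zone> could be split into its two parts. This is only an illustration of the syntax discussed above, not nvme-cli or kernel code; the function name split_zone is made up for this example.

```python
import ipaddress


def split_zone(traddr):
    """Split 'address%zone' into (address, zone); zone is None when absent.

    Hypothetical helper for illustration only; not part of nvme-cli.
    """
    addr, sep, zone = traddr.partition("%")
    # Raises ValueError if the part before '%' is not a valid IP literal.
    ipaddress.ip_address(addr)
    if sep and not zone:
        raise ValueError("empty zone after '%'")
    return addr, (zone or None)


print(split_zone("fe80::5054:ff:fef1:9f3b%enp6s0"))
# prints ('fe80::5054:ff:fef1:9f3b', 'enp6s0')
```

The zone (here the interface name) is what a driver could then use to bind the socket to a device, while the address part is used for the connection itself.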

> 
> Discovery Log Number of Records 1, Generation counter 5
> =====Discovery Log Entry 0======
> trtype:  tcp
> adrfam:  ipv6
> subtype: nvme subsystem
> treq:    not specified, sq flow control disable supported
> portid:  3
> trsvcid: 8009
> subnqn:  testnqn1
> traddr:  fe80::5054:ff:fef1:9f3b%enp6s0
> sectype: none
> 
> >
> >>
> >>> This tells nvme to connect to 100.64.29.2 on interface enp0s8. We make no
> >>> change to the --host-traddr option. It continues to be used to specify the
> >>> Source IP address only (for the rare cases where users want to specify a
> >>> Source Address other than the default). With this, the interface is
> >>> specified by name and not by its associated address. This is not only more
> >>> intuitive, but, as I stated before, eliminates the problem caused by mapping
> >>> the same IP address to multiple interfaces (not to mention that doing a
> >>> reverse lookup on an IP address to find the interface is extra work that we
> >>> don’t need to do in kernel space).
> >>
> >> Maybe we do something like ping -I for host_traddr, from ping man pages:
> >>
> >> -I interface
> >>     interface is either an address, an interface name or a VRF name. If
> >>     interface is an address, it sets source address to specified interface
> >>     address. If interface is an interface name, it sets source interface to
> >>     specified interface. If interface is a VRF name, each packet is routed
> >>     using the corresponding routing table; in this case, the -I option can
> >>     be repeated to specify a source address. NOTE: For IPv6, when doing ping
> >>     to a link-local scope address, link specification (by the '%'-notation
> >>     in destination, or by this option) can be used but it is no longer
> >>     required.
> >>
> >>
> >> Without the repetition though, unless we need to support two
> >> interfaces that share the same multiple addresses in the same subnet,
> >> which sounds completely crazy to me...
> >
> > Hi Sagi,
> >
> > If we want to follow ping as an example, the repetition is needed not to
> > specify two interfaces, but to specify an interface and the source address.
> > In a previous example (reproduced below), I described a configuration where
> > an interface had several addresses assigned to it. By default, Linux always
> > picks the same Source address (i.e. 192.168.56.101 in this example) when
> > connecting. If a user wants a different source address they need a way to
> > specify it (currently with --host-traddr). Users also need a way to specify
> > an interface separately from the source address (either with a new option
> > like --host-iface or by repeating --host-traddr). With the example below, if
> > we wanted to force ping to use interface enp0s8 and source address
> > 192.168.56.103, we would repeat the -I option, for example
> > "ping -I enp0s8 -I 192.168.56.103". We need a way to do the same with
> > nvme-cli.
> >
> > I thought that introducing a new option, "--host-iface", had the smallest
> > impact since it requires less code changes, but that was turned down (not
> > sure exactly why). I then suggested that we use the '%' delimiter for IPv4
> > and IPv6. I agree that it is not 100% the same as ping since ping only
> > allows the '%' delimiter for IPv6 addresses (as per RFC4007). As you
> > suggested, we could repeat the --host-traddr option (e.g. --host-traddr
> > enp0s8 --host-traddr 192.168.56.103), but this is more impactful to the code
> > than adding a separate --host-iface option.
> 
> It's less about code-changes and more on adding a new user ABI, that is the
> reason why (at least I'm fully on board just yet).
> 
> > EXAMPLE: Interface with several addresses assigned:
> > $ ip addr list dev enp0s8
> > 3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
> >        link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
> >        inet 192.168.56.101/24 brd 192.168.56.255 scope ...
> >           valid_lft 426sec preferred_lft 426sec
> >        inet 192.168.56.102/24 scope global secondary enp0s8
> >           valid_lft forever preferred_lft forever
> >        inet 192.168.56.103/24 scope global secondary enp0s8
> >           valid_lft forever preferred_lft forever
> >        inet 192.168.56.104/24 scope global secondary enp0s8
> >           valid_lft forever preferred_lft forever
> >
> > In the end, it doesn't really matter (to me) how it is implemented.
> > However, a solution that has little to no impact on existing code would be
> > nice. Just like ping, we need a way to specify an interface by its
> > **interface name** (and not by its associated IP address), and we need to
> > allow users to select which Source IP address to use when there are multiple
> > addresses associated with an interface.
> 
> The '%' may be confusing when it comes to other transports as well (e.g.
> rdma/fc would have to either reject or ignore it, but regardless of how we
> add it that would be the case). Having host-traddr accept either ip or
> interface seems the most desirable, however that won't work if there are 2
> interfaces that share multiple ip addresses. So if this is a requirement we'll
> probably need to add --host-iface as another option...

I don’t grok what you mean by "that won't work if there are 2 interfaces that share multiple ip addresses". Why not? If one specifies the interface by its name (e.g. enp0s8), there is no possible confusion even if multiple interfaces share the same IP addresses. 
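
As a toy illustration of that point, the Python sketch below models a host where two interfaces carry the same address. The second interface name (enp0s9) and the address assignment are hypothetical and exist only for this example. A reverse lookup from the address yields two candidates, whereas the interface name always identifies exactly one interface.

```python
# Hypothetical host configuration: two interfaces share one address.
# "enp0s9" and this address layout are assumptions made for illustration.
addresses = {
    "enp0s8": ["192.168.56.101", "192.168.56.102"],
    "enp0s9": ["192.168.56.101"],
}


def interfaces_with(addr):
    """Return the sorted names of all interfaces that carry addr."""
    return sorted(name for name, addrs in addresses.items() if addr in addrs)


# Reverse lookup by address is ambiguous: two interfaces match.
print(interfaces_with("192.168.56.101"))
# Lookup by name is a plain key access and cannot be ambiguous.
print(addresses["enp0s8"])
```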

The following are some examples of how nvme-cli should work to comply with RFC4007 and be consistent with the way ping operates.

Example 1 - IPv4, Specify Interface with -w and let Linux select Source address: 
nvme discover -t tcp -a 192.168.1.9 -w enp0s8

Example 2 - IPv4, Specify Interface and Source address with repeated -w:  
nvme discover -t tcp -a 192.168.1.9 -w enp0s8 -w 192.168.56.103

Example 3 - IPv6, Specify Interface with '%' delimiter and let Linux select Source address:
nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8

Example 4 - IPv6, Specify Interface with -w and let Linux select Source address:
nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w enp0s8

Example 5 - IPv6, Specify Interface with '%' delimiter and Source address with -w:
nvme discover -t tcp -a fe80::800:27ff:fe00:0%enp0s8 -w fe80::9266:4855:6cf2:f7e9

Example 6 - IPv6, Specify Interface and Source address with repeated -w: 
nvme discover -t tcp -a fe80::800:27ff:fe00:0 -w enp0s8 -w fe80::9266:4855:6cf2:f7e9
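
The repeated -w behaviour in the examples above could be handled roughly as in the following Python sketch: each value is treated as a source address if it parses as an IP literal, and as an interface name otherwise. This is an assumption about one possible approach modelled on ping -I, not the actual nvme-cli implementation; parse_host_args is a made-up name.

```python
import ipaddress


def parse_host_args(values):
    """Return (iface, src_addr) from repeated -w values, ping -I style.

    Hypothetical helper for illustration only; not part of nvme-cli.
    """
    iface = src = None
    for value in values:
        try:
            ipaddress.ip_address(value)
        except ValueError:
            # Not an IP literal, so treat it as an interface name.
            if iface is not None:
                raise ValueError("interface given twice")
            iface = value
        else:
            if src is not None:
                raise ValueError("source address given twice")
            src = value
    return iface, src


print(parse_host_args(["enp0s8", "192.168.56.103"]))
# prints ('enp0s8', '192.168.56.103')
```

With this classification, Examples 1 and 4 yield an interface and no source address, while Examples 2 and 6 yield both.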

Martin
