linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Logan Gunthorpe <logang@deltatee.com>
To: Sagi Grimberg <sagi@grimberg.me>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-nvme@lists.infradead.org, linux-rdma@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-block@vger.kernel.org
Cc: "Stephen Bates" <sbates@raithlin.com>,
	"Christoph Hellwig" <hch@lst.de>, "Jens Axboe" <axboe@kernel.dk>,
	"Keith Busch" <keith.busch@intel.com>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Jason Gunthorpe" <jgg@mellanox.com>,
	"Max Gurtovoy" <maxg@mellanox.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Steve Wise" <swise@opengridcomputing.com>
Subject: Re: [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory
Date: Thu, 1 Mar 2018 10:40:11 -0700	[thread overview]
Message-ID: <40d69074-31a8-d06a-ade9-90de7712c553@deltatee.com> (raw)
In-Reply-To: <749e3752-4349-0bdf-5243-3d510c2b26db@grimberg.me>



On 01/03/18 04:03 AM, Sagi Grimberg wrote:
> Can you describe what would be the plan to have it when these devices
> do come along? I'd say that p2p_dev needs to become a nvmet_ns reference
> and not from nvmet_ctrl. Then, when cmb capable devices come along, the
> ns can prefer to use its own cmb instead of locating a p2p_dev device?

The patchset already supports CMB drives. That's essentially what patch 
7 is for. We change the nvme-pci driver to use the p2pmem code to 
register and manage the CMB memory. After that, it is simply available 
to the nvmet code. We have already been using this with a couple 
prototype CMB drives.

>> +static int nvmet_p2pdma_add_client(struct nvmet_ctrl *ctrl,
>> +                   struct nvmet_ns *ns)
>> +{
>> +    int ret;
>> +
>> +    if (!blk_queue_pci_p2pdma(ns->bdev->bd_queue)) {
>> +        pr_err("peer-to-peer DMA is not supported by %s\n",
>> +               ns->device_path);
>> +        return -EINVAL;
> 
> I'd say that just skip it instead of failing it, in theory one can
> connect nvme devices via p2p mem and expose other devices in the
> same subsystem. The message would be pr_debug to reduce chattiness.

No, it's actually a bit more complicated than you make it. There's a 
couple cases:

1) The user adds a namespace but there hasn't been a connection and no 
p2pmem has been selected. In this case the add_client function is never 
called and the user can add whatever namespace they like.

2) When the first connect happens, nvmet_setup_p2pmem() will call 
add_client() for each namespace and rdma device. If any of them fail 
then it does not use a P2P device and falls back, as you'd like, to 
regular memory.

3) When the user adds a namespace after a port is in use and a 
compatible P2P device has been found. In this case, if the user tries to 
add a namespace that is not compatible with the P2P device in use then 
it fails adding the new namespace. The only alternative here is to tear 
everything down, ensure there are no P2P enabled buffers open and start 
using regular memory again... That is very difficult.

I also disagree that these messages should be pr_debug. If a user is 
trying to use P2P memory and they enable it but the system doesn't 
choose to use their memory they must know why that is so they can make 
the necessary adjustments. If the system doesn't at least print a dmesg 
they are in the dark.

>> +    }
>> +
>> +    ret = pci_p2pdma_add_client(&ctrl->p2p_clients, nvmet_ns_dev(ns));
>> +    if (ret)
>> +        pr_err("failed to add peer-to-peer DMA client %s: %d\n",
>> +               ns->device_path, ret);
>> +
>> +    return ret;
>> +}
>> +
>>   int nvmet_ns_enable(struct nvmet_ns *ns)
>>   {
>>       struct nvmet_subsys *subsys = ns->subsys;
>> @@ -299,6 +319,14 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
>>       if (ret)
>>           goto out_blkdev_put;
>> +    list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
>> +        if (ctrl->p2p_dev) {
>> +            ret = nvmet_p2pdma_add_client(ctrl, ns);
>> +            if (ret)
>> +                goto out_remove_clients;
> 
> Is this really a fatal failure given that we fall-back to main
> memory? Why not continue with main memory (and warn at best)?

See above. It's fatal because we are already using p2p memory and we 
can't easily tear that all down when a user adds a new namespace.

>> +    ctrl->p2p_dev = pci_p2pmem_find(&ctrl->p2p_clients);
> 
> This is the first p2p_dev found right? What happens if I have more than
> a single p2p device? In theory I'd have more p2p memory I can use. Have
> you considered making pci_p2pmem_find return the least used suitable
> device?

Yes, it currently returns the first found. I imagine a bunch of 
improvements could be made to it in future work. However, I'd probably 
start with finding the nearest p2p device and then falling back to the 
least used if that's necessary. At this time though it's not clear how 
complicated these systems will get and what's actually needed here.

>> @@ -68,6 +69,7 @@ struct nvmet_rdma_rsp {
>>       u8            n_rdma;
>>       u32            flags;
>>       u32            invalidate_rkey;
>> +    struct pci_dev        *p2p_dev;
> 
> Given that p2p_client is in nvmet_req, I think it make sense
> that the p2p_dev itself would also live there. In theory, nothing
> is preventing FC from using it as well.

Fair point. I can look at moving it in v3.

Thanks,

Logan

  parent reply	other threads:[~2018-03-01 17:40 UTC|newest]

Thread overview: 124+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-28 23:39 [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory Logan Gunthorpe
2018-02-28 23:39 ` [PATCH v2 01/10] PCI/P2PDMA: Support peer to peer memory Logan Gunthorpe
2018-03-01 17:37   ` Bjorn Helgaas
2018-03-01 18:55     ` Logan Gunthorpe
2018-03-01 23:00       ` Bjorn Helgaas
2018-03-01 23:06         ` Logan Gunthorpe
2018-03-01 23:14           ` Stephen  Bates
2018-03-01 23:45             ` Bjorn Helgaas
2018-02-28 23:39 ` [PATCH v2 02/10] PCI/P2PDMA: Add sysfs group to display p2pmem stats Logan Gunthorpe
2018-03-01 17:44   ` Bjorn Helgaas
2018-03-02  0:15     ` Logan Gunthorpe
2018-03-02  0:36       ` Dan Williams
2018-03-02  0:37         ` Logan Gunthorpe
2018-02-28 23:39 ` [PATCH v2 03/10] PCI/P2PDMA: Add PCI p2pmem dma mappings to adjust the bus offset Logan Gunthorpe
2018-03-01 17:49   ` Bjorn Helgaas
2018-03-01 19:36     ` Logan Gunthorpe
2018-02-28 23:40 ` [PATCH v2 04/10] PCI/P2PDMA: Clear ACS P2P flags for all devices behind switches Logan Gunthorpe
2018-03-01 18:02   ` Bjorn Helgaas
2018-03-01 18:54     ` Stephen  Bates
2018-03-01 21:21       ` Alex Williamson
2018-03-01 21:26         ` Logan Gunthorpe
2018-03-01 21:32         ` Stephen  Bates
2018-03-01 21:35           ` Jerome Glisse
2018-03-01 21:37             ` Logan Gunthorpe
2018-03-01 23:15       ` Bjorn Helgaas
2018-03-01 23:59         ` Logan Gunthorpe
2018-03-01 19:13     ` Logan Gunthorpe
2018-03-05 22:28       ` Bjorn Helgaas
2018-03-05 23:01         ` Logan Gunthorpe
2018-02-28 23:40 ` [PATCH v2 05/10] block: Introduce PCI P2P flags for request and request queue Logan Gunthorpe
2018-03-01 11:08   ` Sagi Grimberg
2018-02-28 23:40 ` [PATCH v2 06/10] IB/core: Add optional PCI P2P flag to rdma_rw_ctx_[init|destroy]() Logan Gunthorpe
2018-03-01 10:32   ` Sagi Grimberg
2018-03-01 17:16     ` Logan Gunthorpe
2018-02-28 23:40 ` [PATCH v2 07/10] nvme-pci: Use PCI p2pmem subsystem to manage the CMB Logan Gunthorpe
2018-03-05  1:33   ` Oliver
2018-03-05 16:00     ` Keith Busch
2018-03-05 17:10       ` Logan Gunthorpe
2018-03-05 18:02         ` Sinan Kaya
2018-03-05 18:09           ` Logan Gunthorpe
2018-03-06  0:49         ` Oliver
2018-03-06  1:14           ` Logan Gunthorpe
2018-03-06 10:40             ` Oliver
2018-03-05 19:57       ` Sagi Grimberg
2018-03-05 20:10         ` Jason Gunthorpe
2018-03-05 20:16           ` Logan Gunthorpe
2018-03-05 20:42           ` Keith Busch
2018-03-05 20:50             ` Jason Gunthorpe
2018-03-05 20:13         ` Logan Gunthorpe
2018-02-28 23:40 ` [PATCH v2 08/10] nvme-pci: Add support for P2P memory in requests Logan Gunthorpe
2018-03-01 11:07   ` Sagi Grimberg
2018-03-01 15:58     ` Stephen  Bates
2018-03-09  5:08       ` Bart Van Assche
2018-02-28 23:40 ` [PATCH v2 09/10] nvme-pci: Add a quirk for a pseudo CMB Logan Gunthorpe
2018-03-01 11:03   ` Sagi Grimberg
2018-02-28 23:40 ` [PATCH v2 10/10] nvmet: Optionally use PCI P2P memory Logan Gunthorpe
2018-03-01 11:03   ` Sagi Grimberg
2018-03-01 16:15     ` Stephen  Bates
2018-03-01 17:40     ` Logan Gunthorpe [this message]
2018-03-01 18:35       ` Sagi Grimberg
2018-03-01 18:42         ` Jason Gunthorpe
2018-03-01 19:01           ` Stephen  Bates
2018-03-01 19:27           ` Logan Gunthorpe
2018-03-01 22:45             ` Jason Gunthorpe
2018-03-01 22:56               ` Logan Gunthorpe
2018-03-01 23:00               ` Stephen  Bates
2018-03-01 23:20                 ` Jason Gunthorpe
2018-03-01 23:29                   ` Logan Gunthorpe
2018-03-01 23:32                   ` Stephen  Bates
2018-03-01 23:49                 ` Keith Busch
2018-03-01 23:52                   ` Logan Gunthorpe
2018-03-01 23:53                   ` Stephen  Bates
2018-03-02 15:53                     ` Christoph Hellwig
2018-03-02 20:51                       ` Stephen  Bates
2018-03-01 23:57                   ` Stephen  Bates
2018-03-02  0:03                     ` Logan Gunthorpe
2018-03-02 16:18                     ` Jason Gunthorpe
2018-03-02 17:10                       ` Logan Gunthorpe
2018-03-01 19:10         ` Logan Gunthorpe
2018-03-01  3:54 ` [PATCH v2 00/10] Copy Offload in NVMe Fabrics with P2P PCI Memory Benjamin Herrenschmidt
2018-03-01  3:56   ` Benjamin Herrenschmidt
2018-03-01 18:04     ` Logan Gunthorpe
2018-03-01 20:29       ` Benjamin Herrenschmidt
2018-03-01 20:55         ` Jerome Glisse
2018-03-01 21:03           ` Logan Gunthorpe
2018-03-01 21:10             ` Jerome Glisse
2018-03-01 21:15               ` Logan Gunthorpe
2018-03-01 21:25                 ` Jerome Glisse
2018-03-01 21:37               ` Stephen  Bates
2018-03-02 21:38               ` Stephen  Bates
2018-03-02 22:09                 ` Jerome Glisse
2018-03-05 20:36                   ` Stephen  Bates
2018-03-01 20:55         ` Logan Gunthorpe
2018-03-01 18:09     ` Stephen  Bates
2018-03-01 20:32       ` Benjamin Herrenschmidt
2018-03-01 19:21     ` Dan Williams
2018-03-01 19:30       ` Logan Gunthorpe
2018-03-01 20:34       ` Benjamin Herrenschmidt
2018-03-01 20:40         ` Benjamin Herrenschmidt
2018-03-01 20:53           ` Jason Gunthorpe
2018-03-01 20:57             ` Logan Gunthorpe
2018-03-01 22:06             ` Benjamin Herrenschmidt
2018-03-01 22:31               ` Linus Torvalds
2018-03-01 22:34                 ` Benjamin Herrenschmidt
2018-03-02 16:22                   ` Kani, Toshi
2018-03-02 16:57                     ` Linus Torvalds
2018-03-02 17:34                       ` Linus Torvalds
2018-03-02 17:38                       ` Kani, Toshi
2018-03-01 21:37         ` Dan Williams
2018-03-01 21:45           ` Logan Gunthorpe
2018-03-01 21:57             ` Logan Gunthorpe
2018-03-01 23:00               ` Benjamin Herrenschmidt
2018-03-01 23:19                 ` Logan Gunthorpe
2018-03-01 23:25                   ` Benjamin Herrenschmidt
2018-03-02 21:44                     ` Benjamin Herrenschmidt
2018-03-02 22:24                       ` Logan Gunthorpe
2018-03-01 23:26                   ` Benjamin Herrenschmidt
2018-03-01 23:54                     ` Logan Gunthorpe
2018-03-01 21:03       ` Benjamin Herrenschmidt
2018-03-01 21:11         ` Logan Gunthorpe
2018-03-01 21:18           ` Jerome Glisse
2018-03-01 21:22             ` Logan Gunthorpe
2018-03-01 10:31 ` Sagi Grimberg
2018-03-01 19:33   ` Logan Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=40d69074-31a8-d06a-ade9-90de7712c553@deltatee.com \
    --to=logang@deltatee.com \
    --cc=alex.williamson@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=hch@lst.de \
    --cc=jgg@mellanox.com \
    --cc=jglisse@redhat.com \
    --cc=keith.busch@intel.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=maxg@mellanox.com \
    --cc=sagi@grimberg.me \
    --cc=sbates@raithlin.com \
    --cc=swise@opengridcomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).