All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Marchand <david.marchand@redhat.com>
To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>
Cc: dev <dev@dpdk.org>, Thomas Monjalon <thomas@monjalon.net>,
	 Ben Walker <benjamin.walker@intel.com>
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection
Date: Wed, 10 Jul 2019 10:09:50 +0200	[thread overview]
Message-ID: <CAJFAV8wK=8fWVAOGhgguCoN4HrRqF65QnN4tiafzQLydFU-CPw@mail.gmail.com> (raw)
In-Reply-To: <BYAPR18MB242483AB7A3D9E9AD6B1FAEEC8F10@BYAPR18MB2424.namprd18.prod.outlook.com>

Hello guys,

On Tue, Jul 9, 2019 at 7:52 PM Jerin Jacob Kollanukkaran <jerinj@marvell.com>
wrote:

> > -----Original Message-----
> > From: Burakov, Anatoly <anatoly.burakov@intel.com>
> > Sent: Tuesday, July 9, 2019 8:07 PM
> > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; David Marchand
> > <david.marchand@redhat.com>
> > Cc: dev <dev@dpdk.org>; Thomas Monjalon <thomas@monjalon.net>; Ben
> > Walker <benjamin.walker@intel.com>
> > Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode
> > selection
> issue.
> > >>
> > >> I wouldn't classify this as "needing" IOVA. "Need" implies it cannot
> > >> work without it, whereas in this case it's more of a "highly
> > >> recommended" rather than "need".
> > >
> > > It is "need" as performance is horrible without it as is per packet SW
> > translation.
> > > A "need" for DPDK performance perspective.
> >
> > Would the driver fail to initialize if it detects running as IOVA as PA?
>
> Yes.
> https://git.dpdk.org/dpdk/tree/drivers/net/octeontx2/otx2_ethdev.c#n1191
>
> > Also, some other use cases will also require IOVA as PA while having full
> > IOMMU support. An example of this would be systems with limited IOMMU
> > width (such as VM's) - even though the IOMMU is technically supported, we
> > may not have the necessary address width to run all devices in IOVA as VA
> > mode, and would need to fall back to IOVA as PA.
> > Since we cannot *require* IOVA as VA in current codebase, any driver that
> > expects IOVA as VA to always be enabled will presumably not work.
> >
> > >
> > > Again, it is not device attribute, it is system attribute.
> >
> > If it's a system attribute, why is it a device driver flag then? The
> system may
> > or may not support IOMMU, the device itself probably doesn't care since
> bus
> > address looks the same in both cases, *but the driver
> > might* (such as would be in your case - requiring IOVA as VA and
> disallowing
> > IOVA as PA for performance reasons).
>
> Agree.
>
> >
> > Currently (again, disregarding your interpretation of how IOVA as VA
> works
> > and looking at the actual commit history), we always seem to imply that
> IOVA
> > as PA works for all devices, and we use IOVA_AS_VA flag to indicate that
> the
> > device *also* supports IOVA as VA mode.
> >
> > But we don't have any way to express a *requirement* for IOVA as VA mode
> > - only for IOVA as PA mode. That is the purpose of the new flag. You are
> > stating that the IOVA_AS_VA drv flag is an expression of that
> requirement,
> > but that is not reflected in the codebase - our commit history indicates
> that
> > we don't treat IOVA as VA as hard requirement whenever this flag is
> > specified (and i would argue that we shouldn't).
>
> No objection to further classify it.
>

I propose to introduce:
* RTE_PCI_DRV_IOVA_AS_PA which means "the combination of the pmd+kmod+hw
supports usage of Physical Addresses"
* RTE_PCI_DRV_IOVA_AS_VA which means "the combination of the pmd+kmod+hw
supports usage of Virtual Addresses"

- For the pci bus, the algorigthm would be:

devices_want_pa = false
devices_want_va = false

Foreach pci device
  Skip blacklisted devices
  Skip unbound devices (i.e. we only consider devices bound to a known
kernel driver)
  Skip unsupported devices (i.e. we only consider devices that have a pmd
that supports them)

  If the combination pmd+kmod only supports VA (RTE_PCI_DRV_IOVA_AS_VA
capability in driver flags), then devices_want_va = true
  Else if the combination pmd+kmod only supports PA (RTE_PCI_DRV_IOVA_AS_PA
capability in driver flags), then devices_want_pa = true

If devices_want_va and !devices_want_pa
  return RTE_IOVA_VA
If devices_want_pa and !devices_want_va
  return RTE_IOVA_PA

return RTE_IOVA_DC

Notes:
* the IOMMU limitations are considered as a per device/driver thing, since
the kmod is the one that configures the system IOMMU,
* the case "devices_want_pa and devices_want_va" is considered as DC, we
leave EAL decide based on the physical addresses availability because we
can't comply with all present devices/drivers in the system.
  This means that at bus probe time for a device, we must add a check that
the combination is fulfilled (and avoid this check in the drivers
themselves).


- For the global bus code, that aggregates the different buses preferences,
we need to do the same, while I suspect a bug at the moment.

The algorigthm:

buses_want_pa = false
buses_want_va = false

Foreach bus
  If the bus reports RTE_IOVA_VA, then buses_want_va = true
  Else if the bus reports RTE_IOVA_PA, then buses_want_pa = true

If buses_want_va and !buses_want_pa
  return RTE_IOVA_VA
If buses_want_pa and !buses_want_va
  return RTE_IOVA_PA

return RTE_IOVA_DC


- Finally at EAL level, we keep the current code.


Hope I did not miss anything.
If we agree on this, I will send the changes and an update in the
documentation.


-- 
David Marchand

  reply	other threads:[~2019-07-10  8:10 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-08 14:24 [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA mode selection jerinj
2019-07-08 18:39 ` David Marchand
2019-07-08 19:13   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-09  8:39     ` Bruce Richardson
2019-07-09  9:05       ` Jerin Jacob Kollanukkaran
2019-07-09  9:32         ` Bruce Richardson
2019-07-09  9:44     ` Burakov, Anatoly
2019-07-09 11:13       ` Jerin Jacob Kollanukkaran
2019-07-09 11:40         ` Burakov, Anatoly
2019-07-09 12:11           ` Jerin Jacob Kollanukkaran
2019-07-09 13:30             ` Burakov, Anatoly
2019-07-09 13:50               ` Burakov, Anatoly
2019-07-09 14:19                 ` Jerin Jacob Kollanukkaran
2019-07-09 14:00               ` Jerin Jacob Kollanukkaran
2019-07-09 14:37                 ` Burakov, Anatoly
2019-07-09 15:04                   ` Thomas Monjalon
2019-07-09 15:06                     ` Burakov, Anatoly
2019-07-09 17:50                   ` Jerin Jacob Kollanukkaran
2019-07-10  8:09                     ` David Marchand [this message]
2019-07-09 14:54                 ` Burakov, Anatoly
2019-07-09 14:58                   ` Jerin Jacob Kollanukkaran
2019-07-09 15:02                     ` Burakov, Anatoly
2019-07-09 15:12                       ` Thomas Monjalon
2019-07-09 15:18                         ` Burakov, Anatoly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJFAV8wK=8fWVAOGhgguCoN4HrRqF65QnN4tiafzQLydFU-CPw@mail.gmail.com' \
    --to=david.marchand@redhat.com \
    --cc=anatoly.burakov@intel.com \
    --cc=benjamin.walker@intel.com \
    --cc=dev@dpdk.org \
    --cc=jerinj@marvell.com \
    --cc=thomas@monjalon.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.