Running DPDK as an unprivileged user

* Running DPDK as an unprivileged user
@ 2016-12-29 20:41 Walker, Benjamin
  2016-12-30  1:14 ` Stephen Hemminger
  2017-01-04 11:39 ` Tan, Jianfeng
  0 siblings, 2 replies; 18+ messages in thread
From: Walker, Benjamin @ 2016-12-29 20:41 UTC (permalink / raw)
  To: dev

Hi all,

I've been digging in to what it would take to run DPDK as an
unprivileged user and I have some findings that I thought
were worthy of discussion. The assumptions here are that I'm
using a very recent Linux kernel (4.8.15 to be specific) and
I'm using vfio with my IOMMU enabled. I'm only interested in
making it possible to run as an unprivileged user in this
type of environment.

There are a few key things that DPDK needs to do in order to
run as an unprivileged user:

1) Allocate hugepages
2) Map device resources
3) Map hugepage virtual addresses to DMA addresses.

For #1 and #2, DPDK works just fine today. You simply chown
the relevant resources in sysfs to the desired user and
everything is happy. 

The problem is #3. This currently relies on looking up the
mappings in /proc/self/pagemap, but the ability to get
physical addresses in /proc/self/pagemap as an unprivileged
user was removed from the kernel in the 4.x timeframe due to
the Rowhammer vulnerability. At this time, it is not
possible to run DPDK as an unprivileged user on a 4.x Linux
kernel.

There is a way to make this work though, which I'll outline
now. Unfortunately, I think it is going to require some very
significant changes to the initialization flow in the EAL.
One bit of  of background before I go into how to fix this -
there are three types of memory addresses - virtual
addresses, physical addresses, and DMA addresses. Sometimes
DMA addresses are called bus addresses or I/O addresses, but
I'll call them DMA addresses because I think that's the
clearest name. In a system without an IOMMU, DMA addresses
and physical addresses are equivalent, but in a system with
an IOMMU any arbitrary DMA address can be chosen by the user
to map to a given physical address. For security reasons
(rowhammer), it is no longer considered safe to expose
physical addresses to userspace, but it is perfectly fine to
expose DMA addresses when an IOMMU is present.

DPDK today begins by allocating all of the required
hugepages, then finds all of the physical addresses for
those hugepages using /proc/self/pagemap, sorts the
hugepages by physical address, then remaps the pages to
contiguous virtual addresses. Later on and if vfio is
enabled, it asks vfio to pin the hugepages and to set their
DMA addresses in the IOMMU to be the physical addresses
discovered earlier. Of course, running as an unprivileged
user means all of the physical addresses in
/proc/self/pagemap are just 0, so this doesn't end up
working. Further, there is no real reason to choose the
physical address as the DMA address in the IOMMU - it would
be better to just count up starting at 0. Also, because the
pages are pinned after the virtual to physical mapping is
looked up, there is a window where a page could be moved.
Hugepage mappings can be moved on more recent kernels (at
least 4.x), and the reliability of hugepages having static
mappings decreases with every kernel release. Note that this
probably means that using uio on recent kernels is subtly
broken and cannot be supported going forward because there
is no uio mechanism to pin the memory.

The first open question I have is whether DPDK should allow
uio at all on recent (4.x) kernels. My current understanding
is that there is no way to pin memory and hugepages can now
be moved around, so uio would be unsafe. What does the
community think here?

My second question is whether the user should be allowed to
mix uio and vfio usage simultaneously. For vfio, the
physical addresses are really DMA addresses and are best
when arbitrarily chosen to appear sequential relative to
their virtual addresses. For uio, they are physical
addresses and are not chosen at all. It seems that these two
things are in conflict and that it will be difficult, ugly,
and maybe impossible to resolve the simultaneous use of
both.

Once we agree on the above two things, we can try to talk
through some solutions in the code.

Thanks,
Ben

^ permalink raw reply	[flat|nested] 18+ messages in thread