From: Dave Jiang <dave.jiang@intel.com>
To: Zhangfei Gao <zhangfei.gao@linaro.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Arnd Bergmann <arnd@arndb.de>,
jonathan.cameron@huawei.com, grant.likely@arm.com,
jean-philippe <jean-philippe@linaro.org>,
ilias.apalodimas@linaro.org, francois.ozog@linaro.org,
kenneth-lee-2012@foxmail.com, Wangzhou <wangzhou1@hisilicon.com>
Cc: Zaibo Xu <xuzaibo@huawei.com>,
Kenneth Lee <liguozhu@hisilicon.com>,
linux-kernel@vger.kernel.org,
linux-accelerators@lists.ozlabs.org
Subject: Re: [RESEND PATCH v4 1/2] uacce: Add documents for uacce
Date: Wed, 9 Oct 2019 09:11:37 -0700 [thread overview]
Message-ID: <0d339fc4-9a15-9f7f-4f19-89bf8baf1301@intel.com> (raw)
In-Reply-To: <1570634502-20923-2-git-send-email-zhangfei.gao@linaro.org>
On 10/9/19 8:21 AM, Zhangfei Gao wrote:
> From: Kenneth Lee <liguozhu@hisilicon.com>
>
> Uacce (Unified/User-space-access-intended Accelerator Framework) is
> a kernel module targets to provide Shared Virtual Addressing (SVA)
> between the accelerator and process.
>
> This patch add document to explain how it works.
>
> Signed-off-by: Kenneth Lee <liguozhu@hisilicon.com>
> Signed-off-by: Zaibo Xu <xuzaibo@huawei.com>
> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> ---
> Documentation/misc-devices/uacce.rst | 297 +++++++++++++++++++++++++++++++++++
> 1 file changed, 297 insertions(+)
> create mode 100644 Documentation/misc-devices/uacce.rst
>
> diff --git a/Documentation/misc-devices/uacce.rst b/Documentation/misc-devices/uacce.rst
> new file mode 100644
> index 0000000..b3cf0d5
> --- /dev/null
> +++ b/Documentation/misc-devices/uacce.rst
> @@ -0,0 +1,297 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Introduction of Uacce
> +=========================
> +
> +Uacce (Unified/User-space-access-intended Accelerator Framework) targets to
> +provide Shared Virtual Addressing (SVA) between accelerators and processes.
> +So accelerator can access any data structure of the main cpu.
> +This differs from the data sharing between cpu and io device, which share
> +data content rather than address.
> +Because of the unified address, hardware and user space of process can
> +share the same virtual address in the communication.
> +Uacce takes the hardware accelerator as a heterogeneous processor, while
> +IOMMU share the same CPU page tables and as a result the same translation
> +from va to pa.
> +
> + __________________________ __________________________
> + | | | |
> + | User application (CPU) | | Hardware Accelerator |
> + |__________________________| |__________________________|
> +
> + | |
> + | va | va
> + V V
> + __________ __________
> + | | | |
> + | MMU | | IOMMU |
> + |__________| |__________|
> + | |
> + | |
> + V pa V pa
> + _______________________________________
> + | |
> + | Memory |
> + |_______________________________________|
> +
> +
> +
> +Architecture
> +------------
> +
> +Uacce is the kernel module, taking charge of iommu and address sharing.
> +The user drivers and libraries are called WarpDrive.
> +
> +A virtual concept, queue, is used for the communication. It provides a
> +FIFO-like interface. And it maintains a unified address space between the
> +application and all involved hardware.
> +
> + ___________________ ________________
> + | | user API | |
> + | WarpDrive library | ------------> | user driver |
> + |___________________| |________________|
> + | |
> + | |
> + | queue fd |
> + | |
> + | |
> + v |
> + ___________________ _________ |
> + | | | | | mmap memory
> + | Other framework | | uacce | | r/w interface
> + | crypto/nic/others | |_________| |
> + |___________________| |
> + | | |
> + | register | register |
> + | | |
> + | | |
> + | _________________ __________ |
> + | | | | | |
> + ------------- | Device Driver | | IOMMU | |
> + |_________________| |__________| |
> + | |
> + | V
> + | ___________________
> + | | |
> + -------------------------- | Device(Hardware) |
> + |___________________|
> +
> +
> +How does it work
> +================
> +
> +Uacce uses mmap and IOMMU to play the trick.
> +
> +Uacce create a chrdev for every device registered to it. New queue is
> +created when user application open the chrdev. The file descriptor is used
> +as the user handle of the queue.
> +The accelerator device present itself as an Uacce object, which exports as
> +chrdev to the user space. The user application communicates with the
> +hardware by ioctl (as control path) or share memory (as data path).
> +
> +The control path to the hardware is via file operation, while data path is
> +via mmap space of the queue fd.
> +
> +The queue file address space:
> +
> +enum uacce_qfrt {
> + UACCE_QFRT_MMIO = 0, /* device mmio region */
> + UACCE_QFRT_DKO = 1, /* device kernel-only region */
> + UACCE_QFRT_DUS = 2, /* device user share region */
> + UACCE_QFRT_SS = 3, /* static shared memory (for non-sva devices) */
> + UACCE_QFRT_MAX = 16,
> +};
> +
> +All regions are optional and differ from device type to type. The
> +communication protocol is wrapped by the user driver.
> +
> +The device mmio region is mapped to the hardware mmio space. It is generally
> +used for doorbell or other notification to the hardware. It is not fast enough
> +as data channel.
> +
> +The device kernel-only region is necessary only if the device IOMMU has no
> +PASID support or it cannot send kernel-only address request. In this case, if
> +kernel need to share memory with the device, kernel has to share iova address
> +space with the user process via mmap, to prevent iova conflict.
> +
> +The device user share region is used for share data buffer between user process
> +and device. It can be merged into other regions. But a separated region can help
> +on device state management. For example, the device can be started when this
> +region is mapped.
> +
> +The static share virtual memory region is used for share data buffer with the
> +device and can be shared among queues / devices.
> +Its size is set according to the application requirement.
> +
> +
> +The user API
> +------------
> +
> +We adopt a polling style interface in the user space: ::
> +
> + int wd_request_queue(struct wd_queue *q);
> + void wd_release_queue(struct wd_queue *q);
> + int wd_send(struct wd_queue *q, void *req);
> + int wd_recv(struct wd_queue *q, void **req);
> + int wd_recv_sync(struct wd_queue *q, void **req);
> + void wd_flush(struct wd_queue *q);
> +
> +wd_recv_sync() is a wrapper to its non-sync version. It will trap into
> +kernel and wait until the queue become available.
> +
> +If the queue do not support SVA/SVM. The following helper functions
> +can be used to create Static Virtual Share Memory: ::
> +
> + void *wd_reserve_memory(struct wd_queue *q, size_t size);
> + int wd_share_reserved_memory(struct wd_queue *q,
> + struct wd_queue *target_q);
> +
> +The user API is not mandatory. It is simply a suggestion and hint what the
> +kernel interface is supposed to be.
> +
> +
> +The user driver
> +---------------
> +
> +The queue file mmap space will need a user driver to wrap the communication
> +protocol. Uacce provides some attributes in sysfs for the user driver to
> +match the right accelerator accordingly.
> +More details in Documentation/ABI/testing/sysfs-driver-uacce.
> +
> +
> +The Uacce register API
> +-----------------------
> +The register API is defined in uacce.h.
> +
> +struct uacce_interface {
> + char name[32];
> + unsigned int flags;
> + struct uacce_ops *ops;
> +};
> +
> +According to the IOMMU capability, uacce_interface flags can be:
> +
> +UACCE_DEV_SVA (0x1)
> + Support shared virtual address
> +
> +UACCE_DEV_SHARE_DOMAIN (0)
> + This is used for device which does not support pasid.
> +
> +struct uacce_device *uacce_register(struct device *parent,
> + struct uacce_interface *interface);
> +void uacce_unregister(struct uacce_device *uacce);
> +
> +uacce_register resultes can be:
> +a. If uacce module is not compiled, ERR_PTR(-ENODEV)
> +b. Succeed with the desired flags
> +c. Succeed with the negotiated flags, for example
> + uacce_interface.flags = UACCE_DEV_SVA but uacce->flags = ~UACCE_DEV_SVA
> +So user driver need check return value as well as the negotiated uacce->flags.
> +
> +
> +The Memory Sharing Model
> +------------------------
> +The perfect form of a Uacce device is to support SVM/SVA. We built this upon
> +Jean Philippe Brucker's SVA patches. [1]
> +
> +If the hardware support UACCE_DEV_SVA, the user process's page table is
> +shared to the opened queue. So the device can access any address in the
> +process address space.
> +And it can raise a page fault if the physical page is not available yet.
> +It can also access the address in the kernel space, which is referred by
> +another page table particular to the kernel. Most of IOMMU implementation can
> +handle this by a tag on the address request of the device. For example, ARM
> +SMMU uses SSV bit to indicate that the address request is for kernel or user
> +space.
> +Queue file regions can be used:
> +UACCE_QFRT_MMIO: device mmio region (map to user)
> +UACCE_QFRT_DUS: device user share (map to dev and user)
> +
> +If the device does not support UACCE_DEV_SVA, Uacce allow only one process at
> +the same time. DMA API cannot be used as well, since Uacce will create an
> +unmanaged iommu_domain for the device.
> +Queue file regions can be used:
> +UACCE_QFRT_MMIO: device mmio region (map to user)
> +UACCE_QFRT_DKO: device kernel-only (map to dev, no user)
> +UACCE_QFRT_DUS: device user share (map to dev and user)
> +UACCE_QFRT_SS: static share memory (map to devs and user)
> +
> +
> +The Folk Scenario
> +=================
s/Folk/Fork/
> +For a process with allocated queues and shared memory, what happen if it forks
> +a child?
> +
> +The fd of the queue will be duplicated on folk, so the child can send request
> +to the same queue as its parent. But the requests which is sent from processes
> +except for the one who opens the queue will be blocked.
> +
> +It is recommended to add O_CLOEXEC to the queue file.
> +
> +The queue mmap space has a VM_DONTCOPY in its VMA. So the child will lose all
> +those VMAs.
> +
> +This is a reason why Uacce does not adopt the mode used in VFIO and
> +InfiniBand. Both solutions can set any user pointer for hardware sharing.
> +But they cannot support fork when the dma is in process. Or the
> +"Copy-On-Write" procedure will make the parent process lost its physical
> +pages.
> +
> +
> +Difference to the VFIO and IB framework
> +---------------------------------------
> +The essential function of Uacce is to let the device access the user
> +address directly. There are many device drivers doing the same in the kernel.
> +And both VFIO and IB can provide similar function in framework level.
> +
> +But Uacce has a different goal: "share address space". It is
> +not taken the request to the accelerator as an enclosure data structure. It
> +takes the accelerator as another thread of the same process. So the
> +accelerator can refer to any address used by the process.
> +
> +Both VFIO and IB are taken this as "memory sharing", not "address sharing".
> +They care more on sharing the block of memory. But if there is an address
> +stored in the block and referring to another memory region. The address may
> +not be valid.
> +
> +By adding more constraints to the VFIO and IB framework, in some sense, we may
> +achieve a similar goal. But we gave it up finally. Both VFIO and IB have extra
> +assumption which is unnecessary to Uacce. They may hurt each other if we
> +try to merge them together.
> +
> +VFIO manages resource of a hardware as a "virtual device". If a device need to
> +serve a separated application. It must isolate the resource as a separate
> +virtual device. And the life cycle of the application and virtual device are
> +unnecessary unrelated. And most concepts, such as bus, driver, probe and
> +so on, to make it as a "device" is unnecessary either. And the logic added to
> +VFIO to make address sharing do no help on "creating a virtual device".
> +
> +IB creates a "verbs" standard for sharing memory region to another remote
> +entity. Most of these verbs are to make memory region between entities to be
> +synchronized. This is not what accelerator need. Accelerator is in the same
> +memory system with the CPU. It refers to the same memory system among CPU and
> +devices. So the local memory terms/verbs are good enough for it. Extra "verbs"
> +are not necessary. And its queue (like queue pair in IB) is the communication
> +channel direct to the accelerator hardware. There is nothing about memory
> +itself.
> +
> +Further, both VFIO and IB use the "pin" (get_user_page) way to lock local
> +memory in place. This is flexible. But it can cause other problems. For
> +example, if the user process fork a child process. The COW procedure may make
> +the parent process lost its pages which are sharing with the device. These may
> +be fixed in the future. But is not going to be easy. (There is a discussion
> +about this on Linux Plumbers Conference 2018 [2])
> +
> +So we choose to build the solution directly on top of IOMMU interface. IOMMU
> +is the essential way for device and process to share their page mapping from
> +the hardware perspective. It will be safe to create a software solution on
> +this assumption. Uacce manages the IOMMU interface for the accelerator
> +device, so the device driver can export some of the resources to the user
> +space. Uacce than can make sure the device and the process have the same
> +address space.
> +
> +
> +References
> +==========
> +.. [1] http://jpbrucker.net/sva/
> +.. [2] https://lwn.net/Articles/774411/
>
next prev parent reply other threads:[~2019-10-09 16:11 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-09 15:21 [RESEND PATCH v4 0/2] Add uacce module for Accelerator Zhangfei Gao
2019-10-09 15:21 ` [RESEND PATCH v4 1/2] uacce: Add documents for uacce Zhangfei Gao
2019-10-09 16:11 ` Dave Jiang [this message]
2019-10-09 15:21 ` [RESEND PATCH v4 2/2] uacce: add uacce driver Zhangfei Gao
2019-10-10 12:33 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=0d339fc4-9a15-9f7f-4f19-89bf8baf1301@intel.com \
--to=dave.jiang@intel.com \
--cc=arnd@arndb.de \
--cc=francois.ozog@linaro.org \
--cc=grant.likely@arm.com \
--cc=gregkh@linuxfoundation.org \
--cc=ilias.apalodimas@linaro.org \
--cc=jean-philippe@linaro.org \
--cc=jonathan.cameron@huawei.com \
--cc=kenneth-lee-2012@foxmail.com \
--cc=liguozhu@hisilicon.com \
--cc=linux-accelerators@lists.ozlabs.org \
--cc=linux-kernel@vger.kernel.org \
--cc=wangzhou1@hisilicon.com \
--cc=xuzaibo@huawei.com \
--cc=zhangfei.gao@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).