linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v2 0/2] Introduce DMA buffer sharing mechanism
@ 2011-12-02  8:57 Sumit Semwal
  2011-12-02  8:57 ` [RFC v2 1/2] dma-buf: Introduce dma " Sumit Semwal
  2011-12-02  8:57 ` [RFC v2 2/2] dma-buf: Documentation for buffer sharing framework Sumit Semwal
  0 siblings, 2 replies; 64+ messages in thread
From: Sumit Semwal @ 2011-12-02  8:57 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media
  Cc: linux, arnd, jesse.barker, m.szyprowski, rob, daniel,
	t.stanislaws, Sumit Semwal

Hello Everyone,

This is RFC v2 for DMA buffer sharing mechanism - changes from v1 are in the
changelog below.

Various subsystems - V4L2, GPU-accessors, DRI to name a few - have felt the 
need to have a common mechanism to share memory buffers across different
devices - ARM, video hardware, GPU.

This need comes forth from a variety of use cases including cameras, image 
processing, video recorders, sound processing, DMA engines, GPU and display
buffers, and others.

This RFC is an attempt to define such a buffer sharing mechanism- it is the
result of discussions from a couple of memory-management mini-summits held by
Linaro to understand and address common needs around memory management. [1]

A new dma_buf buffer object is added, with operations and API to allow easy
sharing of this buffer object across devices.

The framework allows:
- a new buffer-object to be created with fixed size.
- different devices to 'attach' themselves to this buffer, to facilitate
  backing storage negotiation, using dma_buf_attach() API.
- association of a file pointer with each user-buffer and associated
   allocator-defined operations on that buffer. This operation is called the
   'export' operation.
- this exported buffer-object to be shared with the other entity by asking for
   its 'file-descriptor (fd)', and sharing the fd across.
- a received fd to get the buffer object back, where it can be accessed using
   the associated exporter-defined operations.
- the exporter and user to share the scatterlist using map_dma_buf and
   unmap_dma_buf operations.

Documentation present in the patch-set gives more details.

This is based on design suggestions from many people at the mini-summits,
most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
Daniel Vetter <daniel@ffwll.ch>.

The implementation is inspired from proof-of-concept patch-set from
Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing
between two v4l2 devices. [2]

References:
[1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
[2]: http://lwn.net/Articles/454389

Patchset based on top of 3.2-rc3, the current version can be found at

http://git.linaro.org/gitweb?p=people/sumitsemwal/linux-3.x.git
Branch: dma-buf-upstr-v2

Earlier version at: https://lkml.org/lkml/2011/10/11/92

Best regards,
~Sumit Semwal

History:

v2:
- Review comments incorporated:
   - from Tomasz Stanislawski [https://lkml.org/lkml/2011/10/14/136]
     - kzalloc moved out of critical section
     - corrected some in-code comments

   - from Dave Airlie [https://lkml.org/lkml/2011/11/25/123]

   - from Daniel Vetter and Rob Clark [https://lkml.org/lkml/2011/11/26/53]
     - use struct sg_table in place of struct scatterlist
     - rename {get,put}_scatterlist to {map,unmap}_dma_buf
     - add new wrapper APIs dma_buf_{map,unmap}_attachment for ease of users
     
- documentation updates as per review comments from Randy Dunlap
     [https://lkml.org/lkml/2011/10/12/439]

v1: original

Sumit Semwal (2):
  dma-buf: Introduce dma buffer sharing mechanism
  dma-buf: Documentation for buffer sharing framework

 Documentation/dma-buf-sharing.txt |  223 ++++++++++++++++++++++++++++
 drivers/base/Kconfig              |   10 ++
 drivers/base/Makefile             |    1 +
 drivers/base/dma-buf.c            |  290 +++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h           |  176 ++++++++++++++++++++++
 5 files changed, 700 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sharing.txt
 create mode 100644 drivers/base/dma-buf.c
 create mode 100644 include/linux/dma-buf.h

-- 
1.7.4.1


^ permalink raw reply	[flat|nested] 64+ messages in thread

* [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-02  8:57 [RFC v2 0/2] Introduce DMA buffer sharing mechanism Sumit Semwal
@ 2011-12-02  8:57 ` Sumit Semwal
  2011-12-02 17:11   ` Konrad Rzeszutek Wilk
                     ` (2 more replies)
  2011-12-02  8:57 ` [RFC v2 2/2] dma-buf: Documentation for buffer sharing framework Sumit Semwal
  1 sibling, 3 replies; 64+ messages in thread
From: Sumit Semwal @ 2011-12-02  8:57 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media
  Cc: linux, arnd, jesse.barker, m.szyprowski, rob, daniel,
	t.stanislaws, Sumit Semwal, Sumit Semwal

This is the first step in defining a dma buffer sharing mechanism.

A new buffer object dma_buf is added, with operations and API to allow easy
sharing of this buffer object across devices.

The framework allows:
- different devices to 'attach' themselves to this buffer, to facilitate
  backing storage negotiation, using dma_buf_attach() API.
- association of a file pointer with each user-buffer and associated
   allocator-defined operations on that buffer. This operation is called the
   'export' operation.
- this exported buffer-object to be shared with the other entity by asking for
   its 'file-descriptor (fd)', and sharing the fd across.
- a received fd to get the buffer object back, where it can be accessed using
   the associated exporter-defined operations.
- the exporter and user to share the scatterlist using map_dma_buf and
   unmap_dma_buf operations.

Atleast one 'attach()' call is required to be made prior to calling the
map_dma_buf() operation.

Couple of building blocks in map_dma_buf() are added to ease introduction
of sync'ing across exporter and users, and late allocation by the exporter.

*OPTIONALLY*: mmap() file operation is provided for the associated 'fd', as
wrapper over the optional allocator defined mmap(), to be used by devices
that might need one.

More details are there in the documentation patch.

This is based on design suggestions from many people at the mini-summits[1],
most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
Daniel Vetter <daniel@ffwll.ch>.

The implementation is inspired from proof-of-concept patch-set from
Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing
between two v4l2 devices. [2]

[1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
[2]: http://lwn.net/Articles/454389

Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@ti.com>
---
 drivers/base/Kconfig    |   10 ++
 drivers/base/Makefile   |    1 +
 drivers/base/dma-buf.c  |  290 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h |  176 ++++++++++++++++++++++++++++
 4 files changed, 477 insertions(+), 0 deletions(-)
 create mode 100644 drivers/base/dma-buf.c
 create mode 100644 include/linux/dma-buf.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 21cf46f..07d8095 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -174,4 +174,14 @@ config SYS_HYPERVISOR
 
 source "drivers/base/regmap/Kconfig"
 
+config DMA_SHARED_BUFFER
+	bool "Buffer framework to be shared between drivers"
+	default n
+	depends on ANON_INODES
+	help
+	  This option enables the framework for buffer-sharing between
+	  multiple drivers. A buffer is associated with a file using driver
+	  APIs extension; the file's descriptor can then be passed on to other
+	  driver.
+
 endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 99a375a..d0df046 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
 obj-y			+= power/
 obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
 obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
+obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o
 obj-$(CONFIG_ISA)	+= isa.o
 obj-$(CONFIG_FW_LOADER)	+= firmware_class.o
 obj-$(CONFIG_NUMA)	+= node.o
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
new file mode 100644
index 0000000..4b9005e
--- /dev/null
+++ b/drivers/base/dma-buf.c
@@ -0,0 +1,290 @@
+/*
+ * Framework for buffer objects that can be shared across devices/subsystems.
+ *
+ * Copyright(C) 2011 Linaro Limited. All rights reserved.
+ * Author: Sumit Semwal <sumit.semwal@ti.com>
+ *
+ * Many thanks to linaro-mm-sig list, and specially
+ * Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
+ * Daniel Vetter <daniel@ffwll.ch> for their support in creation and
+ * refining of this idea.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/dma-buf.h>
+#include <linux/anon_inodes.h>
+#include <linux/export.h>
+
+static inline int is_dma_buf_file(struct file *);
+
+static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	if (!dmabuf->ops->mmap)
+		return -EINVAL;
+
+	return dmabuf->ops->mmap(dmabuf, vma);
+}
+
+static int dma_buf_release(struct inode *inode, struct file *file)
+{
+	struct dma_buf *dmabuf;
+
+	if (!is_dma_buf_file(file))
+		return -EINVAL;
+
+	dmabuf = file->private_data;
+
+	dmabuf->ops->release(dmabuf);
+	kfree(dmabuf);
+	return 0;
+}
+
+static const struct file_operations dma_buf_fops = {
+	.mmap		= dma_buf_mmap,
+	.release	= dma_buf_release,
+};
+
+/*
+ * is_dma_buf_file - Check if struct file* is associated with dma_buf
+ */
+static inline int is_dma_buf_file(struct file *file)
+{
+	return file->f_op == &dma_buf_fops;
+}
+
+/**
+ * dma_buf_export - Creates a new dma_buf, and associates an anon file
+ * with this buffer,so it can be exported.
+ * Also connect the allocator specific data and ops to the buffer.
+ *
+ * @priv:	[in]	Attach private data of allocator to this buffer
+ * @ops:	[in]	Attach allocator-defined dma buf ops to the new buffer.
+ * @flags:	[in]	mode flags for the file.
+ *
+ * Returns, on success, a newly created dma_buf object, which wraps the
+ * supplied private data and operations for dma_buf_ops. On failure to
+ * allocate the dma_buf object, it can return NULL.
+ *
+ */
+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
+				int flags)
+{
+	struct dma_buf *dmabuf;
+	struct file *file;
+
+	BUG_ON(!priv || !ops);
+
+	dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
+	if (dmabuf == NULL)
+		return dmabuf;
+
+	dmabuf->priv = priv;
+	dmabuf->ops = ops;
+
+	file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, flags);
+
+	dmabuf->file = file;
+
+	mutex_init(&dmabuf->lock);
+	INIT_LIST_HEAD(&dmabuf->attachments);
+
+	return dmabuf;
+}
+EXPORT_SYMBOL(dma_buf_export);
+
+
+/**
+ * dma_buf_fd - returns a file descriptor for the given dma_buf
+ * @dmabuf:	[in]	pointer to dma_buf for which fd is required.
+ *
+ * On success, returns an associated 'fd'. Else, returns error.
+ */
+int dma_buf_fd(struct dma_buf *dmabuf)
+{
+	int error, fd;
+
+	if (!dmabuf->file)
+		return -EINVAL;
+
+	error = get_unused_fd_flags(0);
+	if (error < 0)
+		return error;
+	fd = error;
+
+	fd_install(fd, dmabuf->file);
+
+	return fd;
+}
+EXPORT_SYMBOL(dma_buf_fd);
+
+/**
+ * dma_buf_get - returns the dma_buf structure related to an fd
+ * @fd:	[in]	fd associated with the dma_buf to be returned
+ *
+ * On success, returns the dma_buf structure associated with an fd; uses
+ * file's refcounting done by fget to increase refcount. returns ERR_PTR
+ * otherwise.
+ */
+struct dma_buf *dma_buf_get(int fd)
+{
+	struct file *file;
+
+	file = fget(fd);
+
+	if (!file)
+		return ERR_PTR(-EBADF);
+
+	if (!is_dma_buf_file(file)) {
+		fput(file);
+		return ERR_PTR(-EINVAL);
+	}
+
+	return file->private_data;
+}
+EXPORT_SYMBOL(dma_buf_get);
+
+/**
+ * dma_buf_put - decreases refcount of the buffer
+ * @dmabuf:	[in]	buffer to reduce refcount of
+ *
+ * Uses file's refcounting done implicitly by fput()
+ */
+void dma_buf_put(struct dma_buf *dmabuf)
+{
+	BUG_ON(!dmabuf->file);
+
+	fput(dmabuf->file);
+}
+EXPORT_SYMBOL(dma_buf_put);
+
+/**
+ * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
+ * calls attach() of dma_buf_ops to allow device-specific attach functionality
+ * @dmabuf:	[in]	buffer to attach device to.
+ * @dev:	[in]	device to be attached.
+ *
+ * Returns struct dma_buf_attachment * for this attachment; may return NULL.
+ *
+ */
+struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
+						struct device *dev)
+{
+	struct dma_buf_attachment *attach;
+	int ret;
+
+	BUG_ON(!dmabuf || !dev);
+
+	attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
+	if (attach == NULL)
+		goto err_alloc;
+
+	mutex_lock(&dmabuf->lock);
+
+	attach->dev = dev;
+	attach->dmabuf = dmabuf;
+	if (dmabuf->ops->attach) {
+		ret = dmabuf->ops->attach(dmabuf, dev, attach);
+		if (!ret)
+			goto err_attach;
+	}
+	list_add(&attach->node, &dmabuf->attachments);
+
+	mutex_unlock(&dmabuf->lock);
+
+err_alloc:
+	return attach;
+err_attach:
+	kfree(attach);
+	mutex_unlock(&dmabuf->lock);
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL(dma_buf_attach);
+
+/**
+ * dma_buf_detach - Remove the given attachment from dmabuf's attachments list;
+ * optionally calls detach() of dma_buf_ops for device-specific detach
+ * @dmabuf:	[in]	buffer to detach from.
+ * @attach:	[in]	attachment to be detached; is free'd after this call.
+ *
+ */
+void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
+{
+	BUG_ON(!dmabuf || !attach);
+
+	mutex_lock(&dmabuf->lock);
+	list_del(&attach->node);
+	if (dmabuf->ops->detach)
+		dmabuf->ops->detach(dmabuf, attach);
+
+	mutex_unlock(&dmabuf->lock);
+	kfree(attach);
+}
+EXPORT_SYMBOL(dma_buf_detach);
+
+/**
+ * dma_buf_map_attachment - Returns the scatterlist table of the attachment;
+ * mapped into _device_ address space. Is a wrapper for map_dma_buf() of the
+ * dma_buf_ops.
+ * @attach:	[in]	attachment whose scatterlist is to be returned
+ * @direction:	[in]	direction of DMA transfer
+ *
+ * Returns sg_table containing the scatterlist to be returned; may return NULL
+ * or ERR_PTR.
+ *
+ */
+struct sg_table * dma_buf_map_attachment(struct dma_buf_attachment *attach,
+                                         enum dma_data_direction direction)
+{
+	struct sg_table *sg_table = ERR_PTR(-EINVAL);
+
+	BUG_ON(!attach || !attach->dmabuf);
+
+	mutex_lock(&attach->dmabuf->lock);
+	if (attach->dmabuf->ops->map_dma_buf)
+		sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
+	mutex_unlock(&attach->dmabuf->lock);
+
+	return sg_table;
+}
+EXPORT_SYMBOL(dma_buf_map_attachment);
+
+/**
+ * dma_buf_unmap_attachment - unmaps and decreases usecount of the buffer;might
+ * deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of
+ * dma_buf_ops.
+ * @attach:	[in]	attachment to unmap buffer from
+ * @sg_table:	[in]	scatterlist info of the buffer to unmap
+ *
+ */
+void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
+				struct sg_table *sg_table)
+{
+	BUG_ON(!attach || !attach->dmabuf || !sg_table);
+
+	mutex_lock(&attach->dmabuf->lock);
+	if (attach->dmabuf->ops->unmap_dma_buf)
+		attach->dmabuf->ops->unmap_dma_buf(attach, sg_table);
+	mutex_unlock(&attach->dmabuf->lock);
+
+}
+EXPORT_SYMBOL(dma_buf_unmap_attachment);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
new file mode 100644
index 0000000..db4b384
--- /dev/null
+++ b/include/linux/dma-buf.h
@@ -0,0 +1,176 @@
+/*
+ * Header file for dma buffer sharing framework.
+ *
+ * Copyright(C) 2011 Linaro Limited. All rights reserved.
+ * Author: Sumit Semwal <sumit.semwal@ti.com>
+ *
+ * Many thanks to linaro-mm-sig list, and specially
+ * Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
+ * Daniel Vetter <daniel@ffwll.ch> for their support in creation and
+ * refining of this idea.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __DMA_BUF_H__
+#define __DMA_BUF_H__
+
+#include <linux/file.h>
+#include <linux/err.h>
+#include <linux/device.h>
+#include <linux/scatterlist.h>
+#include <linux/list.h>
+#include <linux/dma-mapping.h>
+
+struct dma_buf;
+
+/**
+ * struct dma_buf_attachment - holds device-buffer attachment data
+ * @dmabuf: buffer for this attachment.
+ * @dev: device attached to the buffer.
+ * @node: list_head to allow manipulation of list of dma_buf_attachment.
+ * @priv: exporter-specific attachment data.
+ */
+struct dma_buf_attachment {
+	struct dma_buf *dmabuf;
+	struct device *dev;
+	struct list_head node;
+	void *priv;
+};
+
+/**
+ * struct dma_buf_ops - operations possible on struct dma_buf
+ * @attach: allows different devices to 'attach' themselves to the given
+ *	    buffer. It might return -EBUSY to signal that backing storage
+ *	    is already allocated and incompatible with the requirements
+ *	    of requesting device. [optional]
+ * @detach: detach a given device from this buffer. [optional]
+ * @map_dma_buf: returns list of scatter pages allocated, increases usecount
+ *		 of the buffer. Requires atleast one attach to be called
+ *		 before. Returned sg list should already be mapped into
+ *		 _device_ address space. This call may sleep.
+ * @unmap_dma_buf: decreases usecount of buffer, might deallocate scatter
+ *		   pages.
+ * @mmap: memory map this buffer - optional.
+ * @release: release this buffer; to be called after the last dma_buf_put.
+ * @sync_sg_for_cpu: sync the sg list for cpu.
+ * @sync_sg_for_device: synch the sg list for device.
+ */
+struct dma_buf_ops {
+	int (*attach)(struct dma_buf *, struct device *,
+			struct dma_buf_attachment *);
+
+	void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
+
+	/* For {map,unmap}_dma_buf below, any specific buffer attributes
+	 * required should get added to device_dma_parameters accessible
+	 * via dev->dma_params.
+	 */
+	struct sg_table * (*map_dma_buf)(struct dma_buf_attachment *,
+						enum dma_data_direction);
+	void (*unmap_dma_buf)(struct dma_buf_attachment *,
+						struct sg_table *);
+	/* TODO: Add try_map_dma_buf version, to return immed with -EBUSY
+	 * if the call would block.
+	 */
+
+	/* allow mmap optionally for devices that need it */
+	int (*mmap)(struct dma_buf *, struct vm_area_struct *);
+	/* after final dma_buf_put() */
+	void (*release)(struct dma_buf *);
+
+	/* allow allocator to take care of cache ops */
+	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
+	void (*sync_sg_for_device)(struct dma_buf *, struct device *);
+};
+
+/**
+ * struct dma_buf - shared buffer object
+ * @file: file pointer used for sharing buffers across, and for refcounting.
+ * @attachments: list of dma_buf_attachment that denotes all devices attached.
+ * @ops: dma_buf_ops associated with this buffer object
+ * @priv: user specific private data
+ */
+struct dma_buf {
+	size_t size;
+	struct file *file;
+	struct list_head attachments;
+	const struct dma_buf_ops *ops;
+	/* mutex to serialize list manipulation and other ops */
+	struct mutex lock;
+	void *priv;
+};
+
+#ifdef CONFIG_DMA_SHARED_BUFFER
+struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
+							struct device *dev);
+void dma_buf_detach(struct dma_buf *dmabuf,
+				struct dma_buf_attachment *dmabuf_attach);
+struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops, int flags);
+int dma_buf_fd(struct dma_buf *dmabuf);
+struct dma_buf *dma_buf_get(int fd);
+void dma_buf_put(struct dma_buf *dmabuf);
+
+struct sg_table * dma_buf_map_attachment(struct dma_buf_attachment *,
+					 enum dma_data_direction);
+void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *);
+#else
+
+static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
+							struct device *dev)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline void dma_buf_detach(struct dma_buf *dmabuf,
+				  struct dma_buf_attachment *dmabuf_attach)
+{
+	return;
+}
+
+static inline struct dma_buf *dma_buf_export(void *priv,
+						struct dma_buf_ops *ops,
+						int flags)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline int dma_buf_fd(struct dma_buf *dmabuf)
+{
+	return -ENODEV;
+}
+
+static inline struct dma_buf *dma_buf_get(int fd)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline void dma_buf_put(struct dma_buf *dmabuf)
+{
+	return;
+}
+
+static inline struct sg_table * dma_buf_map_attachment(
+	struct dma_buf_attachment *, enum dma_data_direction)
+{
+	return ERR_PTR(-ENODEV);
+}
+
+static inline void dma_buf_unmap_attachment(struct dma_buf_attachment *,
+						struct sg_table *)
+{
+	return;
+}
+
+#endif /* CONFIG_DMA_SHARED_BUFFER */
+
+#endif /* __DMA_BUF_H__ */
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [RFC v2 2/2] dma-buf: Documentation for buffer sharing framework
  2011-12-02  8:57 [RFC v2 0/2] Introduce DMA buffer sharing mechanism Sumit Semwal
  2011-12-02  8:57 ` [RFC v2 1/2] dma-buf: Introduce dma " Sumit Semwal
@ 2011-12-02  8:57 ` Sumit Semwal
  1 sibling, 0 replies; 64+ messages in thread
From: Sumit Semwal @ 2011-12-02  8:57 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media
  Cc: linux, arnd, jesse.barker, m.szyprowski, rob, daniel,
	t.stanislaws, Sumit Semwal, Sumit Semwal

Add documentation for dma buffer sharing framework, explaining the
various operations, members and API of the dma buffer sharing
framework.

Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@ti.com>
---
 Documentation/dma-buf-sharing.txt |  223 +++++++++++++++++++++++++++++++++++++
 1 files changed, 223 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/dma-buf-sharing.txt

diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
new file mode 100644
index 0000000..2c1bdf6
--- /dev/null
+++ b/Documentation/dma-buf-sharing.txt
@@ -0,0 +1,223 @@
+                    DMA Buffer Sharing API Guide
+                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+                            Sumit Semwal
+                <sumit dot semwal at linaro dot org>
+                 <sumit dot semwal at ti dot com>
+
+This document serves as a guide to device-driver writers on what is the dma-buf
+buffer sharing API, how to use it for exporting and using shared buffers.
+
+Any device driver which wishes to be a part of DMA buffer sharing, can do so as
+either the 'exporter' of buffers, or the 'user' of buffers.
+
+Say a driver A wants to use buffers created by driver B, then we call B as the
+exporter, and A as buffer-user.
+
+The exporter
+- implements and manages operations[1] for the buffer
+- allows other users to share the buffer by using dma_buf sharing APIs,
+- manages the details of buffer allocation,
+- decides about the actual backing storage where this allocation happens,
+- takes care of any migration of scatterlist - for all (shared) users of this
+   buffer,
+- optionally, provides mmap capability for drivers that need it.
+
+The buffer-user
+- is one of (many) sharing users of the buffer.
+- doesn't need to worry about how the buffer is allocated, or where.
+- needs a mechanism to get access to the scatterlist that makes up this buffer
+   in memory, mapped into its own address space, so it can access the same area
+   of memory.
+
+
+The dma_buf buffer sharing API usage contains the following steps:
+
+1. Exporter announces that it wishes to export a buffer
+2. Userspace gets the file descriptor associated with the exported buffer, and
+   passes it around to potential buffer-users based on use case
+3. Each buffer-user 'connects' itself to the buffer
+4. When needed, buffer-user requests access to the buffer from exporter
+5. When finished with its use, the buffer-user notifies end-of-DMA to exporter
+6. when buffer-user is done using this buffer completely, it 'disconnects'
+   itself from the buffer.
+
+
+1. Exporter's announcement of buffer export
+
+   The buffer exporter announces its wish to export a buffer. In this, it
+   connects its own private buffer data, provides implementation for operations
+   that can be performed on the exported dma_buf, and flags for the file
+   associated with this buffer.
+
+   Interface:
+      struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
+                                int flags)
+
+   If this succeeds, dma_buf_export allocates a dma_buf structure, and returns a
+   pointer to the same. It also associates an anonymous file with this buffer,
+   so it can be exported. On failure to allocate the dma_buf object, it returns
+   NULL.
+
+2. Userspace gets a handle to pass around to potential buffer-users
+
+   Userspace entity requests for a file-descriptor (fd) which is a handle to the
+   anonymous file associated with the buffer. It can then share the fd with other
+   drivers and/or processes.
+
+   Interface:
+      int dma_buf_fd(struct dma_buf *dmabuf)
+
+   This API installs an fd for the anonymous file associated with this buffer;
+   returns either 'fd', or error.
+
+3. Each buffer-user 'connects' itself to the buffer
+
+   Each buffer-user now gets a reference to the buffer, using the fd passed to
+   it.
+
+   Interface:
+      struct dma_buf *dma_buf_get(int fd)
+
+   This API will return a reference to the dma_buf, and increment refcount for
+   it.
+
+   After this, the buffer-user needs to attach its device with the buffer, which
+   helps the exporter to know of device buffer constraints.
+
+   Interface:
+      struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
+                                                struct device *dev)
+
+   This API returns reference to an attachment structure, which is then used
+   for scatterlist operations. It will optionally call the 'attach' dma_buf
+   operation, if provided by the exporter.
+
+   The dma-buf sharing framework does the bookkeeping bits related to managing
+   the list of all attachments to a buffer.
+
+Until this stage, the buffer-exporter has the option to choose not to actually
+allocate the backing storage for this buffer, but wait for the first buffer-user
+to request use of buffer for allocation.
+
+
+4. When needed, buffer-user requests access to the buffer
+
+   Whenever a buffer-user wants to use the buffer for any DMA, it asks for
+   access to the buffer using dma_buf_map_attachment API. At least one attach to
+   the buffer must have happened before map_dma_buf can be called.
+
+   Interface:
+      struct sg_table * dma_buf_map_attachment(struct dma_buf_attachment *,
+                                         enum dma_data_direction);
+
+   This is a wrapper to dma_buf->ops->map_dma_buf operation, which hides the
+   "dma_buf->ops->" indirection from the users of this interface.
+
+   In struct dma_buf_ops, map_dma_buf is defined as
+      struct sg_table * (*map_dma_buf)(struct dma_buf_attachment *,
+                                                enum dma_data_direction);
+
+   It is one of the buffer operations that must be implemented by the exporter.
+   It should return the sg_table containing scatterlist for this buffer, mapped
+   into caller's address space.
+
+   If this is being called for the first time, the exporter can now choose to
+   scan through the list of attachments for this buffer, collate the requirements
+   of the attached devices, and choose an appropriate backing storage for the
+   buffer.
+
+   Based on enum dma_data_direction, it might be possible to have multiple users
+   accessing at the same time (for reading, maybe), or any other kind of sharing
+   that the exporter might wish to make available to buffer-users.
+
+
+5. When finished, the buffer-user notifies end-of-DMA to exporter
+
+   Once the DMA for the current buffer-user is over, it signals 'end-of-DMA' to
+   the exporter using the dma_buf_unmap_attachment API.
+
+   Interface:
+      void dma_buf_unmap_attachment(struct dma_buf_attachment *,
+                                    struct sg_table *);
+
+   This is a wrapper to dma_buf->ops->unmap_dma_buf() operation, which hides the
+   "dma_buf->ops->" indirection from the users of this interface.
+
+   In struct dma_buf_ops, unmap_dma_buf is defined as
+      void (*unmap_dma_buf)(struct dma_buf_attachment *, struct sg_table *);
+
+   unmap_dma_buf signifies the end-of-DMA for the attachment provided. Like
+   map_dma_buf, this API also must be implemented by the exporter.
+
+
+6. when buffer-user is done using this buffer, it 'disconnects' itself from the
+   buffer.
+
+   After the buffer-user has no more interest in using this buffer, it should
+   disconnect itself from the buffer:
+
+   - it first detaches itself from the buffer.
+
+   Interface:
+      void dma_buf_detach(struct dma_buf *dmabuf,
+                          struct dma_buf_attachment *dmabuf_attach);
+
+   This API removes the attachment from the list in dmabuf, and optionally calls
+   dma_buf->ops->detach(), if provided by exporter, for any housekeeping bits.
+
+   - Then, the buffer-user returns the buffer reference to exporter.
+
+   Interface:
+     void dma_buf_put(struct dma_buf *dmabuf);
+
+   This API then reduces the refcount for this buffer.
+
+   If, as a result of this call, the refcount becomes 0, the 'release' file
+   operation related to this fd is called. It calls the dmabuf->ops->release()
+   operation in turn, and frees the memory allocated for dmabuf when exported.
+
+NOTES:
+- Importance of attach-detach and {map,unmap}_dma_buf operation pairs
+   The attach-detach calls allow the exporter to figure out backing-storage
+   constraints for the currently-interested devices. This allows preferential
+   allocation, and/or migration of pages across different types of storage
+   available, if possible.
+
+   Bracketing of DMA access with {map,unmap}_dma_buf operations is essential
+   to allow just-in-time backing of storage, and migration mid-way through a
+   use-case.
+
+- Migration of backing storage if needed
+   If after
+   - at least one map_dma_buf has happened,
+   - and the backing storage has been allocated for this buffer,
+   another new buffer-user intends to attach itself to this buffer, it might
+   be allowed, if possible for the exporter.
+
+   In case it is allowed by the exporter:
+    if the new buffer-user has stricter 'backing-storage constraints', and the
+    exporter can handle these constraints, the exporter can just stall on the
+    map_dma_buf until all outstanding access is completed (as signalled by
+    unmap_dma_buf).
+    Once all ongoing access is completed, the exporter could potentially move
+    the buffer to the stricter backing-storage, and then allow further
+    {map,unmap}_dma_buf operations from any buffer-user from the migrated
+    backing-storage.
+
+   If the exporter cannot fulfil the backing-storage constraints of the new
+   buffer-user device as requested, dma_buf_attach() would return an error to
+   denote non-compatibility of the new buffer-sharing request with the current
+   buffer.
+
+   If the exporter chooses not to allow an attach() operation once a
+   map_dma_buf() API has been called, it simply returns an error.
+
+- mmap file operation
+   An mmap() file operation is provided for the fd associated with the buffer.
+   If the exporter defines an mmap operation, the mmap() fop calls this to allow
+   mmap for devices that might need it; if not, it returns an error.
+
+References:
+[1] struct dma_buf_ops in include/linux/dma-buf.h
+[2] All interfaces mentioned above defined in include/linux/dma-buf.h
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-02  8:57 ` [RFC v2 1/2] dma-buf: Introduce dma " Sumit Semwal
@ 2011-12-02 17:11   ` Konrad Rzeszutek Wilk
  2011-12-05  9:48     ` Semwal, Sumit
  2011-12-05 17:18   ` Arnd Bergmann
  2012-01-09  6:20   ` InKi Dae
  2 siblings, 1 reply; 64+ messages in thread
From: Konrad Rzeszutek Wilk @ 2011-12-02 17:11 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media, t.stanislaws, linux, arnd, rob,
	Sumit Semwal, m.szyprowski

On Fri, Dec 02, 2011 at 02:27:31PM +0530, Sumit Semwal wrote:
> This is the first step in defining a dma buffer sharing mechanism.
> 
> A new buffer object dma_buf is added, with operations and API to allow easy
> sharing of this buffer object across devices.
> 
> The framework allows:
> - different devices to 'attach' themselves to this buffer, to facilitate
>   backing storage negotiation, using dma_buf_attach() API.
> - association of a file pointer with each user-buffer and associated
>    allocator-defined operations on that buffer. This operation is called the
>    'export' operation.
> - this exported buffer-object to be shared with the other entity by asking for
>    its 'file-descriptor (fd)', and sharing the fd across.
> - a received fd to get the buffer object back, where it can be accessed using
>    the associated exporter-defined operations.
> - the exporter and user to share the scatterlist using map_dma_buf and
>    unmap_dma_buf operations.
> 
> Atleast one 'attach()' call is required to be made prior to calling the
> map_dma_buf() operation.
> 
> Couple of building blocks in map_dma_buf() are added to ease introduction
> of sync'ing across exporter and users, and late allocation by the exporter.
> 
> *OPTIONALLY*: mmap() file operation is provided for the associated 'fd', as
> wrapper over the optional allocator defined mmap(), to be used by devices
> that might need one.
> 
> More details are there in the documentation patch.
> 
> This is based on design suggestions from many people at the mini-summits[1],
> most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
> Daniel Vetter <daniel@ffwll.ch>.
> 
> The implementation is inspired from proof-of-concept patch-set from
> Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing
> between two v4l2 devices. [2]
> 
> [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
> [2]: http://lwn.net/Articles/454389
> 
> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com>

You have a clone? You only need one SOB.


> ---
>  drivers/base/Kconfig    |   10 ++
>  drivers/base/Makefile   |    1 +
>  drivers/base/dma-buf.c  |  290 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-buf.h |  176 ++++++++++++++++++++++++++++
>  4 files changed, 477 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-buf.c
>  create mode 100644 include/linux/dma-buf.h
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 21cf46f..07d8095 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
>  
>  source "drivers/base/regmap/Kconfig"
>  
> +config DMA_SHARED_BUFFER
> +	bool "Buffer framework to be shared between drivers"
> +	default n
> +	depends on ANON_INODES
> +	help
> +	  This option enables the framework for buffer-sharing between
> +	  multiple drivers. A buffer is associated with a file using driver
> +	  APIs extension; the file's descriptor can then be passed on to other
> +	  driver.
> +
>  endmenu
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 99a375a..d0df046 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS)	+= devtmpfs.o
>  obj-y			+= power/
>  obj-$(CONFIG_HAS_DMA)	+= dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o
>  obj-$(CONFIG_ISA)	+= isa.o
>  obj-$(CONFIG_FW_LOADER)	+= firmware_class.o
>  obj-$(CONFIG_NUMA)	+= node.o
> diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
> new file mode 100644
> index 0000000..4b9005e
> --- /dev/null
> +++ b/drivers/base/dma-buf.c
> @@ -0,0 +1,290 @@
> +/*
> + * Framework for buffer objects that can be shared across devices/subsystems.
> + *
> + * Copyright(C) 2011 Linaro Limited. All rights reserved.
> + * Author: Sumit Semwal <sumit.semwal@ti.com>

OK, so the SOB should be from @ti.com then.

> + *
> + * Many thanks to linaro-mm-sig list, and specially
> + * Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
> + * Daniel Vetter <daniel@ffwll.ch> for their support in creation and
> + * refining of this idea.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/slab.h>
> +#include <linux/dma-buf.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/export.h>
> +
> +static inline int is_dma_buf_file(struct file *);
> +
> +static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct dma_buf *dmabuf;
> +
> +	if (!is_dma_buf_file(file))
> +		return -EINVAL;
> +
> +	dmabuf = file->private_data;
> +

Should you check if dmabuf is NULL and or dmabuf->ops is NULL too?

Hm, you probably don't need to check for dmabuf, but from
looking at  dma_buf_export one could pass  a NULL for the ops.

> +	if (!dmabuf->ops->mmap)
> +		return -EINVAL;
> +
> +	return dmabuf->ops->mmap(dmabuf, vma);
> +}
> +
> +static int dma_buf_release(struct inode *inode, struct file *file)
> +{
> +	struct dma_buf *dmabuf;
> +
> +	if (!is_dma_buf_file(file))
> +		return -EINVAL;
> +
> +	dmabuf = file->private_data;
> +

No checking here for ops or ops->release?

> +	dmabuf->ops->release(dmabuf);
> +	kfree(dmabuf);
> +	return 0;
> +}
> +
> +static const struct file_operations dma_buf_fops = {
> +	.mmap		= dma_buf_mmap,
> +	.release	= dma_buf_release,
> +};
> +
> +/*
> + * is_dma_buf_file - Check if struct file* is associated with dma_buf
> + */
> +static inline int is_dma_buf_file(struct file *file)
> +{
> +	return file->f_op == &dma_buf_fops;
> +}
> +
> +/**

I don't think the ** is anymore the current kernel doc format.

> + * dma_buf_export - Creates a new dma_buf, and associates an anon file
> + * with this buffer,so it can be exported.

Put a space there.

> + * Also connect the allocator specific data and ops to the buffer.
> + *
> + * @priv:	[in]	Attach private data of allocator to this buffer
> + * @ops:	[in]	Attach allocator-defined dma buf ops to the new buffer.
> + * @flags:	[in]	mode flags for the file.
> + *
> + * Returns, on success, a newly created dma_buf object, which wraps the
> + * supplied private data and operations for dma_buf_ops. On failure to
> + * allocate the dma_buf object, it can return NULL.

"it can" I think the right word is "it will".

> + *
> + */
> +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
> +				int flags)
> +{
> +	struct dma_buf *dmabuf;
> +	struct file *file;
> +
> +	BUG_ON(!priv || !ops);

Whoa. Crash the whole kernel b/c of this? No no. You should
use WARN_ON and just return NULL.

> +
> +	dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
> +	if (dmabuf == NULL)
> +		return dmabuf;

Hmm, why not return ERR_PTR(-ENOMEM); ?

> +
> +	dmabuf->priv = priv;
> +	dmabuf->ops = ops;
> +
> +	file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, flags);
> +
> +	dmabuf->file = file;
> +
> +	mutex_init(&dmabuf->lock);
> +	INIT_LIST_HEAD(&dmabuf->attachments);
> +
> +	return dmabuf;
> +}
> +EXPORT_SYMBOL(dma_buf_export);

_GPL ?

> +
> +
> +/**
> + * dma_buf_fd - returns a file descriptor for the given dma_buf
> + * @dmabuf:	[in]	pointer to dma_buf for which fd is required.
> + *
> + * On success, returns an associated 'fd'. Else, returns error.
> + */
> +int dma_buf_fd(struct dma_buf *dmabuf)
> +{
> +	int error, fd;
> +

Should you check if dmabuf is NULL first?

> +	if (!dmabuf->file)
> +		return -EINVAL;
> +
> +	error = get_unused_fd_flags(0);
> +	if (error < 0)
> +		return error;
> +	fd = error;
> +
> +	fd_install(fd, dmabuf->file);
> +
> +	return fd;
> +}
> +EXPORT_SYMBOL(dma_buf_fd);

GPL?
> +
> +/**
> + * dma_buf_get - returns the dma_buf structure related to an fd
> + * @fd:	[in]	fd associated with the dma_buf to be returned
> + *
> + * On success, returns the dma_buf structure associated with an fd; uses
> + * file's refcounting done by fget to increase refcount. returns ERR_PTR
> + * otherwise.
> + */
> +struct dma_buf *dma_buf_get(int fd)
> +{
> +	struct file *file;
> +
> +	file = fget(fd);
> +
> +	if (!file)
> +		return ERR_PTR(-EBADF);
> +
> +	if (!is_dma_buf_file(file)) {
> +		fput(file);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	return file->private_data;
> +}
> +EXPORT_SYMBOL(dma_buf_get);

GPL
> +
> +/**
> + * dma_buf_put - decreases refcount of the buffer
> + * @dmabuf:	[in]	buffer to reduce refcount of
> + *
> + * Uses file's refcounting done implicitly by fput()
> + */
> +void dma_buf_put(struct dma_buf *dmabuf)
> +{
> +	BUG_ON(!dmabuf->file);

Yikes. BUG_ON? Can't you do WARN_ON and continue on without
doing the refcounting?

> +
> +	fput(dmabuf->file);
> +}
> +EXPORT_SYMBOL(dma_buf_put);
> +
> +/**
> + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
> + * calls attach() of dma_buf_ops to allow device-specific attach functionality
> + * @dmabuf:	[in]	buffer to attach device to.
> + * @dev:	[in]	device to be attached.
> + *
> + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
> + *
> + */
> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> +						struct device *dev)

'struct device' should be at the same column as 'struct dma_buf' ..

> +{
> +	struct dma_buf_attachment *attach;
> +	int ret;
> +
> +	BUG_ON(!dmabuf || !dev);

Again, BUG_ON...

> +
> +	attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
> +	if (attach == NULL)
> +		goto err_alloc;
> +
> +	mutex_lock(&dmabuf->lock);
> +
> +	attach->dev = dev;
> +	attach->dmabuf = dmabuf;
> +	if (dmabuf->ops->attach) {

No checking first of dmabuf->ops?

> +		ret = dmabuf->ops->attach(dmabuf, dev, attach);
> +		if (!ret)
> +			goto err_attach;
> +	}
> +	list_add(&attach->node, &dmabuf->attachments);
> +
> +	mutex_unlock(&dmabuf->lock);
> +
> +err_alloc:
> +	return attach;
> +err_attach:
> +	kfree(attach);
> +	mutex_unlock(&dmabuf->lock);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(dma_buf_attach);

GPL
> +
> +/**
> + * dma_buf_detach - Remove the given attachment from dmabuf's attachments list;
> + * optionally calls detach() of dma_buf_ops for device-specific detach
> + * @dmabuf:	[in]	buffer to detach from.
> + * @attach:	[in]	attachment to be detached; is free'd after this call.
> + *
> + */
> +void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
> +{
> +	BUG_ON(!dmabuf || !attach);
> +
> +	mutex_lock(&dmabuf->lock);
> +	list_del(&attach->node);
> +	if (dmabuf->ops->detach)
> +		dmabuf->ops->detach(dmabuf, attach);
> +
> +	mutex_unlock(&dmabuf->lock);
> +	kfree(attach);
> +}
> +EXPORT_SYMBOL(dma_buf_detach);
> +
> +/**
> + * dma_buf_map_attachment - Returns the scatterlist table of the attachment;
> + * mapped into _device_ address space. Is a wrapper for map_dma_buf() of the
> + * dma_buf_ops.
> + * @attach:	[in]	attachment whose scatterlist is to be returned
> + * @direction:	[in]	direction of DMA transfer
> + *
> + * Returns sg_table containing the scatterlist to be returned; may return NULL
> + * or ERR_PTR.
> + *
> + */
> +struct sg_table * dma_buf_map_attachment(struct dma_buf_attachment *attach,
> +                                         enum dma_data_direction direction)
> +{
> +	struct sg_table *sg_table = ERR_PTR(-EINVAL);
> +
> +	BUG_ON(!attach || !attach->dmabuf);
> +
> +	mutex_lock(&attach->dmabuf->lock);
> +	if (attach->dmabuf->ops->map_dma_buf)
> +		sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
> +	mutex_unlock(&attach->dmabuf->lock);
> +
> +	return sg_table;
> +}
> +EXPORT_SYMBOL(dma_buf_map_attachment);
> +
> +/**
> + * dma_buf_unmap_attachment - unmaps and decreases usecount of the buffer;might
> + * deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of
> + * dma_buf_ops.
> + * @attach:	[in]	attachment to unmap buffer from
> + * @sg_table:	[in]	scatterlist info of the buffer to unmap
> + *
> + */
> +void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
> +				struct sg_table *sg_table)
> +{
> +	BUG_ON(!attach || !attach->dmabuf || !sg_table);
> +
> +	mutex_lock(&attach->dmabuf->lock);
> +	if (attach->dmabuf->ops->unmap_dma_buf)
> +		attach->dmabuf->ops->unmap_dma_buf(attach, sg_table);
> +	mutex_unlock(&attach->dmabuf->lock);
> +
> +}
> +EXPORT_SYMBOL(dma_buf_unmap_attachment);
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> new file mode 100644
> index 0000000..db4b384
> --- /dev/null
> +++ b/include/linux/dma-buf.h
> @@ -0,0 +1,176 @@
> +/*
> + * Header file for dma buffer sharing framework.
> + *
> + * Copyright(C) 2011 Linaro Limited. All rights reserved.
> + * Author: Sumit Semwal <sumit.semwal@ti.com>
> + *
> + * Many thanks to linaro-mm-sig list, and specially
> + * Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
> + * Daniel Vetter <daniel@ffwll.ch> for their support in creation and
> + * refining of this idea.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __DMA_BUF_H__
> +#define __DMA_BUF_H__
> +
> +#include <linux/file.h>
> +#include <linux/err.h>
> +#include <linux/device.h>
> +#include <linux/scatterlist.h>
> +#include <linux/list.h>
> +#include <linux/dma-mapping.h>
> +
> +struct dma_buf;
> +
> +/**
> + * struct dma_buf_attachment - holds device-buffer attachment data
> + * @dmabuf: buffer for this attachment.
> + * @dev: device attached to the buffer.
> + * @node: list_head to allow manipulation of list of dma_buf_attachment.
> + * @priv: exporter-specific attachment data.
> + */
> +struct dma_buf_attachment {
> +	struct dma_buf *dmabuf;
> +	struct device *dev;
> +	struct list_head node;
> +	void *priv;
> +};
> +
> +/**
> + * struct dma_buf_ops - operations possible on struct dma_buf
> + * @attach: allows different devices to 'attach' themselves to the given
> + *	    buffer. It might return -EBUSY to signal that backing storage
> + *	    is already allocated and incompatible with the requirements
> + *	    of requesting device. [optional]
> + * @detach: detach a given device from this buffer. [optional]
> + * @map_dma_buf: returns list of scatter pages allocated, increases usecount
> + *		 of the buffer. Requires atleast one attach to be called
> + *		 before. Returned sg list should already be mapped into
> + *		 _device_ address space. This call may sleep.
> + * @unmap_dma_buf: decreases usecount of buffer, might deallocate scatter
> + *		   pages.
> + * @mmap: memory map this buffer - optional.
> + * @release: release this buffer; to be called after the last dma_buf_put.
> + * @sync_sg_for_cpu: sync the sg list for cpu.
> + * @sync_sg_for_device: synch the sg list for device.
> + */
> +struct dma_buf_ops {
> +	int (*attach)(struct dma_buf *, struct device *,
> +			struct dma_buf_attachment *);
> +
> +	void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
> +
> +	/* For {map,unmap}_dma_buf below, any specific buffer attributes
> +	 * required should get added to device_dma_parameters accessible
> +	 * via dev->dma_params.
> +	 */
> +	struct sg_table * (*map_dma_buf)(struct dma_buf_attachment *,
> +						enum dma_data_direction);
> +	void (*unmap_dma_buf)(struct dma_buf_attachment *,
> +						struct sg_table *);
> +	/* TODO: Add try_map_dma_buf version, to return immed with -EBUSY
> +	 * if the call would block.
> +	 */
> +
> +	/* allow mmap optionally for devices that need it */
> +	int (*mmap)(struct dma_buf *, struct vm_area_struct *);
> +	/* after final dma_buf_put() */
> +	void (*release)(struct dma_buf *);
> +
> +	/* allow allocator to take care of cache ops */
> +	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> +	void (*sync_sg_for_device)(struct dma_buf *, struct device *);
> +};
> +
> +/**
> + * struct dma_buf - shared buffer object
> + * @file: file pointer used for sharing buffers across, and for refcounting.
> + * @attachments: list of dma_buf_attachment that denotes all devices attached.
> + * @ops: dma_buf_ops associated with this buffer object
> + * @priv: user specific private data
> + */
> +struct dma_buf {
> +	size_t size;
> +	struct file *file;
> +	struct list_head attachments;
> +	const struct dma_buf_ops *ops;
> +	/* mutex to serialize list manipulation and other ops */
> +	struct mutex lock;
> +	void *priv;
> +};
> +
> +#ifdef CONFIG_DMA_SHARED_BUFFER
> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> +							struct device *dev);
> +void dma_buf_detach(struct dma_buf *dmabuf,
> +				struct dma_buf_attachment *dmabuf_attach);
> +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops, int flags);
> +int dma_buf_fd(struct dma_buf *dmabuf);
> +struct dma_buf *dma_buf_get(int fd);
> +void dma_buf_put(struct dma_buf *dmabuf);
> +
> +struct sg_table * dma_buf_map_attachment(struct dma_buf_attachment *,
> +					 enum dma_data_direction);
> +void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *);
> +#else
> +
> +static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> +							struct device *dev)
> +{
> +	return ERR_PTR(-ENODEV);
> +}
> +
> +static inline void dma_buf_detach(struct dma_buf *dmabuf,
> +				  struct dma_buf_attachment *dmabuf_attach)
> +{
> +	return;
> +}
> +
> +static inline struct dma_buf *dma_buf_export(void *priv,
> +						struct dma_buf_ops *ops,
> +						int flags)
> +{
> +	return ERR_PTR(-ENODEV);
> +}
> +
> +static inline int dma_buf_fd(struct dma_buf *dmabuf)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline struct dma_buf *dma_buf_get(int fd)
> +{
> +	return ERR_PTR(-ENODEV);
> +}
> +
> +static inline void dma_buf_put(struct dma_buf *dmabuf)
> +{
> +	return;
> +}
> +
> +static inline struct sg_table * dma_buf_map_attachment(
> +	struct dma_buf_attachment *, enum dma_data_direction)
> +{
> +	return ERR_PTR(-ENODEV);
> +}
> +
> +static inline void dma_buf_unmap_attachment(struct dma_buf_attachment *,
> +						struct sg_table *)
> +{
> +	return;
> +}
> +
> +#endif /* CONFIG_DMA_SHARED_BUFFER */
> +
> +#endif /* __DMA_BUF_H__ */
> -- 
> 1.7.4.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-02 17:11   ` Konrad Rzeszutek Wilk
@ 2011-12-05  9:48     ` Semwal, Sumit
  0 siblings, 0 replies; 64+ messages in thread
From: Semwal, Sumit @ 2011-12-05  9:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media, t.stanislaws, linux, arnd, rob,
	Sumit Semwal, m.szyprowski

Hi Konrad,

On Fri, Dec 2, 2011 at 10:41 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, Dec 02, 2011 at 02:27:31PM +0530, Sumit Semwal wrote:
>> This is the first step in defining a dma buffer sharing mechanism.
>>
<snip>
>>
>> [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
>> [2]: http://lwn.net/Articles/454389
>>
>> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
>> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com>
>
> You have a clone? You only need one SOB.
:) Thanks for your review - Well, not a clone, but I have two 'employers' :))

I have a rather weird reason for this - I am employed with Texas
Instruments, but working with Linaro as well. And due to some
'non-technical' reasons, I need to send this work from @ti.com mail
ID. At the same time, I would like to acknowledge that this work was
done as part of the Linaro umbrella, so I put another SOB @linaro.org.

>
>
<snip>
>> + * Copyright(C) 2011 Linaro Limited. All rights reserved.
>> + * Author: Sumit Semwal <sumit.semwal@ti.com>
>
> OK, so the SOB should be from @ti.com then.
>
>> + *
<snip>
>> +static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
>> +{
>> +     struct dma_buf *dmabuf;
>> +
>> +     if (!is_dma_buf_file(file))
>> +             return -EINVAL;
>> +
>> +     dmabuf = file->private_data;
>> +
>
> Should you check if dmabuf is NULL and or dmabuf->ops is NULL too?
>
> Hm, you probably don't need to check for dmabuf, but from
> looking at  dma_buf_export one could pass  a NULL for the ops.
see next comment
>
>> +     if (!dmabuf->ops->mmap)
>> +             return -EINVAL;
>> +
>> +     return dmabuf->ops->mmap(dmabuf, vma);
>> +}
>> +
>> +static int dma_buf_release(struct inode *inode, struct file *file)
>> +{
>> +     struct dma_buf *dmabuf;
>> +
>> +     if (!is_dma_buf_file(file))
>> +             return -EINVAL;
>> +
>> +     dmabuf = file->private_data;
>> +
>
> No checking here for ops or ops->release?
Hmmm.. you're right, of course. for this common check in mmap and
release, I guess I'd add it to 'is_dma_buf_file()' helper [maybe call
it is_valid_dma_buf_file() or something similar]
>
<snip>
>> +
>> +/**
>
> I don't think the ** is anymore the current kernel doc format.
thanks for catching this :) - will correct.
>
>> + * dma_buf_export - Creates a new dma_buf, and associates an anon file
>> + * with this buffer,so it can be exported.
>
> Put a space there.
ok
>
>> + * Also connect the allocator specific data and ops to the buffer.
>> + *
>> + * @priv:    [in]    Attach private data of allocator to this buffer
>> + * @ops:     [in]    Attach allocator-defined dma buf ops to the new buffer.
>> + * @flags:   [in]    mode flags for the file.
>> + *
>> + * Returns, on success, a newly created dma_buf object, which wraps the
>> + * supplied private data and operations for dma_buf_ops. On failure to
>> + * allocate the dma_buf object, it can return NULL.
>
> "it can" I think the right word is "it will".
Right.
>
>> + *
>> + */
>> +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
>> +                             int flags)
>> +{
>> +     struct dma_buf *dmabuf;
>> +     struct file *file;
>> +
>> +     BUG_ON(!priv || !ops);
>
> Whoa. Crash the whole kernel b/c of this? No no. You should
> use WARN_ON and just return NULL.
ok
>
>> +
>> +     dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
>> +     if (dmabuf == NULL)
>> +             return dmabuf;
>
> Hmm, why not return ERR_PTR(-ENOMEM); ?
ok
>
>> +
>> +     dmabuf->priv = priv;
>> +     dmabuf->ops = ops;
>> +
>> +     file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, flags);
>> +
>> +     dmabuf->file = file;
>> +
>> +     mutex_init(&dmabuf->lock);
>> +     INIT_LIST_HEAD(&dmabuf->attachments);
>> +
>> +     return dmabuf;
>> +}
>> +EXPORT_SYMBOL(dma_buf_export);
>
> _GPL ?
sure; will change it.
>
>> +
>> +
>> +/**
>> + * dma_buf_fd - returns a file descriptor for the given dma_buf
>> + * @dmabuf:  [in]    pointer to dma_buf for which fd is required.
>> + *
>> + * On success, returns an associated 'fd'. Else, returns error.
>> + */
>> +int dma_buf_fd(struct dma_buf *dmabuf)
>> +{
>> +     int error, fd;
>> +
>
> Should you check if dmabuf is NULL first?
yes.
>
>> +     if (!dmabuf->file)
>> +             return -EINVAL;
>> +
>> +     error = get_unused_fd_flags(0);
>> +     if (error < 0)
>> +             return error;
>> +     fd = error;
>> +
>> +     fd_install(fd, dmabuf->file);
>> +
>> +     return fd;
>> +}
>> +EXPORT_SYMBOL(dma_buf_fd);
>
> GPL?
sure; will change it.
>> +
>> +/**
>> + * dma_buf_get - returns the dma_buf structure related to an fd
>> + * @fd:      [in]    fd associated with the dma_buf to be returned
>> + *
>> + * On success, returns the dma_buf structure associated with an fd; uses
>> + * file's refcounting done by fget to increase refcount. returns ERR_PTR
>> + * otherwise.
>> + */
>> +struct dma_buf *dma_buf_get(int fd)
>> +{
>> +     struct file *file;
>> +
>> +     file = fget(fd);
>> +
>> +     if (!file)
>> +             return ERR_PTR(-EBADF);
>> +
>> +     if (!is_dma_buf_file(file)) {
>> +             fput(file);
>> +             return ERR_PTR(-EINVAL);
>> +     }
>> +
>> +     return file->private_data;
>> +}
>> +EXPORT_SYMBOL(dma_buf_get);
>
> GPL
sure; will change it.
>> +
>> +/**
>> + * dma_buf_put - decreases refcount of the buffer
>> + * @dmabuf:  [in]    buffer to reduce refcount of
>> + *
>> + * Uses file's refcounting done implicitly by fput()
>> + */
>> +void dma_buf_put(struct dma_buf *dmabuf)
>> +{
>> +     BUG_ON(!dmabuf->file);
>
> Yikes. BUG_ON? Can't you do WARN_ON and continue on without
> doing the refcounting?
ok
>
>> +
>> +     fput(dmabuf->file);
>> +}
>> +EXPORT_SYMBOL(dma_buf_put);
>> +
>> +/**
>> + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
>> + * calls attach() of dma_buf_ops to allow device-specific attach functionality
>> + * @dmabuf:  [in]    buffer to attach device to.
>> + * @dev:     [in]    device to be attached.
>> + *
>> + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
>> + *
>> + */
>> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>> +                                             struct device *dev)
>
> 'struct device' should be at the same column as 'struct dma_buf' ..
>
>> +{
>> +     struct dma_buf_attachment *attach;
>> +     int ret;
>> +
>> +     BUG_ON(!dmabuf || !dev);
>
> Again, BUG_ON...
will correct
>
>> +
>> +     attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
>> +     if (attach == NULL)
>> +             goto err_alloc;
>> +
>> +     mutex_lock(&dmabuf->lock);
>> +
>> +     attach->dev = dev;
>> +     attach->dmabuf = dmabuf;
>> +     if (dmabuf->ops->attach) {
>
> No checking first of dmabuf->ops?
Attach is told to be a mandatory operation for dmabuf exporter, but I
understand your point - checking for it won't hurt.
>
>> +             ret = dmabuf->ops->attach(dmabuf, dev, attach);
>> +             if (!ret)
>> +                     goto err_attach;
>> +     }
>> +     list_add(&attach->node, &dmabuf->attachments);
>> +
>> +     mutex_unlock(&dmabuf->lock);
>> +
>> +err_alloc:
>> +     return attach;
>> +err_attach:
>> +     kfree(attach);
>> +     mutex_unlock(&dmabuf->lock);
>> +     return ERR_PTR(ret);
>> +}
>> +EXPORT_SYMBOL(dma_buf_attach);
>
> GPL
sure; will change it.
<snip>

Thanks and regards,
~Sumit.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-02  8:57 ` [RFC v2 1/2] dma-buf: Introduce dma " Sumit Semwal
  2011-12-02 17:11   ` Konrad Rzeszutek Wilk
@ 2011-12-05 17:18   ` Arnd Bergmann
  2011-12-05 18:55     ` Daniel Vetter
                       ` (3 more replies)
  2012-01-09  6:20   ` InKi Dae
  2 siblings, 4 replies; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-05 17:18 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media, linux, jesse.barker, m.szyprowski, rob,
	daniel, t.stanislaws, Sumit Semwal

On Friday 02 December 2011, Sumit Semwal wrote:
> This is the first step in defining a dma buffer sharing mechanism.

This looks very nice, but there are a few things I don't understand yet
and a bunch of trivial comments I have about things I spotted.

Do you have prototype exporter and consumer drivers that you can post
for clarification?

In the patch 2, you have a section about migration that mentions that
it is possible to export a buffer that can be migrated after it
is already mapped into one user driver. How does that work when
the physical addresses are mapped into a consumer device already?

> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 21cf46f..07d8095 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
>  
>  source "drivers/base/regmap/Kconfig"
>  
> +config DMA_SHARED_BUFFER
> +	bool "Buffer framework to be shared between drivers"
> +	default n
> +	depends on ANON_INODES

I would make this 'select ANON_INODES', like the other users of this
feature.

> +	return dmabuf;
> +}
> +EXPORT_SYMBOL(dma_buf_export);

I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
because it's really a low-level function that I would expect
to get used by in-kernel subsystems providing the feature to
users and having back-end drivers, but it's not the kind of thing
we want out-of-tree drivers to mess with.

> +/**
> + * dma_buf_fd - returns a file descriptor for the given dma_buf
> + * @dmabuf:	[in]	pointer to dma_buf for which fd is required.
> + *
> + * On success, returns an associated 'fd'. Else, returns error.
> + */
> +int dma_buf_fd(struct dma_buf *dmabuf)
> +{
> +	int error, fd;
> +
> +	if (!dmabuf->file)
> +		return -EINVAL;
> +
> +	error = get_unused_fd_flags(0);

Why not simply get_unused_fd()?

> +/**
> + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
> + * calls attach() of dma_buf_ops to allow device-specific attach functionality
> + * @dmabuf:	[in]	buffer to attach device to.
> + * @dev:	[in]	device to be attached.
> + *
> + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
> + *

Or may return a negative error code. It's better to be consistent here:
either always return NULL on error, or change the allocation error to
ERR_PTR(-ENOMEM).

> + */
> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> +						struct device *dev)
> +{
> +	struct dma_buf_attachment *attach;
> +	int ret;
> +
> +	BUG_ON(!dmabuf || !dev);
> +
> +	attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
> +	if (attach == NULL)
> +		goto err_alloc;
> +
> +	mutex_lock(&dmabuf->lock);
> +
> +	attach->dev = dev;
> +	attach->dmabuf = dmabuf;
> +	if (dmabuf->ops->attach) {
> +		ret = dmabuf->ops->attach(dmabuf, dev, attach);
> +		if (!ret)
> +			goto err_attach;

You probably mean "if (ret)" here instead of "if (!ret)", right?

> +	/* allow allocator to take care of cache ops */
> +	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> +	void (*sync_sg_for_device)(struct dma_buf *, struct device *);

I don't see how this works with multiple consumers: For the streaming
DMA mapping, there must be exactly one owner, either the device or
the CPU. Obviously, this rule needs to be extended when you get to
multiple devices and multiple device drivers, plus possibly user
mappings. Simply assigning the buffer to "the device" from one
driver does not block other drivers from touching the buffer, and
assigning it to "the cpu" does not stop other hardware that the
code calling sync_sg_for_cpu is not aware of.

The only way to solve this that I can think of right now is to
mandate that the mappings are all coherent (i.e. noncachable
on noncoherent architectures like ARM). If you do that, you no
longer need the sync_sg_for_* calls.

> +#ifdef CONFIG_DMA_SHARED_BUFFER

Do you have a use case for making the interface compile-time disabled?
I had assumed that any code using it would make no sense if it's not
available so you don't actually need this.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 17:18   ` Arnd Bergmann
@ 2011-12-05 18:55     ` Daniel Vetter
  2011-12-05 19:29       ` Arnd Bergmann
  2011-12-05 20:46     ` Rob Clark
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 64+ messages in thread
From: Daniel Vetter @ 2011-12-05 18:55 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, jesse.barker,
	m.szyprowski, rob, daniel, t.stanislaws, Sumit Semwal

On Mon, Dec 05, 2011 at 05:18:48PM +0000, Arnd Bergmann wrote:
> On Friday 02 December 2011, Sumit Semwal wrote:
> > +	/* allow allocator to take care of cache ops */
> > +	void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> > +	void (*sync_sg_for_device)(struct dma_buf *, struct device *);
>
> I don't see how this works with multiple consumers: For the streaming
> DMA mapping, there must be exactly one owner, either the device or
> the CPU. Obviously, this rule needs to be extended when you get to
> multiple devices and multiple device drivers, plus possibly user
> mappings. Simply assigning the buffer to "the device" from one
> driver does not block other drivers from touching the buffer, and
> assigning it to "the cpu" does not stop other hardware that the
> code calling sync_sg_for_cpu is not aware of.
>
> The only way to solve this that I can think of right now is to
> mandate that the mappings are all coherent (i.e. noncachable
> on noncoherent architectures like ARM). If you do that, you no
> longer need the sync_sg_for_* calls.

Woops, totally missed the addition of these. Can somebody explain to used
to rather coherent x86 what we need these for and the code-flow would look
like in a typical example. I was kinda assuming that devices would bracket
their use of a buffer with the attachment_map/unmap calls and any cache
coherency magic that might be needed would be somewhat transparent to
users of the interface?

The map call gets the dma_data_direction parameter, so it should be able
to do the right thing. And because we keep the attachement around, any
caching of mappings should be possible, too.

Yours, Daniel

PS: Slightly related, because it will make the coherency nightmare worse,
afaict: Can we kill mmap support?
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 18:55     ` Daniel Vetter
@ 2011-12-05 19:29       ` Arnd Bergmann
  2011-12-05 20:58         ` Daniel Vetter
  0 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-05 19:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Daniel Vetter, t.stanislaws, linux, Sumit Semwal, jesse.barker,
	linux-kernel, dri-devel, linaro-mm-sig, linux-mm, rob,
	m.szyprowski, Sumit Semwal, linux-media

On Monday 05 December 2011 19:55:44 Daniel Vetter wrote:
> > The only way to solve this that I can think of right now is to
> > mandate that the mappings are all coherent (i.e. noncachable
> > on noncoherent architectures like ARM). If you do that, you no
> > longer need the sync_sg_for_* calls.
> 
> Woops, totally missed the addition of these. Can somebody explain to used
> to rather coherent x86 what we need these for and the code-flow would look
> like in a typical example. I was kinda assuming that devices would bracket
> their use of a buffer with the attachment_map/unmap calls and any cache
> coherency magic that might be needed would be somewhat transparent to
> users of the interface?

I'll describe how the respective functions work in the streaming mapping
API (dma_map_*): You start out with a buffer that is owned by the CPU,
i.e. the kernel can access it freely. When you call dma_map_sg or similar,
a noncoherent device reading the buffer requires the cache to be flushed
in order to see the data that was written by the CPU into the cache.

After dma_map_sg, the device can perform both read and write accesses
(depending on the flag to the map call), but the CPU is no longer allowed
to read (which would allocate a cache line that may become invalid but
remain marked as clean) or write (which would create a dirty cache line
without writing it back) that buffer.

Once the device is done, the driver calls dma_unmap_* and the buffer is
again owned by the CPU. The device can no longer access it (in fact
the address may be no longer be backed if there is an iommu) and the CPU
can again read and write the buffer. On ARMv6 and higher, possibly some
other architectures, dma_unmap_* also needs to invalidate the cache
for the buffer, because due to speculative prefetching, there may also
be a new clean cache line with stale data from an earlier version of
the buffer.

Since map/unmap is an expensive operation, the interface was extended
to pass back the ownership to the CPU and back to the device while leaving
the buffer mapped. dma_sync_sg_for_cpu invalidates the cache in the same
way as dma_unmap_sg, so the CPU can access the buffer, and
dma_sync_sg_for_device hands it back to the device by performing the
same cache flush that dma_map_sg would do.

You could for example do this if you want video input with a
cacheable buffer, or in an rdma scenario with a buffer accessed
by a remote machine.

In case of software iommu (swiotlb, dmabounce), the map and sync
functions don't do cache management but instead copy data between
a buffer accessed by hardware and a different buffer accessed
by the user.

> The map call gets the dma_data_direction parameter, so it should be able
> to do the right thing. And because we keep the attachement around, any
> caching of mappings should be possible, too.
> 
> Yours, Daniel
> 
> PS: Slightly related, because it will make the coherency nightmare worse,
> afaict: Can we kill mmap support?

The mmap support is required (and only make sense) for consistent mappings,
not for streaming mappings. The provider must ensure that if you map
something uncacheable into the kernel in order to provide consistency,
any mapping into user space must also be uncacheable. A driver
that wants to have the buffer mapped to user space as many do should
not need to know whether it is required to do cacheable or uncacheable
mapping because of some other driver, and it should not need to worry
about how to set up uncached mappings in user space.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 17:18   ` Arnd Bergmann
  2011-12-05 18:55     ` Daniel Vetter
@ 2011-12-05 20:46     ` Rob Clark
  2011-12-05 21:23       ` Daniel Vetter
  2011-12-05 22:09       ` Arnd Bergmann
  2011-12-07  6:35     ` Semwal, Sumit
  2011-12-09 22:50     ` [Linaro-mm-sig] " Robert Morell
  3 siblings, 2 replies; 64+ messages in thread
From: Rob Clark @ 2011-12-05 20:46 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, jesse.barker,
	m.szyprowski, daniel, t.stanislaws, Sumit Semwal

On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 02 December 2011, Sumit Semwal wrote:
>> This is the first step in defining a dma buffer sharing mechanism.
>
> This looks very nice, but there are a few things I don't understand yet
> and a bunch of trivial comments I have about things I spotted.
>
> Do you have prototype exporter and consumer drivers that you can post
> for clarification?

There is some dummy drivers based on an earlier version.  And airlied
has a prime (multi-gpu) prototype:

http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

I've got a nearly working camera+display prototype:

https://github.com/robclark/kernel-omap4/commits/dmabuf

> In the patch 2, you have a section about migration that mentions that
> it is possible to export a buffer that can be migrated after it
> is already mapped into one user driver. How does that work when
> the physical addresses are mapped into a consumer device already?

I think you can do physical migration if you are attached, but
probably not if you are mapped.

>> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
>> index 21cf46f..07d8095 100644
>> --- a/drivers/base/Kconfig
>> +++ b/drivers/base/Kconfig
>> @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
>>
>>  source "drivers/base/regmap/Kconfig"
>>
>> +config DMA_SHARED_BUFFER
>> +     bool "Buffer framework to be shared between drivers"
>> +     default n
>> +     depends on ANON_INODES
>
> I would make this 'select ANON_INODES', like the other users of this
> feature.
>
>> +     return dmabuf;
>> +}
>> +EXPORT_SYMBOL(dma_buf_export);
>
> I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
> because it's really a low-level function that I would expect
> to get used by in-kernel subsystems providing the feature to
> users and having back-end drivers, but it's not the kind of thing
> we want out-of-tree drivers to mess with.
>
>> +/**
>> + * dma_buf_fd - returns a file descriptor for the given dma_buf
>> + * @dmabuf:  [in]    pointer to dma_buf for which fd is required.
>> + *
>> + * On success, returns an associated 'fd'. Else, returns error.
>> + */
>> +int dma_buf_fd(struct dma_buf *dmabuf)
>> +{
>> +     int error, fd;
>> +
>> +     if (!dmabuf->file)
>> +             return -EINVAL;
>> +
>> +     error = get_unused_fd_flags(0);
>
> Why not simply get_unused_fd()?
>
>> +/**
>> + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
>> + * calls attach() of dma_buf_ops to allow device-specific attach functionality
>> + * @dmabuf:  [in]    buffer to attach device to.
>> + * @dev:     [in]    device to be attached.
>> + *
>> + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
>> + *
>
> Or may return a negative error code. It's better to be consistent here:
> either always return NULL on error, or change the allocation error to
> ERR_PTR(-ENOMEM).
>
>> + */
>> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>> +                                             struct device *dev)
>> +{
>> +     struct dma_buf_attachment *attach;
>> +     int ret;
>> +
>> +     BUG_ON(!dmabuf || !dev);
>> +
>> +     attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
>> +     if (attach == NULL)
>> +             goto err_alloc;
>> +
>> +     mutex_lock(&dmabuf->lock);
>> +
>> +     attach->dev = dev;
>> +     attach->dmabuf = dmabuf;
>> +     if (dmabuf->ops->attach) {
>> +             ret = dmabuf->ops->attach(dmabuf, dev, attach);
>> +             if (!ret)
>> +                     goto err_attach;
>
> You probably mean "if (ret)" here instead of "if (!ret)", right?
>
>> +     /* allow allocator to take care of cache ops */
>> +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
>> +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
>
> I don't see how this works with multiple consumers: For the streaming
> DMA mapping, there must be exactly one owner, either the device or
> the CPU. Obviously, this rule needs to be extended when you get to
> multiple devices and multiple device drivers, plus possibly user
> mappings. Simply assigning the buffer to "the device" from one
> driver does not block other drivers from touching the buffer, and
> assigning it to "the cpu" does not stop other hardware that the
> code calling sync_sg_for_cpu is not aware of.
>
> The only way to solve this that I can think of right now is to
> mandate that the mappings are all coherent (i.e. noncachable
> on noncoherent architectures like ARM). If you do that, you no
> longer need the sync_sg_for_* calls.

My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
ioctls and corresponding dmabuf ops, which userspace is required to
call before / after CPU access.  Or just remove mmap() and do the
mmap() via allocating device and use that device's equivalent
DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
would give you a way to (a) synchronize with gpu/asynchronous
pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
that gives you a convenient place to do cache operations on
noncoherent architecture.

I sort of preferred having the DMABUF shim because that lets you pass
a buffer around userspace without the receiving code knowing about a
device specific API.  But the problem I eventually came around to: if
your GL stack (or some other userspace component) is batching up
commands before submission to kernel, the buffers you need to wait for
completion might not even be submitted yet.  So from kernel
perspective they are "ready" for cpu access.  Even though in fact they
are not in a consistent state from rendering perspective.  I don't
really know a sane way to deal with that.  Maybe the approach instead
should be a userspace level API (in libkms/libdrm?) to provide
abstraction for userspace access to buffers rather than dealing with
this at the kernel level.

BR,
-R

>> +#ifdef CONFIG_DMA_SHARED_BUFFER
>
> Do you have a use case for making the interface compile-time disabled?
> I had assumed that any code using it would make no sense if it's not
> available so you don't actually need this.
>
>        Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 19:29       ` Arnd Bergmann
@ 2011-12-05 20:58         ` Daniel Vetter
  2011-12-05 22:04           ` Arnd Bergmann
  0 siblings, 1 reply; 64+ messages in thread
From: Daniel Vetter @ 2011-12-05 20:58 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Daniel Vetter, t.stanislaws, linux,
	Sumit Semwal, jesse.barker, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, rob, m.szyprowski, Sumit Semwal,
	linux-media

On Mon, Dec 05, 2011 at 08:29:49PM +0100, Arnd Bergmann wrote:
> On Monday 05 December 2011 19:55:44 Daniel Vetter wrote:
> > > The only way to solve this that I can think of right now is to
> > > mandate that the mappings are all coherent (i.e. noncachable
> > > on noncoherent architectures like ARM). If you do that, you no
> > > longer need the sync_sg_for_* calls.
> >
> > Woops, totally missed the addition of these. Can somebody explain to used
> > to rather coherent x86 what we need these for and the code-flow would look
> > like in a typical example. I was kinda assuming that devices would bracket
> > their use of a buffer with the attachment_map/unmap calls and any cache
> > coherency magic that might be needed would be somewhat transparent to
> > users of the interface?
>
> I'll describe how the respective functions work in the streaming mapping
> API (dma_map_*): You start out with a buffer that is owned by the CPU,
> i.e. the kernel can access it freely. When you call dma_map_sg or similar,
> a noncoherent device reading the buffer requires the cache to be flushed
> in order to see the data that was written by the CPU into the cache.
>
> After dma_map_sg, the device can perform both read and write accesses
> (depending on the flag to the map call), but the CPU is no longer allowed
> to read (which would allocate a cache line that may become invalid but
> remain marked as clean) or write (which would create a dirty cache line
> without writing it back) that buffer.
>
> Once the device is done, the driver calls dma_unmap_* and the buffer is
> again owned by the CPU. The device can no longer access it (in fact
> the address may be no longer be backed if there is an iommu) and the CPU
> can again read and write the buffer. On ARMv6 and higher, possibly some
> other architectures, dma_unmap_* also needs to invalidate the cache
> for the buffer, because due to speculative prefetching, there may also
> be a new clean cache line with stale data from an earlier version of
> the buffer.
>
> Since map/unmap is an expensive operation, the interface was extended
> to pass back the ownership to the CPU and back to the device while leaving
> the buffer mapped. dma_sync_sg_for_cpu invalidates the cache in the same
> way as dma_unmap_sg, so the CPU can access the buffer, and
> dma_sync_sg_for_device hands it back to the device by performing the
> same cache flush that dma_map_sg would do.
>
> You could for example do this if you want video input with a
> cacheable buffer, or in an rdma scenario with a buffer accessed
> by a remote machine.
>
> In case of software iommu (swiotlb, dmabounce), the map and sync
> functions don't do cache management but instead copy data between
> a buffer accessed by hardware and a different buffer accessed
> by the user.

Thanks a lot for this excellent overview. I think at least for the first
version of dmabuf we should drop the sync_* interfaces and simply require
users to bracket their usage of the buffer from the attached device by
map/unmap. A dma_buf provider is always free to cache the mapping and
simply call sync_sg_for of the streaming dma api.

If it later turns out that we want to be able to cache the sg list also on
the use-site in the driver (e.g. map it into some hw sg list) we can
always add that functionality later. I just fear that sync ops among N
devices is a bit ill-defined and we already have a plethora of ill-defined
issues at hand. Also the proposed api doesn't quite fit into what's
already there, I think an s/device/dma_buf_attachment/ would be more
consistent - otherwise the dmabuf provider might need to walk the list of
attachements to get at the right one for the device.

> > The map call gets the dma_data_direction parameter, so it should be able
> > to do the right thing. And because we keep the attachement around, any
> > caching of mappings should be possible, too.
> >
> > Yours, Daniel
> >
> > PS: Slightly related, because it will make the coherency nightmare worse,
> > afaict: Can we kill mmap support?
>
> The mmap support is required (and only make sense) for consistent mappings,
> not for streaming mappings. The provider must ensure that if you map
> something uncacheable into the kernel in order to provide consistency,
> any mapping into user space must also be uncacheable. A driver
> that wants to have the buffer mapped to user space as many do should
> not need to know whether it is required to do cacheable or uncacheable
> mapping because of some other driver, and it should not need to worry
> about how to set up uncached mappings in user space.

Either I've always missed it or no one ever described it that consisely,
but now I see the use-case for mmap: Simpler drivers (i.e. not gpus) might
need to expose a userspace mapping and only the provider knows how to do
that in a coherent fashion. I want this in the docs (if it's not there yet
...).

But even with that use-case in mind I still have some gripes with the
current mmap interfaces as proposed:
- This use-case explains why the dmabuf provider needs to expose an mmap
  function. It doesn't explain why we need to expose that to userspace
  (instead of faking whatever mmap the importing driver already exposes).
  Imo the userspace-facing part of dmabuf should be about buffer sharing
  and not about how to access that buffer, so I'm still missing the reason
  for that.
- We need some way to tell the provider to sync all completed dma
  operations for userspace mmap access. sync_for_cpu would serve that use.
  But given that we strive for zero-copy I think the kernel shouldn't
  ever need to access the contents of a dmabuf and it would therefore make
  more sense to call it sync_for_mmap.
- And the ugly one: Assuming you want this to be coherent for (almost)
  unchanged users of exisiting subsystem and want this to integrate
  seamlessly with gpu driver management, we also need a
  finish_mmap_access. We _can_ fake this by shooting down userspace ptes
  (if the provider knows about all of them, and it should thanks to the
  mmap interface) and we do that trick on i915 (for certain cases). But
  this is generally slow and painful and does not integrate well with
  other gpu memory management paradigms, where userspace is required to
  explicit bracket access.

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 20:46     ` Rob Clark
@ 2011-12-05 21:23       ` Daniel Vetter
  2011-12-05 22:11         ` Rob Clark
  2011-12-05 22:09       ` Arnd Bergmann
  1 sibling, 1 reply; 64+ messages in thread
From: Daniel Vetter @ 2011-12-05 21:23 UTC (permalink / raw)
  To: Rob Clark
  Cc: Arnd Bergmann, Sumit Semwal, linux-kernel, linux-arm-kernel,
	linux-mm, linaro-mm-sig, dri-devel, linux-media, linux,
	jesse.barker, m.szyprowski, daniel, t.stanislaws, Sumit Semwal

On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
> On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > In the patch 2, you have a section about migration that mentions that
> > it is possible to export a buffer that can be migrated after it
> > is already mapped into one user driver. How does that work when
> > the physical addresses are mapped into a consumer device already?
>
> I think you can do physical migration if you are attached, but
> probably not if you are mapped.

Yeah, that's very much how I see this, and also why map/unmap (at least
for simple users like v4l) should only bracket actual usage. GPU memory
managers need to be able to move around buffers while no one is using
them.

[snip]

> >> +     /* allow allocator to take care of cache ops */
> >> +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> >> +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
> >
> > I don't see how this works with multiple consumers: For the streaming
> > DMA mapping, there must be exactly one owner, either the device or
> > the CPU. Obviously, this rule needs to be extended when you get to
> > multiple devices and multiple device drivers, plus possibly user
> > mappings. Simply assigning the buffer to "the device" from one
> > driver does not block other drivers from touching the buffer, and
> > assigning it to "the cpu" does not stop other hardware that the
> > code calling sync_sg_for_cpu is not aware of.
> >
> > The only way to solve this that I can think of right now is to
> > mandate that the mappings are all coherent (i.e. noncachable
> > on noncoherent architectures like ARM). If you do that, you no
> > longer need the sync_sg_for_* calls.
>
> My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
> ioctls and corresponding dmabuf ops, which userspace is required to
> call before / after CPU access.  Or just remove mmap() and do the
> mmap() via allocating device and use that device's equivalent
> DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
> would give you a way to (a) synchronize with gpu/asynchronous
> pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
> buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
> that gives you a convenient place to do cache operations on
> noncoherent architecture.
>
> I sort of preferred having the DMABUF shim because that lets you pass
> a buffer around userspace without the receiving code knowing about a
> device specific API.  But the problem I eventually came around to: if
> your GL stack (or some other userspace component) is batching up
> commands before submission to kernel, the buffers you need to wait for
> completion might not even be submitted yet.  So from kernel
> perspective they are "ready" for cpu access.  Even though in fact they
> are not in a consistent state from rendering perspective.  I don't
> really know a sane way to deal with that.  Maybe the approach instead
> should be a userspace level API (in libkms/libdrm?) to provide
> abstraction for userspace access to buffers rather than dealing with
> this at the kernel level.

Well, there's a reason GL has an explicit flush and extensions for sync
objects. It's to support such scenarios where the driver batches up gpu
commands before actually submitting them. Also, recent gpus have all (or
shortly will grow) multiple execution pipelines, so it's also important
that you sync up with the right command stream. Syncing up with all of
them is generally frowned upon for obvious reasons ;-)

So any userspace that interacts with an OpenGL driver needs to take care
of this anyway. But I think for simpler stuff (v4l) kernel only coherency
should work and userspace just needs to take care of gl interactions and
call glflush and friends at the right points. I think we can flesh this
out precisely when we spec the dmabuf EGL extension ... (or implement one
of the preexisting ones already around).

On the topic of a coherency model for dmabuf, I think we need to look at
dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
cpu_finish or whatever they might get called) as barriers:

So after a dma_buf_map, all previsously completed dma operations (i.e.
unmap already called) and any cpu writes (i.e. cpu_finish called) will be
coherent. Similar rule holds for cpu access through the userspace mmap,
only writes completed before the cpu_start will show up.

Similar, writes done by the device are only guaranteed to show up after
the _unmap. Dito for cpu writes and cpu_finish.

In short we always need two function calls to denote the start/end of the
"critical section".

Any concurrent operations are allowed to yield garbage, meaning any
combination of the old or either of the newly written contents (i.e.
non-overlapping writes might not actually all end up in the buffer,
but instead some old contents). Maybe we even need to loosen that to
the real "undefined behaviour", but atm I can't think of an example.

-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 20:58         ` Daniel Vetter
@ 2011-12-05 22:04           ` Arnd Bergmann
  2011-12-05 22:33             ` Daniel Vetter
  0 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-05 22:04 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Daniel Vetter, t.stanislaws, linux, Sumit Semwal, linux-mm,
	linux-kernel, dri-devel, linaro-mm-sig, jesse.barker, rob,
	linux-media, Sumit Semwal, m.szyprowski

On Monday 05 December 2011 21:58:39 Daniel Vetter wrote:
> On Mon, Dec 05, 2011 at 08:29:49PM +0100, Arnd Bergmann wrote:
> > ...
> 
> Thanks a lot for this excellent overview. I think at least for the first
> version of dmabuf we should drop the sync_* interfaces and simply require
> users to bracket their usage of the buffer from the attached device by
> map/unmap. A dma_buf provider is always free to cache the mapping and
> simply call sync_sg_for of the streaming dma api.

I think we still have the same problem if we allow multiple drivers
to access a noncoherent buffer using map/unmap:

	driver A				driver B

1.	read/write				
2.						read/write
3.	map()					
4.						read/write
5.	dma
6.						map()
7.	dma
8.						dma
9.	unmap()
10.						dma
11.	read/write
12.						unmap()						


In step 4, the buffer is owned by device A, but accessed by driver B, which
is a bug. In step 11, the buffer is owned by device B but accessed by driver
A, which is the same bug on the other side. In steps 7 and 8, the buffer
is owned by both device A and B, which is currently undefined but would
be ok if both devices are on the same coherency domain. Whether that point
is meaningful depends on what the devices actually do. It would be ok
if both are only reading, but not if they write into the same location
concurrently.

As I mentioned originally, the problem could be completely avoided if
we only allow consistent (e.g. uncached) mappings or buffers that
are not mapped into the kernel virtual address space at all.

Alternatively, a clearer model would be to require each access to
nonconsistent buffers to be exclusive: a map() operation would have
to block until the current mapper (if any) has done an unmap(), and
any access from the CPU would also have to call a dma_buf_ops pointer
to serialize the CPU accesses with any device accesses. User
mappings of the buffer can be easily blocked during a DMA access
by unmapping the buffer from user space at map() time and blocking the
vm_ops->fault() operation until the unmap().

> If it later turns out that we want to be able to cache the sg list also on
> the use-site in the driver (e.g. map it into some hw sg list) we can
> always add that functionality later. I just fear that sync ops among N
> devices is a bit ill-defined and we already have a plethora of ill-defined
> issues at hand. Also the proposed api doesn't quite fit into what's
> already there, I think an s/device/dma_buf_attachment/ would be more
> consistent - otherwise the dmabuf provider might need to walk the list of
> attachements to get at the right one for the device.

Right, at last for the start, let's mandate just map/unmap and not provide
sync. I do wonder however whether we should implement consistent (possibly
uncached) or streaming (cacheable, but always owned by either the device
or the CPU, not both) buffers, or who gets to make the decision which
one is used if both are implemented.

> > > The map call gets the dma_data_direction parameter, so it should be able
> > > to do the right thing. And because we keep the attachement around, any
> > > caching of mappings should be possible, too.
> > >
> > > Yours, Daniel
> > >
> > > PS: Slightly related, because it will make the coherency nightmare worse,
> > > afaict: Can we kill mmap support?
> >
> > The mmap support is required (and only make sense) for consistent mappings,
> > not for streaming mappings. The provider must ensure that if you map
> > something uncacheable into the kernel in order to provide consistency,
> > any mapping into user space must also be uncacheable. A driver
> > that wants to have the buffer mapped to user space as many do should
> > not need to know whether it is required to do cacheable or uncacheable
> > mapping because of some other driver, and it should not need to worry
> > about how to set up uncached mappings in user space.
> 
> Either I've always missed it or no one ever described it that consisely,
> but now I see the use-case for mmap: Simpler drivers (i.e. not gpus) might
> need to expose a userspace mapping and only the provider knows how to do
> that in a coherent fashion. I want this in the docs (if it's not there yet
> ...).

It's currently implemented in the ARM/PowerPC-specific dma_mmap_coherent
function and documented (hardly) in arch/arm/include/asm/dma-mapping.h.

We should make clear in that this is actually an extension of the
regular dma mapping api that first needs to be made generic.

> But even with that use-case in mind I still have some gripes with the
> current mmap interfaces as proposed:
> - This use-case explains why the dmabuf provider needs to expose an mmap
>   function. It doesn't explain why we need to expose that to userspace
>   (instead of faking whatever mmap the importing driver already exposes).
>   Imo the userspace-facing part of dmabuf should be about buffer sharing
>   and not about how to access that buffer, so I'm still missing the reason
>   for that.

AFAICT, the only reason for providing an mmap operation in the dma_buf
file descriptor is "because we can". It may in fact be better to leave
that part out and have the discussion whether this is actually a good
thing to do after the dma_buf core code has been merged upstream.

> - We need some way to tell the provider to sync all completed dma
>   operations for userspace mmap access. sync_for_cpu would serve that use.
>   But given that we strive for zero-copy I think the kernel shouldn't
>   ever need to access the contents of a dmabuf and it would therefore make
>   more sense to call it sync_for_mmap.

As I mentioned, the kernel can easily block out the user map by
transparently unmapping the buffer. I've done that in spufs for
context-switching an SPU: during the save and restore phase of the
SPU local memory to system memory, the page tables entries for
all user mappings are removed and then faulted back in after the
context switch. We can do the same during a DMA on a noncoherent
buffer.

> - And the ugly one: Assuming you want this to be coherent for (almost)
>   unchanged users of exisiting subsystem and want this to integrate
>   seamlessly with gpu driver management, we also need a
>   finish_mmap_access. We _can_ fake this by shooting down userspace ptes
>   (if the provider knows about all of them, and it should thanks to the
>   mmap interface) and we do that trick on i915 (for certain cases). But
>   this is generally slow and painful and does not integrate well with
>   other gpu memory management paradigms, where userspace is required to
>   explicit bracket access.

Yes, good point.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 20:46     ` Rob Clark
  2011-12-05 21:23       ` Daniel Vetter
@ 2011-12-05 22:09       ` Arnd Bergmann
  2011-12-05 22:15         ` Rob Clark
  2011-12-05 22:35         ` Rob Clark
  1 sibling, 2 replies; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-05 22:09 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Rob Clark, t.stanislaws, linux, Sumit Semwal, jesse.barker,
	linux-kernel, dri-devel, linaro-mm-sig, linux-mm, daniel,
	m.szyprowski, Sumit Semwal, linux-media

On Monday 05 December 2011 14:46:47 Rob Clark wrote:
> On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Friday 02 December 2011, Sumit Semwal wrote:
> >> This is the first step in defining a dma buffer sharing mechanism.
> >
> > This looks very nice, but there are a few things I don't understand yet
> > and a bunch of trivial comments I have about things I spotted.
> >
> > Do you have prototype exporter and consumer drivers that you can post
> > for clarification?
> 
> There is some dummy drivers based on an earlier version.  And airlied
> has a prime (multi-gpu) prototype:
> 
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf
> 
> I've got a nearly working camera+display prototype:
> 
> https://github.com/robclark/kernel-omap4/commits/dmabuf

Ok, thanks. I think it would be good to post these for reference
in v3, with a clear indication that they are not being submitted
for discussion/inclusion yet.

> > In the patch 2, you have a section about migration that mentions that
> > it is possible to export a buffer that can be migrated after it
> > is already mapped into one user driver. How does that work when
> > the physical addresses are mapped into a consumer device already?
> 
> I think you can do physical migration if you are attached, but
> probably not if you are mapped.

Ok, that's what I thought.

> > You probably mean "if (ret)" here instead of "if (!ret)", right?
> >
> >> +     /* allow allocator to take care of cache ops */
> >> +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> >> +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
> >
> > I don't see how this works with multiple consumers: For the streaming
> > DMA mapping, there must be exactly one owner, either the device or
> > the CPU. Obviously, this rule needs to be extended when you get to
> > multiple devices and multiple device drivers, plus possibly user
> > mappings. Simply assigning the buffer to "the device" from one
> > driver does not block other drivers from touching the buffer, and
> > assigning it to "the cpu" does not stop other hardware that the
> > code calling sync_sg_for_cpu is not aware of.
> >
> > The only way to solve this that I can think of right now is to
> > mandate that the mappings are all coherent (i.e. noncachable
> > on noncoherent architectures like ARM). If you do that, you no
> > longer need the sync_sg_for_* calls.
> 
> My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
> ioctls and corresponding dmabuf ops, which userspace is required to
> call before / after CPU access.  Or just remove mmap() and do the
> mmap() via allocating device and use that device's equivalent
> DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
> would give you a way to (a) synchronize with gpu/asynchronous
> pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
> buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
> that gives you a convenient place to do cache operations on
> noncoherent architecture.

I wasn't even thinking of user mappings, as I replied to Daniel, I
think they are easy to solve (maybe not efficiently though)

> I sort of preferred having the DMABUF shim because that lets you pass
> a buffer around userspace without the receiving code knowing about a
> device specific API.  But the problem I eventually came around to: if
> your GL stack (or some other userspace component) is batching up
> commands before submission to kernel, the buffers you need to wait for
> completion might not even be submitted yet.  So from kernel
> perspective they are "ready" for cpu access.  Even though in fact they
> are not in a consistent state from rendering perspective.  I don't
> really know a sane way to deal with that.  Maybe the approach instead
> should be a userspace level API (in libkms/libdrm?) to provide
> abstraction for userspace access to buffers rather than dealing with
> this at the kernel level.

It would be nice if user space had no way to block out kernel drivers,
otherwise we have to be very careful to ensure that each map() operation
can be interrupted by a signal as the last resort to avoid deadlocks.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 21:23       ` Daniel Vetter
@ 2011-12-05 22:11         ` Rob Clark
  2011-12-05 22:33           ` Daniel Vetter
                             ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Rob Clark @ 2011-12-05 22:11 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: t.stanislaws, linux, Arnd Bergmann, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

On Mon, Dec 5, 2011 at 3:23 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
>> On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > In the patch 2, you have a section about migration that mentions that
>> > it is possible to export a buffer that can be migrated after it
>> > is already mapped into one user driver. How does that work when
>> > the physical addresses are mapped into a consumer device already?
>>
>> I think you can do physical migration if you are attached, but
>> probably not if you are mapped.
>
> Yeah, that's very much how I see this, and also why map/unmap (at least
> for simple users like v4l) should only bracket actual usage. GPU memory
> managers need to be able to move around buffers while no one is using
> them.
>
> [snip]
>
>> >> +     /* allow allocator to take care of cache ops */
>> >> +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
>> >> +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
>> >
>> > I don't see how this works with multiple consumers: For the streaming
>> > DMA mapping, there must be exactly one owner, either the device or
>> > the CPU. Obviously, this rule needs to be extended when you get to
>> > multiple devices and multiple device drivers, plus possibly user
>> > mappings. Simply assigning the buffer to "the device" from one
>> > driver does not block other drivers from touching the buffer, and
>> > assigning it to "the cpu" does not stop other hardware that the
>> > code calling sync_sg_for_cpu is not aware of.
>> >
>> > The only way to solve this that I can think of right now is to
>> > mandate that the mappings are all coherent (i.e. noncachable
>> > on noncoherent architectures like ARM). If you do that, you no
>> > longer need the sync_sg_for_* calls.
>>
>> My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
>> ioctls and corresponding dmabuf ops, which userspace is required to
>> call before / after CPU access.  Or just remove mmap() and do the
>> mmap() via allocating device and use that device's equivalent
>> DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
>> would give you a way to (a) synchronize with gpu/asynchronous
>> pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
>> buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
>> that gives you a convenient place to do cache operations on
>> noncoherent architecture.
>>
>> I sort of preferred having the DMABUF shim because that lets you pass
>> a buffer around userspace without the receiving code knowing about a
>> device specific API.  But the problem I eventually came around to: if
>> your GL stack (or some other userspace component) is batching up
>> commands before submission to kernel, the buffers you need to wait for
>> completion might not even be submitted yet.  So from kernel
>> perspective they are "ready" for cpu access.  Even though in fact they
>> are not in a consistent state from rendering perspective.  I don't
>> really know a sane way to deal with that.  Maybe the approach instead
>> should be a userspace level API (in libkms/libdrm?) to provide
>> abstraction for userspace access to buffers rather than dealing with
>> this at the kernel level.
>
> Well, there's a reason GL has an explicit flush and extensions for sync
> objects. It's to support such scenarios where the driver batches up gpu
> commands before actually submitting them.

Hmm.. what about other non-GL APIs..  maybe vaapi/vdpau or similar?
(Or something that I haven't thought of.)

> Also, recent gpus have all (or
> shortly will grow) multiple execution pipelines, so it's also important
> that you sync up with the right command stream. Syncing up with all of
> them is generally frowned upon for obvious reasons ;-)

Well, I guess I am happy enough with something that is at least
functional.  Usespace access would (I think) mainly be weird edge case
type stuff.  But...

> So any userspace that interacts with an OpenGL driver needs to take care
> of this anyway. But I think for simpler stuff (v4l) kernel only coherency
> should work and userspace just needs to take care of gl interactions and
> call glflush and friends at the right points. I think we can flesh this
> out precisely when we spec the dmabuf EGL extension ... (or implement one
> of the preexisting ones already around).

.. yeah, I think egl/eglImage extension would be the right place to
hide this behind.  And I guess your GL stack should be able to figure
out which execution pipeline to sync, cache state of buffer, and
whatever other optimizations you might want to make.

> On the topic of a coherency model for dmabuf, I think we need to look at
> dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
> cpu_finish or whatever they might get called) as barriers:
>
> So after a dma_buf_map, all previsously completed dma operations (i.e.
> unmap already called) and any cpu writes (i.e. cpu_finish called) will be
> coherent. Similar rule holds for cpu access through the userspace mmap,
> only writes completed before the cpu_start will show up.
>
> Similar, writes done by the device are only guaranteed to show up after
> the _unmap. Dito for cpu writes and cpu_finish.
>
> In short we always need two function calls to denote the start/end of the
> "critical section".

Yup, this was exactly my assumption.  But I guess it is better to spell it out.

BR,
-R

> Any concurrent operations are allowed to yield garbage, meaning any
> combination of the old or either of the newly written contents (i.e.
> non-overlapping writes might not actually all end up in the buffer,
> but instead some old contents). Maybe we even need to loosen that to
> the real "undefined behaviour", but atm I can't think of an example.
>
> -Daniel
> --
> Daniel Vetter
> Mail: daniel@ffwll.ch
> Mobile: +41 (0)79 365 57 48
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 22:09       ` Arnd Bergmann
@ 2011-12-05 22:15         ` Rob Clark
  2011-12-05 22:35         ` Rob Clark
  1 sibling, 0 replies; 64+ messages in thread
From: Rob Clark @ 2011-12-05 22:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, t.stanislaws, linux, linux-mm, linux-kernel,
	dri-devel, linaro-mm-sig, linux-media, Sumit Semwal,
	m.szyprowski

On Mon, Dec 5, 2011 at 4:09 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> https://github.com/robclark/kernel-omap4/commits/dmabuf
>
> Ok, thanks. I think it would be good to post these for reference
> in v3, with a clear indication that they are not being submitted
> for discussion/inclusion yet.

btw, don't look at this too closely at that tree yet.. where the
attach/detach is done in videobuf2 code isn't really correct.  But I
was going to get something functioning first.

BR,
-R

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 22:11         ` Rob Clark
@ 2011-12-05 22:33           ` Daniel Vetter
  2011-12-06 13:16           ` Arnd Bergmann
  2011-12-07 13:27           ` Semwal, Sumit
  2 siblings, 0 replies; 64+ messages in thread
From: Daniel Vetter @ 2011-12-05 22:33 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, t.stanislaws, linux, Arnd Bergmann, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

On Mon, Dec 05, 2011 at 04:11:46PM -0600, Rob Clark wrote:
> On Mon, Dec 5, 2011 at 3:23 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
> >> I sort of preferred having the DMABUF shim because that lets you pass
> >> a buffer around userspace without the receiving code knowing about a
> >> device specific API.  But the problem I eventually came around to: if
> >> your GL stack (or some other userspace component) is batching up
> >> commands before submission to kernel, the buffers you need to wait for
> >> completion might not even be submitted yet.  So from kernel
> >> perspective they are "ready" for cpu access.  Even though in fact they
> >> are not in a consistent state from rendering perspective.  I don't
> >> really know a sane way to deal with that.  Maybe the approach instead
> >> should be a userspace level API (in libkms/libdrm?) to provide
> >> abstraction for userspace access to buffers rather than dealing with
> >> this at the kernel level.
> >
> > Well, there's a reason GL has an explicit flush and extensions for sync
> > objects. It's to support such scenarios where the driver batches up gpu
> > commands before actually submitting them.
>
> Hmm.. what about other non-GL APIs..  maybe vaapi/vdpau or similar?
> (Or something that I haven't thought of.)

They generally all have a concept of when they've actually commited the
rendering to an X pixmap or egl image. Usually it's rather implicit, e.g.
the driver will submit any outstanding batches before returning from any
calls.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 22:04           ` Arnd Bergmann
@ 2011-12-05 22:33             ` Daniel Vetter
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Vetter @ 2011-12-05 22:33 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Daniel Vetter, t.stanislaws, linux,
	Sumit Semwal, linux-mm, linux-kernel, dri-devel, linaro-mm-sig,
	jesse.barker, rob, linux-media, Sumit Semwal, m.szyprowski

On Mon, Dec 05, 2011 at 11:04:09PM +0100, Arnd Bergmann wrote:
> On Monday 05 December 2011 21:58:39 Daniel Vetter wrote:
> > On Mon, Dec 05, 2011 at 08:29:49PM +0100, Arnd Bergmann wrote:
> > > ...
> >
> > Thanks a lot for this excellent overview. I think at least for the first
> > version of dmabuf we should drop the sync_* interfaces and simply require
> > users to bracket their usage of the buffer from the attached device by
> > map/unmap. A dma_buf provider is always free to cache the mapping and
> > simply call sync_sg_for of the streaming dma api.
>
> I think we still have the same problem if we allow multiple drivers
> to access a noncoherent buffer using map/unmap:
>
> 	driver A				driver B
>
> 1.	read/write				
> 2.						read/write
> 3.	map()					
> 4.						read/write
> 5.	dma
> 6.						map()
> 7.	dma
> 8.						dma
> 9.	unmap()
> 10.						dma
> 11.	read/write
> 12.						unmap()						
>
>
> In step 4, the buffer is owned by device A, but accessed by driver B, which
> is a bug. In step 11, the buffer is owned by device B but accessed by driver
> A, which is the same bug on the other side. In steps 7 and 8, the buffer
> is owned by both device A and B, which is currently undefined but would
> be ok if both devices are on the same coherency domain. Whether that point
> is meaningful depends on what the devices actually do. It would be ok
> if both are only reading, but not if they write into the same location
> concurrently.
>
> As I mentioned originally, the problem could be completely avoided if
> we only allow consistent (e.g. uncached) mappings or buffers that
> are not mapped into the kernel virtual address space at all.
>
> Alternatively, a clearer model would be to require each access to
> nonconsistent buffers to be exclusive: a map() operation would have
> to block until the current mapper (if any) has done an unmap(), and
> any access from the CPU would also have to call a dma_buf_ops pointer
> to serialize the CPU accesses with any device accesses. User
> mappings of the buffer can be easily blocked during a DMA access
> by unmapping the buffer from user space at map() time and blocking the
> vm_ops->fault() operation until the unmap().

See my other mail where I propose a more explicit coherency model, just a
comment here: GPU drivers hate blocking interfaces. Loathe, actually. In
general they're very happy to extend you any amount of rope if it can make
userspace a few percent faster.

So I think the right answer here is: You've asked for trouble, you've got
it. Also see the issue raised by Rob, at least for opengl (and also for
other graphics interfaces) the kernel is not even aware of all outstanding
rendering. So userspace needs to orchestrate access anyway if a gpu is
involved.

Otherwise I agree with your points in this mail.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 22:09       ` Arnd Bergmann
  2011-12-05 22:15         ` Rob Clark
@ 2011-12-05 22:35         ` Rob Clark
  1 sibling, 0 replies; 64+ messages in thread
From: Rob Clark @ 2011-12-05 22:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, t.stanislaws, linux, linux-mm, linux-kernel,
	dri-devel, linaro-mm-sig, linux-media, Sumit Semwal,
	m.szyprowski

On Mon, Dec 5, 2011 at 4:09 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 05 December 2011 14:46:47 Rob Clark wrote:
>> I sort of preferred having the DMABUF shim because that lets you pass
>> a buffer around userspace without the receiving code knowing about a
>> device specific API.  But the problem I eventually came around to: if
>> your GL stack (or some other userspace component) is batching up
>> commands before submission to kernel, the buffers you need to wait for
>> completion might not even be submitted yet.  So from kernel
>> perspective they are "ready" for cpu access.  Even though in fact they
>> are not in a consistent state from rendering perspective.  I don't
>> really know a sane way to deal with that.  Maybe the approach instead
>> should be a userspace level API (in libkms/libdrm?) to provide
>> abstraction for userspace access to buffers rather than dealing with
>> this at the kernel level.
>
> It would be nice if user space had no way to block out kernel drivers,
> otherwise we have to be very careful to ensure that each map() operation
> can be interrupted by a signal as the last resort to avoid deadlocks.

map_dma_buf should be documented to be allowed to return -EINTR..
otherwise, yeah, that would be problematic.

>        Arnd
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 22:11         ` Rob Clark
  2011-12-05 22:33           ` Daniel Vetter
@ 2011-12-06 13:16           ` Arnd Bergmann
  2011-12-06 15:28             ` Daniel Vetter
  2011-12-07 13:27           ` Semwal, Sumit
  2 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-06 13:16 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, t.stanislaws, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

On Monday 05 December 2011, Rob Clark wrote:
> > On the topic of a coherency model for dmabuf, I think we need to look at
> > dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
> > cpu_finish or whatever they might get called) as barriers:
> >
> > So after a dma_buf_map, all previsously completed dma operations (i.e.
> > unmap already called) and any cpu writes (i.e. cpu_finish called) will be
> > coherent. Similar rule holds for cpu access through the userspace mmap,
> > only writes completed before the cpu_start will show up.
> >
> > Similar, writes done by the device are only guaranteed to show up after
> > the _unmap. Dito for cpu writes and cpu_finish.
> >
> > In short we always need two function calls to denote the start/end of the
> > "critical section".
> 
> Yup, this was exactly my assumption.  But I guess it is better to spell it out.

I still don't understand how this is going to help you if you let
multiple drivers enter and leave the critical section without serializing
against one another. That doesn't sound like what I know as critical
section.

Given some reasonable constraints (all devices must be in the same coherency
domain, for instance), you can probably define it in a way that you can
have multiple devices mapping the same buffer at the same time, and
when no device has mapped the buffer you can have as many concurrent
kernel and user space accesses on the same buffer as you like. But you
must still guarantee that no software touches a noncoherent buffer while
it is mapped into any device and vice versa.

Why can't we just mandate that all mappings into the kernel must be
coherent and that user space accesses must either be coherent as well
or be done by user space that uses explicit serialization with all
DMA accesses?

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-06 13:16           ` Arnd Bergmann
@ 2011-12-06 15:28             ` Daniel Vetter
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Vetter @ 2011-12-06 15:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Rob Clark, Daniel Vetter, t.stanislaws, linux, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

On Tue, Dec 06, 2011 at 01:16:58PM +0000, Arnd Bergmann wrote:
> On Monday 05 December 2011, Rob Clark wrote:
> > > On the topic of a coherency model for dmabuf, I think we need to look at
> > > dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
> > > cpu_finish or whatever they might get called) as barriers:
> > >
> > > So after a dma_buf_map, all previsously completed dma operations (i.e.
> > > unmap already called) and any cpu writes (i.e. cpu_finish called) will be
> > > coherent. Similar rule holds for cpu access through the userspace mmap,
> > > only writes completed before the cpu_start will show up.
> > >
> > > Similar, writes done by the device are only guaranteed to show up after
> > > the _unmap. Dito for cpu writes and cpu_finish.
> > >
> > > In short we always need two function calls to denote the start/end of the
> > > "critical section".
> >
> > Yup, this was exactly my assumption.  But I guess it is better to spell it out.
>
> I still don't understand how this is going to help you if you let
> multiple drivers enter and leave the critical section without serializing
> against one another. That doesn't sound like what I know as critical
> section.

I already regret to having added that last "critical section" remark.
Think barriers. It's just that you need a barrier in both directions that
bracket the actual usage. In i915-land we call the first one generally
invalidate (so that caches on the target domain don't contain stale data)
and that second one flush (to get any data out of caches).

> Given some reasonable constraints (all devices must be in the same coherency
> domain, for instance), you can probably define it in a way that you can
> have multiple devices mapping the same buffer at the same time, and
> when no device has mapped the buffer you can have as many concurrent
> kernel and user space accesses on the same buffer as you like. But you
> must still guarantee that no software touches a noncoherent buffer while
> it is mapped into any device and vice versa.
>
> Why can't we just mandate that all mappings into the kernel must be
> coherent and that user space accesses must either be coherent as well
> or be done by user space that uses explicit serialization with all
> DMA accesses?

I agree with your points here, afaics the contentious issue is just
whether dma_buf should _enforce_ this strict ordering. I'm leading towards
a "no" for the following reasons:

- gpu people love nonblocking interfaces (and love to come up with
  abuses). In the generic case we'd need some more functions to properly
  flush everything while 2 devices access a buffer concurrently (which is
  imo a bit unrealistic). But e.g. 2 gpus rendering in SLI mode very much
  want to access the same buffer at the same time (and the
  kernel+userspace gpu driver already needs all the information about
  caches to make that happen, at least on x86).

- Buffer sharing alone has already some great potential for deadlock and
  lock recursion issues. Making dma_buf into something that very much acts
  like a new locking primitive itself (even exposed to userspace) will
  make this much worse. I've seen some of the kernel/userspace shared
  hwlock code of dri1 yonder, and it's horrible (and at least for the case
  of the dri1 hwlock, totally broken).

- All current subsystem already have the concept to pass the ownership of
  a buffer between the device and userspace (sometimes even more than just
  2 domains, like in i915 ...). Userspace already needs to use this
  interface to get anything resembling correct data. I don't see any case
  where userspace can't enforce passing around buffer ownership if
  multiple devices are involved (we obviously need to clarify subsystem
  interfaces to make it clear when a buffer is in use and when another
  device taking part in the sharing could use it). So I don't see how the
  kernel enforcing strict access ordering helps implementing correct
  userspace.

- I don't see any security needs that would make it necessary for the
  kernel to enforce any consistency guarantees for concurrent access -
  we're only dealing with pixel data in all the currently discussed
  generic use-cases. So I think garbage as an end-result is acceptable if
  userspace does stupid things (or fails at trying to be clever).

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 17:18   ` Arnd Bergmann
  2011-12-05 18:55     ` Daniel Vetter
  2011-12-05 20:46     ` Rob Clark
@ 2011-12-07  6:35     ` Semwal, Sumit
  2011-12-07 10:11       ` Arnd Bergmann
  2011-12-09 22:50     ` [Linaro-mm-sig] " Robert Morell
  3 siblings, 1 reply; 64+ messages in thread
From: Semwal, Sumit @ 2011-12-07  6:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media, linux, jesse.barker, m.szyprowski, rob,
	daniel, t.stanislaws, Sumit Semwal

Hi Arnd,

Thanks for your review!
On Mon, Dec 5, 2011 at 10:48 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Friday 02 December 2011, Sumit Semwal wrote:
>> This is the first step in defining a dma buffer sharing mechanism.
>
> This looks very nice, but there are a few things I don't understand yet
> and a bunch of trivial comments I have about things I spotted.
>
> Do you have prototype exporter and consumer drivers that you can post
> for clarification?
>
> In the patch 2, you have a section about migration that mentions that
> it is possible to export a buffer that can be migrated after it
> is already mapped into one user driver. How does that work when
> the physical addresses are mapped into a consumer device already?
I guess I need to clear it up in the documentation - when I said "once
all ongoing access is completed" - I meant to say "once all current
users have finished accessing and have unmapped this buffer". So I
agree with Rob - the migration would only be possible for "attached
but unmapped" buffers. I will update the documentation.
>
>> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
>> index 21cf46f..07d8095 100644
>> --- a/drivers/base/Kconfig
>> +++ b/drivers/base/Kconfig
>> @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
>>
>>  source "drivers/base/regmap/Kconfig"
>>
>> +config DMA_SHARED_BUFFER
>> +     bool "Buffer framework to be shared between drivers"
>> +     default n
>> +     depends on ANON_INODES
>
> I would make this 'select ANON_INODES', like the other users of this
> feature.
Sure.

>
>> +     return dmabuf;
>> +}
>> +EXPORT_SYMBOL(dma_buf_export);
>
> I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
> because it's really a low-level function that I would expect
> to get used by in-kernel subsystems providing the feature to
> users and having back-end drivers, but it's not the kind of thing
> we want out-of-tree drivers to mess with.
Agreed.

>
>> +/**
>> + * dma_buf_fd - returns a file descriptor for the given dma_buf
>> + * @dmabuf:  [in]    pointer to dma_buf for which fd is required.
>> + *
>> + * On success, returns an associated 'fd'. Else, returns error.
>> + */
>> +int dma_buf_fd(struct dma_buf *dmabuf)
>> +{
>> +     int error, fd;
>> +
>> +     if (!dmabuf->file)
>> +             return -EINVAL;
>> +
>> +     error = get_unused_fd_flags(0);
>
> Why not simply get_unused_fd()?
:) oversight. Will correct.

>
>> +/**
>> + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
>> + * calls attach() of dma_buf_ops to allow device-specific attach functionality
>> + * @dmabuf:  [in]    buffer to attach device to.
>> + * @dev:     [in]    device to be attached.
>> + *
>> + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
>> + *
>
> Or may return a negative error code. It's better to be consistent here:
> either always return NULL on error, or change the allocation error to
> ERR_PTR(-ENOMEM).
Ok, that makes sense.

>
>> + */
>> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
>> +                                             struct device *dev)
>> +{
>> +     struct dma_buf_attachment *attach;
>> +     int ret;
>> +
>> +     BUG_ON(!dmabuf || !dev);
>> +
>> +     attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
>> +     if (attach == NULL)
>> +             goto err_alloc;
>> +
>> +     mutex_lock(&dmabuf->lock);
>> +
>> +     attach->dev = dev;
>> +     attach->dmabuf = dmabuf;
>> +     if (dmabuf->ops->attach) {
>> +             ret = dmabuf->ops->attach(dmabuf, dev, attach);
>> +             if (!ret)
>> +                     goto err_attach;
>
> You probably mean "if (ret)" here instead of "if (!ret)", right?
yes - a stupid one! will correct.

>
>> +     /* allow allocator to take care of cache ops */
>> +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
>> +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
>
> I don't see how this works with multiple consumers: For the streaming
> DMA mapping, there must be exactly one owner, either the device or
> the CPU. Obviously, this rule needs to be extended when you get to
> multiple devices and multiple device drivers, plus possibly user
> mappings. Simply assigning the buffer to "the device" from one
> driver does not block other drivers from touching the buffer, and
> assigning it to "the cpu" does not stop other hardware that the
> code calling sync_sg_for_cpu is not aware of.
>
> The only way to solve this that I can think of right now is to
> mandate that the mappings are all coherent (i.e. noncachable
> on noncoherent architectures like ARM). If you do that, you no
> longer need the sync_sg_for_* calls.
I will take yours and Daniel's suggestion, and remove these; if at all
they're needed, we can add them back again later, with
/s/device/attachment as suggested by Daniel.
>
>> +#ifdef CONFIG_DMA_SHARED_BUFFER
>
> Do you have a use case for making the interface compile-time disabled?
> I had assumed that any code using it would make no sense if it's not
> available so you don't actually need this.
Ok. Though if we keep the interface compile-time disabled, the users
can actually check and fail or fall-back gracefully when the API is
not available; If I remove it, anyways the users would need to do the
same compile time check whether API is available or not, right?

>
>        Arnd
Thanks, and best regards,
~Sumit.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-07  6:35     ` Semwal, Sumit
@ 2011-12-07 10:11       ` Arnd Bergmann
  2011-12-07 11:02         ` Semwal, Sumit
  0 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-07 10:11 UTC (permalink / raw)
  To: Semwal, Sumit
  Cc: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media, linux, jesse.barker, m.szyprowski, rob,
	daniel, t.stanislaws, Sumit Semwal

On Wednesday 07 December 2011, Semwal, Sumit wrote:
> >
> > Do you have a use case for making the interface compile-time disabled?
> > I had assumed that any code using it would make no sense if it's not
> > available so you don't actually need this.
>
> Ok. Though if we keep the interface compile-time disabled, the users
> can actually check and fail or fall-back gracefully when the API is
> not available; If I remove it, anyways the users would need to do the
> same compile time check whether API is available or not, right?

If you have to do a compile-time check for the config symbol, it's better
to do it the way you did here than in the caller.

My guess was that no caller would actually require this, because when you
write a part of a subsystem to interact with the dma-buf infrastructure,
you would always disable compilation of an extire file that deals with 
everything related to struct dma_buf, not just stub out the calls.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-07 10:11       ` Arnd Bergmann
@ 2011-12-07 11:02         ` Semwal, Sumit
  2011-12-07 11:34           ` Arnd Bergmann
  0 siblings, 1 reply; 64+ messages in thread
From: Semwal, Sumit @ 2011-12-07 11:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media, linux, jesse.barker, m.szyprowski, rob,
	daniel, t.stanislaws, Sumit Semwal

On Wed, Dec 7, 2011 at 3:41 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 07 December 2011, Semwal, Sumit wrote:
>> >
>> > Do you have a use case for making the interface compile-time disabled?
>> > I had assumed that any code using it would make no sense if it's not
>> > available so you don't actually need this.
>>
>> Ok. Though if we keep the interface compile-time disabled, the users
>> can actually check and fail or fall-back gracefully when the API is
>> not available; If I remove it, anyways the users would need to do the
>> same compile time check whether API is available or not, right?
>
> If you have to do a compile-time check for the config symbol, it's better
> to do it the way you did here than in the caller.
>
> My guess was that no caller would actually require this, because when you
> write a part of a subsystem to interact with the dma-buf infrastructure,
> you would always disable compilation of an extire file that deals with
> everything related to struct dma_buf, not just stub out the calls.

Right; that would be ideal, but we may not be able to ask each user to
do so - especially when the sharing part might be interspersed in
existing buffer handling code. So for now, I would like to keep it as
it-is.
>
>        Arnd
>
BR,
~Sumit.
> --

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-07 11:02         ` Semwal, Sumit
@ 2011-12-07 11:34           ` Arnd Bergmann
  0 siblings, 0 replies; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-07 11:34 UTC (permalink / raw)
  To: Semwal, Sumit
  Cc: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media, linux, jesse.barker, m.szyprowski, rob,
	daniel, t.stanislaws, Sumit Semwal

On Wednesday 07 December 2011, Semwal, Sumit wrote:
> Right; that would be ideal, but we may not be able to ask each user to
> do so - especially when the sharing part might be interspersed in
> existing buffer handling code. So for now, I would like to keep it as
> it-is.

Ok, fair enough. It certainly doesn't hurt.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 22:11         ` Rob Clark
  2011-12-05 22:33           ` Daniel Vetter
  2011-12-06 13:16           ` Arnd Bergmann
@ 2011-12-07 13:27           ` Semwal, Sumit
  2011-12-07 13:40             ` Arnd Bergmann
  2 siblings, 1 reply; 64+ messages in thread
From: Semwal, Sumit @ 2011-12-07 13:27 UTC (permalink / raw)
  To: Rob Clark
  Cc: Daniel Vetter, t.stanislaws, linux, Arnd Bergmann, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

Hi Daniel, Rob,


On Tue, Dec 6, 2011 at 3:41 AM, Rob Clark <rob@ti.com> wrote:
> On Mon, Dec 5, 2011 at 3:23 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Mon, Dec 05, 2011 at 02:46:47PM -0600, Rob Clark wrote:
>>> On Mon, Dec 5, 2011 at 11:18 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> > In the patch 2, you have a section about migration that mentions that
>>> > it is possible to export a buffer that can be migrated after it
>>> > is already mapped into one user driver. How does that work when
>>> > the physical addresses are mapped into a consumer device already?
>>>
>>> I think you can do physical migration if you are attached, but
>>> probably not if you are mapped.
>>
>> Yeah, that's very much how I see this, and also why map/unmap (at least
>> for simple users like v4l) should only bracket actual usage. GPU memory
>> managers need to be able to move around buffers while no one is using
>> them.
>>
>> [snip]
>>
>>> >> +     /* allow allocator to take care of cache ops */
>>> >> +     void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
>>> >> +     void (*sync_sg_for_device)(struct dma_buf *, struct device *);
>>> >
>>> > I don't see how this works with multiple consumers: For the streaming
>>> > DMA mapping, there must be exactly one owner, either the device or
>>> > the CPU. Obviously, this rule needs to be extended when you get to
>>> > multiple devices and multiple device drivers, plus possibly user
>>> > mappings. Simply assigning the buffer to "the device" from one
>>> > driver does not block other drivers from touching the buffer, and
>>> > assigning it to "the cpu" does not stop other hardware that the
>>> > code calling sync_sg_for_cpu is not aware of.
>>> >
>>> > The only way to solve this that I can think of right now is to
>>> > mandate that the mappings are all coherent (i.e. noncachable
>>> > on noncoherent architectures like ARM). If you do that, you no
>>> > longer need the sync_sg_for_* calls.
>>>
>>> My original thinking was that you either need DMABUF_CPU_{PREP,FINI}
>>> ioctls and corresponding dmabuf ops, which userspace is required to
>>> call before / after CPU access.  Or just remove mmap() and do the
>>> mmap() via allocating device and use that device's equivalent
>>> DRM_XYZ_GEM_CPU_{PREP,FINI} or DRM_XYZ_GEM_SET_DOMAIN ioctls.  That
>>> would give you a way to (a) synchronize with gpu/asynchronous
>>> pipeline, (b) synchronize w/ multiple hw devices vs cpu accessing
>>> buffer (ie. wait all devices have dma_buf_unmap_attachment'd).  And
>>> that gives you a convenient place to do cache operations on
>>> noncoherent architecture.
>>>
>>> I sort of preferred having the DMABUF shim because that lets you pass
>>> a buffer around userspace without the receiving code knowing about a
>>> device specific API.  But the problem I eventually came around to: if
>>> your GL stack (or some other userspace component) is batching up
>>> commands before submission to kernel, the buffers you need to wait for
>>> completion might not even be submitted yet.  So from kernel
>>> perspective they are "ready" for cpu access.  Even though in fact they
>>> are not in a consistent state from rendering perspective.  I don't
>>> really know a sane way to deal with that.  Maybe the approach instead
>>> should be a userspace level API (in libkms/libdrm?) to provide
>>> abstraction for userspace access to buffers rather than dealing with
>>> this at the kernel level.
>>
>> Well, there's a reason GL has an explicit flush and extensions for sync
>> objects. It's to support such scenarios where the driver batches up gpu
>> commands before actually submitting them.
>
> Hmm.. what about other non-GL APIs..  maybe vaapi/vdpau or similar?
> (Or something that I haven't thought of.)
>
>> Also, recent gpus have all (or
>> shortly will grow) multiple execution pipelines, so it's also important
>> that you sync up with the right command stream. Syncing up with all of
>> them is generally frowned upon for obvious reasons ;-)
>
> Well, I guess I am happy enough with something that is at least
> functional.  Usespace access would (I think) mainly be weird edge case
> type stuff.  But...
>
<snip>
>
>> On the topic of a coherency model for dmabuf, I think we need to look at
>> dma_buf_attachment_map/unmap (and also the mmap variants cpu_start and
>> cpu_finish or whatever they might get called) as barriers:
>>
>> So after a dma_buf_map, all previsously completed dma operations (i.e.
>> unmap already called) and any cpu writes (i.e. cpu_finish called) will be
>> coherent. Similar rule holds for cpu access through the userspace mmap,
>> only writes completed before the cpu_start will show up.
>>
>> Similar, writes done by the device are only guaranteed to show up after
>> the _unmap. Dito for cpu writes and cpu_finish.
>>
>> In short we always need two function calls to denote the start/end of the
>> "critical section".
>
> Yup, this was exactly my assumption.  But I guess it is better to spell it out.

Thanks for the excellent discussion - it indeed is very good learning
for the relatively-inexperienced me :)

So, for the purpose of dma-buf framework, could I summarize the
following and rework accordingly?:
1. remove mmap() dma_buf_op [and mmap fop], and introduce cpu_start(),
cpu_finish() ops to bracket cpu accesses to the buffer. Also add
DMABUF_CPU_START / DMABUF_CPU_FINI IOCTLs?
2. remove sg_sync* ops for now (and we'll see if we need to add them
later if needed)
>
> BR,
> -R
>
>> Any concurrent operations are allowed to yield garbage, meaning any
>> combination of the old or either of the newly written contents (i.e.
>> non-overlapping writes might not actually all end up in the buffer,
>> but instead some old contents). Maybe we even need to loosen that to
>> the real "undefined behaviour", but atm I can't think of an example.
I guess that should be acceptable for our video / media use cases. How
about other potential users of dma-buf? [I am asking this because
Jesse did tell me that there were some other subsystems also
interested in dmabuf usage]
>>
>> -Daniel
BR,
~Sumit.
>> --
>> Daniel Vetter
>> Mail: daniel@ffwll.ch
>> Mobile: +41 (0)79 365 57 48
<snip>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-07 13:27           ` Semwal, Sumit
@ 2011-12-07 13:40             ` Arnd Bergmann
  2011-12-08 21:44               ` [Linaro-mm-sig] " Daniel Vetter
  0 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-07 13:40 UTC (permalink / raw)
  To: Semwal, Sumit
  Cc: Rob Clark, Daniel Vetter, t.stanislaws, linux, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

On Wednesday 07 December 2011, Semwal, Sumit wrote:
> Thanks for the excellent discussion - it indeed is very good learning
> for the relatively-inexperienced me :)
> 
> So, for the purpose of dma-buf framework, could I summarize the
> following and rework accordingly?:
> 1. remove mmap() dma_buf_op [and mmap fop], and introduce cpu_start(),
> cpu_finish() ops to bracket cpu accesses to the buffer. Also add
> DMABUF_CPU_START / DMABUF_CPU_FINI IOCTLs?

I think we'd be better off for now without the extra ioctls and
just document that a shared buffer must not be exported to user
space using mmap at all, to avoid those problems. Serialization
between GPU and CPU is on a higher level than the dma_buf framework
IMHO.

> 2. remove sg_sync* ops for now (and we'll see if we need to add them
> later if needed)

Just removing the sg_sync_* operations is not enough. We have to make
the decision whether we want to allow
a) only coherent mappings of the buffer into kernel memory (requiring
an extension to the dma_map_ops on ARM to not flush caches at map/unmap
time)
b) not allowing any in-kernel mappings (same requirement on ARM, also
limits the usefulness of the dma_buf if we cannot access it from the
kernel or from user space)
c) only allowing streaming mappings, even if those are non-coherent
(requiring strict serialization between CPU (in-kernel) and dma users of
the buffer)

This issue has to be solved or we get random data corruption.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-07 13:40             ` Arnd Bergmann
@ 2011-12-08 21:44               ` Daniel Vetter
  2011-12-09 14:13                 ` Arnd Bergmann
  2011-12-13 13:33                 ` Hans Verkuil
  0 siblings, 2 replies; 64+ messages in thread
From: Daniel Vetter @ 2011-12-08 21:44 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Semwal, Sumit, linux, linux-kernel, dri-devel, linaro-mm-sig,
	linux-mm, linux-media, linux-arm-kernel

On Wed, Dec 7, 2011 at 14:40, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 07 December 2011, Semwal, Sumit wrote:
>> Thanks for the excellent discussion - it indeed is very good learning
>> for the relatively-inexperienced me :)
>>
>> So, for the purpose of dma-buf framework, could I summarize the
>> following and rework accordingly?:
>> 1. remove mmap() dma_buf_op [and mmap fop], and introduce cpu_start(),
>> cpu_finish() ops to bracket cpu accesses to the buffer. Also add
>> DMABUF_CPU_START / DMABUF_CPU_FINI IOCTLs?
>
> I think we'd be better off for now without the extra ioctls and
> just document that a shared buffer must not be exported to user
> space using mmap at all, to avoid those problems. Serialization
> between GPU and CPU is on a higher level than the dma_buf framework
> IMHO.

Agreed.

>> 2. remove sg_sync* ops for now (and we'll see if we need to add them
>> later if needed)
>
> Just removing the sg_sync_* operations is not enough. We have to make
> the decision whether we want to allow
> a) only coherent mappings of the buffer into kernel memory (requiring
> an extension to the dma_map_ops on ARM to not flush caches at map/unmap
> time)
> b) not allowing any in-kernel mappings (same requirement on ARM, also
> limits the usefulness of the dma_buf if we cannot access it from the
> kernel or from user space)
> c) only allowing streaming mappings, even if those are non-coherent
> (requiring strict serialization between CPU (in-kernel) and dma users of
> the buffer)

I think only allowing streaming access makes the most sense:
- I don't see much (if any need) for the kernel to access a dma_buf -
in all current usecases it just contains pixel data and no hw-specific
things (like sg tables, command buffers, ..). At most I see the need
for the kernel to access the buffer for dma bounce buffers, but that
is internal to the dma subsystem (and hence does not need to be
exposed).
- Userspace can still access the contents through the exporting
subsystem (e.g. use some gem mmap support). For efficiency reason gpu
drivers are already messing around with cache coherency in a platform
specific way (and hence violated the dma api a bit), so we could stuff
the mmap coherency in there, too. When we later on extend dma_buf
support so that other drivers than the gpu can export dma_bufs, we can
then extend the official dma api with already a few drivers with
use-patterns around.

But I still think that the kernel must not be required to enforce
correct access ordering for the reasons outlined in my other mail.
-Daniel
-- 
Daniel Vetter
daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-08 21:44               ` [Linaro-mm-sig] " Daniel Vetter
@ 2011-12-09 14:13                 ` Arnd Bergmann
  2011-12-09 14:24                   ` Alan Cox
  2011-12-20  9:03                   ` Sakari Ailus
  2011-12-13 13:33                 ` Hans Verkuil
  1 sibling, 2 replies; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-09 14:13 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Semwal, Sumit, linux, linux-kernel, dri-devel, linaro-mm-sig,
	linux-mm, linux-media, linux-arm-kernel

On Thursday 08 December 2011, Daniel Vetter wrote:
> > c) only allowing streaming mappings, even if those are non-coherent
> > (requiring strict serialization between CPU (in-kernel) and dma users of
> > the buffer)
> 
> I think only allowing streaming access makes the most sense:
> - I don't see much (if any need) for the kernel to access a dma_buf -
> in all current usecases it just contains pixel data and no hw-specific
> things (like sg tables, command buffers, ..). At most I see the need
> for the kernel to access the buffer for dma bounce buffers, but that
> is internal to the dma subsystem (and hence does not need to be
> exposed).
> - Userspace can still access the contents through the exporting
> subsystem (e.g. use some gem mmap support). For efficiency reason gpu
> drivers are already messing around with cache coherency in a platform
> specific way (and hence violated the dma api a bit), so we could stuff
> the mmap coherency in there, too. When we later on extend dma_buf
> support so that other drivers than the gpu can export dma_bufs, we can
> then extend the official dma api with already a few drivers with
> use-patterns around.
> 
> But I still think that the kernel must not be required to enforce
> correct access ordering for the reasons outlined in my other mail.

I still don't think that's possible. Please explain how you expect
to change the semantics of the streaming mapping API to allow multiple
mappers without having explicit serialization points that are visible
to all users. For simplicity, let's assume a cache coherent system
with bounce buffers where map() copies the buffer to a dma area
and unmap() copies it back to regular kernel memory. How does a driver
know if it can touch the buffer in memory or from DMA at any given
point in time? Note that this problem is the same as the cache coherency
problem but may be easier to grasp.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-09 14:13                 ` Arnd Bergmann
@ 2011-12-09 14:24                   ` Alan Cox
  2011-12-10  4:01                     ` Daniel Vetter
  2011-12-20  9:03                   ` Sakari Ailus
  1 sibling, 1 reply; 64+ messages in thread
From: Alan Cox @ 2011-12-09 14:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Daniel Vetter, Semwal, Sumit, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, linux-media, linux-arm-kernel

> I still don't think that's possible. Please explain how you expect
> to change the semantics of the streaming mapping API to allow multiple
> mappers without having explicit serialization points that are visible
> to all users. For simplicity, let's assume a cache coherent system

I would agree. It's not just about barriers but in many cases where you
have multiple mappings by hardware devices the actual hardware interface
is specific to the devices. Just take a look at the fencing in TTM and
the graphics drivers.

Its not something the low level API can deal with, it requires high level
knowledge.

Alan

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-05 17:18   ` Arnd Bergmann
                       ` (2 preceding siblings ...)
  2011-12-07  6:35     ` Semwal, Sumit
@ 2011-12-09 22:50     ` Robert Morell
  2011-12-10 11:13       ` Mauro Carvalho Chehab
  3 siblings, 1 reply; 64+ messages in thread
From: Robert Morell @ 2011-12-09 22:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Sumit Semwal, linux, jesse.barker, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, daniel, linux-arm-kernel, linux-media

On Mon, Dec 05, 2011 at 09:18:48AM -0800, Arnd Bergmann wrote:
> On Friday 02 December 2011, Sumit Semwal wrote:
> > This is the first step in defining a dma buffer sharing mechanism.
> 
[...]
> 
> > +	return dmabuf;
> > +}
> > +EXPORT_SYMBOL(dma_buf_export);
> 
> I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
> because it's really a low-level function that I would expect
> to get used by in-kernel subsystems providing the feature to
> users and having back-end drivers, but it's not the kind of thing
> we want out-of-tree drivers to mess with.

Is this really necessary?  If this is intended to be a
lowest-common-denominator between many drivers to allow buffer sharing,
it seems like it needs to be able to be usable by all drivers.

If the interface is not accessible then I fear many drivers will be
forced to continue to roll their own buffer sharing mechanisms (which is
exactly what we're trying to avoid here, needless to say).

- Robert

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-09 14:24                   ` Alan Cox
@ 2011-12-10  4:01                     ` Daniel Vetter
  2011-12-12 16:48                       ` Arnd Bergmann
  0 siblings, 1 reply; 64+ messages in thread
From: Daniel Vetter @ 2011-12-10  4:01 UTC (permalink / raw)
  To: Alan Cox
  Cc: Arnd Bergmann, Semwal, Sumit, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, linux-media, linux-arm-kernel

On Fri, Dec 9, 2011 at 15:24, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> I still don't think that's possible. Please explain how you expect
>> to change the semantics of the streaming mapping API to allow multiple
>> mappers without having explicit serialization points that are visible
>> to all users. For simplicity, let's assume a cache coherent system

I think I understand the cache flushing issues on arm (we're doing a
similar madness with manually flushing caches for i915 because the gpu
isn't coherent with the cpu and other dma devices). And I also agree
that you cannot make concurrent mappings work without adding something
on top of the current streaming api/dma-buf proposal. I just think
that it's not the kernel's business (well, at least not dma_buf's
business) to enforce that. If userspace (through some driver calls)
tries to do stupid things, it'll just get garbage. See
Message-ID: <CAKMK7uHeXYn-v_8cmpLNWsFY14KtmuRZy8YRKR5Xst2-2WdFSQ@mail.gmail.com>
for my reasons why it think this is the right way to go forward. So in
essence I'm really interested in the reasons why you want the kernel
to enforce this (or I'm completely missing what's the contentious
issue here).

> I would agree. It's not just about barriers but in many cases where you
> have multiple mappings by hardware devices the actual hardware interface
> is specific to the devices. Just take a look at the fencing in TTM and
> the graphics drivers.
>
> Its not something the low level API can deal with, it requires high level
> knowledge.

Yes, we might want to add some form of in-kernel sync objects on top
of dma_buf, but I'm not really convinced that this will buy us much
above simply synchronizing access in userspace with the existing ipc
tools. In my experience the expensive part of syncing two graphics
engines with the cpu is that we wake up the cpu and prevent it from
going into low-power states if we do this too often. Adding some more
overhead by going through userspace doesn't really make it much worse.
It's a completely different story if there's a hw facility to
synchronize engines without the cpu's involvement (like there is on
every multi-pipe gpu) and there sync objects make tons of sense. But
I'm not aware of that existing on SoCs between different IP blocks.

Cheers, Daniel
-- 
Daniel Vetter
daniel.vetter@ffwll.ch - +41 (0) 79 364 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-09 22:50     ` [Linaro-mm-sig] " Robert Morell
@ 2011-12-10 11:13       ` Mauro Carvalho Chehab
  2011-12-12 22:44         ` Robert Morell
  0 siblings, 1 reply; 64+ messages in thread
From: Mauro Carvalho Chehab @ 2011-12-10 11:13 UTC (permalink / raw)
  To: Robert Morell
  Cc: Arnd Bergmann, Sumit Semwal, linux, jesse.barker, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, daniel, linux-arm-kernel,
	linux-media

On 09-12-2011 20:50, Robert Morell wrote:
> On Mon, Dec 05, 2011 at 09:18:48AM -0800, Arnd Bergmann wrote:
>> On Friday 02 December 2011, Sumit Semwal wrote:
>>> This is the first step in defining a dma buffer sharing mechanism.
>>
> [...]
>>
>>> +	return dmabuf;
>>> +}
>>> +EXPORT_SYMBOL(dma_buf_export);
>>
>> I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
>> because it's really a low-level function that I would expect
>> to get used by in-kernel subsystems providing the feature to
>> users and having back-end drivers, but it's not the kind of thing
>> we want out-of-tree drivers to mess with.
>
> Is this really necessary?  If this is intended to be a
> lowest-common-denominator between many drivers to allow buffer sharing,
> it seems like it needs to be able to be usable by all drivers.
>
> If the interface is not accessible then I fear many drivers will be
> forced to continue to roll their own buffer sharing mechanisms (which is
> exactly what we're trying to avoid here, needless to say).

Doing a buffer sharing with something that is not GPL is not fun, as, if any
issue rises there, it would be impossible to discover if the problem is either
at the closed-source driver or at the open source one. At the time I was using
the Nvidia proprietary driver, it was very common to have unexplained issues
caused likely by bad code there at the buffer management code, causing X
applications and extensions (like xv) to break.

We should really make this EXPORT_SYMBOL_GPL(), in order to be able to latter
debug future share buffer issues, when needed.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-10  4:01                     ` Daniel Vetter
@ 2011-12-12 16:48                       ` Arnd Bergmann
  2011-12-19  6:16                         ` Semwal, Sumit
  0 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-12 16:48 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Alan Cox, Semwal, Sumit, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, linux-media, linux-arm-kernel

On Saturday 10 December 2011, Daniel Vetter wrote:
> If userspace (through some driver calls)
> tries to do stupid things, it'll just get garbage. See
> Message-ID: <CAKMK7uHeXYn-v_8cmpLNWsFY14KtmuRZy8YRKR5Xst2-2WdFSQ@mail.gmail.com>
> for my reasons why it think this is the right way to go forward. So in
> essence I'm really interested in the reasons why you want the kernel
> to enforce this (or I'm completely missing what's the contentious
> issue here).

This has nothing to do with user space mappings. Whatever user space does,
you get garbage if you don't invalidate cache lines that were introduced
through speculative prefetching before you access cache lines that were
DMA'd from a device.

	Arnd



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-10 11:13       ` Mauro Carvalho Chehab
@ 2011-12-12 22:44         ` Robert Morell
  2011-12-13 15:10           ` Arnd Bergmann
  0 siblings, 1 reply; 64+ messages in thread
From: Robert Morell @ 2011-12-12 22:44 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Arnd Bergmann, Sumit Semwal, linux, jesse.barker, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, daniel, linux-arm-kernel,
	linux-media

On Sat, Dec 10, 2011 at 03:13:06AM -0800, Mauro Carvalho Chehab wrote:
> On 09-12-2011 20:50, Robert Morell wrote:
> > On Mon, Dec 05, 2011 at 09:18:48AM -0800, Arnd Bergmann wrote:
> >> On Friday 02 December 2011, Sumit Semwal wrote:
> >>> This is the first step in defining a dma buffer sharing mechanism.
> >>
> > [...]
> >>
> >>> +	return dmabuf;
> >>> +}
> >>> +EXPORT_SYMBOL(dma_buf_export);
> >>
> >> I agree with Konrad, this should definitely be EXPORT_SYMBOL_GPL,
> >> because it's really a low-level function that I would expect
> >> to get used by in-kernel subsystems providing the feature to
> >> users and having back-end drivers, but it's not the kind of thing
> >> we want out-of-tree drivers to mess with.
> >
> > Is this really necessary?  If this is intended to be a
> > lowest-common-denominator between many drivers to allow buffer sharing,
> > it seems like it needs to be able to be usable by all drivers.
> >
> > If the interface is not accessible then I fear many drivers will be
> > forced to continue to roll their own buffer sharing mechanisms (which is
> > exactly what we're trying to avoid here, needless to say).
> 
> Doing a buffer sharing with something that is not GPL is not fun, as, if any
> issue rises there, it would be impossible to discover if the problem is either
> at the closed-source driver or at the open source one. At the time I was using
> the Nvidia proprietary driver, it was very common to have unexplained issues
> caused likely by bad code there at the buffer management code, causing X
> applications and extensions (like xv) to break.
>
> We should really make this EXPORT_SYMBOL_GPL(), in order to be able to latter
> debug future share buffer issues, when needed.

Sorry, I don't buy this argument.  Making these exports GPL-only is not
likely to cause anybody to open-source their driver, but will rather
just cause them to use yet more closed-source code that is even less
debuggable than this would be, to those without access to the source.

Thanks,
Robert

> Regards,
> Mauro

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-08 21:44               ` [Linaro-mm-sig] " Daniel Vetter
  2011-12-09 14:13                 ` Arnd Bergmann
@ 2011-12-13 13:33                 ` Hans Verkuil
  1 sibling, 0 replies; 64+ messages in thread
From: Hans Verkuil @ 2011-12-13 13:33 UTC (permalink / raw)
  To: linaro-mm-sig
  Cc: Daniel Vetter, Arnd Bergmann, linux, Semwal, Sumit, linux-kernel,
	dri-devel, linux-mm, linux-arm-kernel, linux-media

(I've been away for the past two weeks, so I'm only now catching up)


On Thursday 08 December 2011 22:44:08 Daniel Vetter wrote:
> On Wed, Dec 7, 2011 at 14:40, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Wednesday 07 December 2011, Semwal, Sumit wrote:
> >> Thanks for the excellent discussion - it indeed is very good learning
> >> for the relatively-inexperienced me :)
> >> 
> >> So, for the purpose of dma-buf framework, could I summarize the
> >> following and rework accordingly?:
> >> 1. remove mmap() dma_buf_op [and mmap fop], and introduce cpu_start(),
> >> cpu_finish() ops to bracket cpu accesses to the buffer. Also add
> >> DMABUF_CPU_START / DMABUF_CPU_FINI IOCTLs?
> > 
> > I think we'd be better off for now without the extra ioctls and
> > just document that a shared buffer must not be exported to user
> > space using mmap at all, to avoid those problems. Serialization
> > between GPU and CPU is on a higher level than the dma_buf framework
> > IMHO.
> 
> Agreed.
> 
> >> 2. remove sg_sync* ops for now (and we'll see if we need to add them
> >> later if needed)
> > 
> > Just removing the sg_sync_* operations is not enough. We have to make
> > the decision whether we want to allow
> > a) only coherent mappings of the buffer into kernel memory (requiring
> > an extension to the dma_map_ops on ARM to not flush caches at map/unmap
> > time)
> > b) not allowing any in-kernel mappings (same requirement on ARM, also
> > limits the usefulness of the dma_buf if we cannot access it from the
> > kernel or from user space)
> > c) only allowing streaming mappings, even if those are non-coherent
> > (requiring strict serialization between CPU (in-kernel) and dma users of
> > the buffer)
> 
> I think only allowing streaming access makes the most sense:
> - I don't see much (if any need) for the kernel to access a dma_buf -
> in all current usecases it just contains pixel data and no hw-specific
> things (like sg tables, command buffers, ..). At most I see the need
> for the kernel to access the buffer for dma bounce buffers, but that
> is internal to the dma subsystem (and hence does not need to be
> exposed).

There are a few situations where the kernel might actually access a dma_buf:

First of all there are some sensors that add meta data before the actual
pixel data, and a kernel driver might well want to read out that data and
process it. Secondly (and really very similar), video frames sent to/from
an FPGA can also contain meta data (Cisco does that on some of our products)
that the kernel may need to inspect.

I admit that these use-cases aren't very common, but they do exist.

> - Userspace can still access the contents through the exporting
> subsystem (e.g. use some gem mmap support). For efficiency reason gpu
> drivers are already messing around with cache coherency in a platform
> specific way (and hence violated the dma api a bit), so we could stuff
> the mmap coherency in there, too. When we later on extend dma_buf
> support so that other drivers than the gpu can export dma_bufs, we can
> then extend the official dma api with already a few drivers with
> use-patterns around.
> 
> But I still think that the kernel must not be required to enforce
> correct access ordering for the reasons outlined in my other mail.

I agree with Daniel on this.

BTW, the V4L2 subsystem has a clear concept of passing bufffer ownership: the
VIDIOC_QBUF and VIDIOC_DQBUF ioctls deal with that. Pretty much all V4L2 apps 
request the buffers, then mmap them, then call QBUF to give the ownership of 
those buffers to the kernel. While the kernel owns those buffers any access to 
the mmap'ped memory leads to undefined results. Only after calling DQBUF can 
userspace actually safely access that memory.

Allowing mmap() on the dma_buf's fd would actually make things easier for 
V4L2. It's an elegant way of mapping the memory.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-12 22:44         ` Robert Morell
@ 2011-12-13 15:10           ` Arnd Bergmann
  2011-12-20  2:05             ` Robert Morell
  0 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-13 15:10 UTC (permalink / raw)
  To: Robert Morell
  Cc: Mauro Carvalho Chehab, Sumit Semwal, linux, jesse.barker,
	linux-kernel, dri-devel, linaro-mm-sig, linux-mm, daniel,
	linux-arm-kernel, linux-media

On Monday 12 December 2011, Robert Morell wrote:
> > 
> > Doing a buffer sharing with something that is not GPL is not fun, as, if any
> > issue rises there, it would be impossible to discover if the problem is either
> > at the closed-source driver or at the open source one. At the time I was using
> > the Nvidia proprietary driver, it was very common to have unexplained issues
> > caused likely by bad code there at the buffer management code, causing X
> > applications and extensions (like xv) to break.
> >
> > We should really make this EXPORT_SYMBOL_GPL(), in order to be able to latter
> > debug future share buffer issues, when needed.
> 
> Sorry, I don't buy this argument.  Making these exports GPL-only is not
> likely to cause anybody to open-source their driver, but will rather
> just cause them to use yet more closed-source code that is even less
> debuggable than this would be, to those without access to the source.

But at least the broken module then won't be interacting with other modules
because it cannot share any buffers.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-12 16:48                       ` Arnd Bergmann
@ 2011-12-19  6:16                         ` Semwal, Sumit
  2011-12-20 15:41                           ` Arnd Bergmann
  0 siblings, 1 reply; 64+ messages in thread
From: Semwal, Sumit @ 2011-12-19  6:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Daniel Vetter, Alan Cox, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, linux-media, linux-arm-kernel

Hi Arnd, Daniel,

On Mon, Dec 12, 2011 at 10:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Saturday 10 December 2011, Daniel Vetter wrote:
>> If userspace (through some driver calls)
>> tries to do stupid things, it'll just get garbage. See
>> Message-ID: <CAKMK7uHeXYn-v_8cmpLNWsFY14KtmuRZy8YRKR5Xst2-2WdFSQ@mail.gmail.com>
>> for my reasons why it think this is the right way to go forward. So in
>> essence I'm really interested in the reasons why you want the kernel
>> to enforce this (or I'm completely missing what's the contentious
>> issue here).
>
> This has nothing to do with user space mappings. Whatever user space does,
> you get garbage if you don't invalidate cache lines that were introduced
> through speculative prefetching before you access cache lines that were
> DMA'd from a device.
I didn't see a consensus on whether dma_buf should enforce some form
of serialization within the API - so atleast for v1 of dma-buf, I
propose to 'not' impose a restriction, and we can tackle it (add new
ops or enforce as design?) whenever we see the first need of it - will
that be ok? [I am bending towards the thought that it is a problem to
solve at a bigger platform than dma_buf.]
>
>        Arnd

Best regards,
~Sumit.
>
> --

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-13 15:10           ` Arnd Bergmann
@ 2011-12-20  2:05             ` Robert Morell
  2011-12-20 14:29               ` Anca Emanuel
  0 siblings, 1 reply; 64+ messages in thread
From: Robert Morell @ 2011-12-20  2:05 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mauro Carvalho Chehab, Sumit Semwal, linux, jesse.barker,
	linux-kernel, dri-devel, linaro-mm-sig, linux-mm, daniel,
	linux-arm-kernel, linux-media

On Tue, Dec 13, 2011 at 07:10:02AM -0800, Arnd Bergmann wrote:
> On Monday 12 December 2011, Robert Morell wrote:
> > > 
> > > Doing a buffer sharing with something that is not GPL is not fun, as, if any
> > > issue rises there, it would be impossible to discover if the problem is either
> > > at the closed-source driver or at the open source one. At the time I was using
> > > the Nvidia proprietary driver, it was very common to have unexplained issues
> > > caused likely by bad code there at the buffer management code, causing X
> > > applications and extensions (like xv) to break.
> > >
> > > We should really make this EXPORT_SYMBOL_GPL(), in order to be able to latter
> > > debug future share buffer issues, when needed.
> > 
> > Sorry, I don't buy this argument.  Making these exports GPL-only is not
> > likely to cause anybody to open-source their driver, but will rather
> > just cause them to use yet more closed-source code that is even less
> > debuggable than this would be, to those without access to the source.
> 
> But at least the broken module then won't be interacting with other modules
> because it cannot share any buffers.

One of the goals of this project is to unify the fragmented space of the
ARM SoC memory managers so that each vendor doesn't implement their own,
and they can all be closer to mainline.

I fear that restricting the use of this buffer sharing mechanism to GPL
drivers only will prevent that goal from being achieved, if an SoC
driver has to interact with modules that use a non-GPL license.

As a hypothetical example, consider laptops that have multiple GPUs.
Today, these ship with onboard graphics (integrated to the CPU or
chipset) along with a discrete GPU, where in many cases only the onboard
graphics can actually display to the screen.  In order for anything
rendered by the discrete GPU to be displayed, it has to be copied to
memory available for the smaller onboard graphics to texture from or
display directly.  Obviously, that's best done by sharing dma buffers
rather than bouncing them through the GPU.  It's not much of a stretch
to imagine that we'll see such systems with a Tegra CPU/GPU plus a
discrete GPU in the future; in that case, we'd want to be able to share
memory between the discrete GPU and the Tegra system.  In that scenario,
if this interface is GPL-only, we'd be unable to adopt the dma_buffer
sharing mechanism for Tegra.

(This isn't too pie-in-the-sky, either; people are already combining
Tegra with discrete GPUs:
http://blogs.nvidia.com/2011/11/world%e2%80%99s-first-arm-based-supercomputer-to-launch-in-barcelona/
)

Thanks,
Robert

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-09 14:13                 ` Arnd Bergmann
  2011-12-09 14:24                   ` Alan Cox
@ 2011-12-20  9:03                   ` Sakari Ailus
  2011-12-20 15:36                     ` Arnd Bergmann
  1 sibling, 1 reply; 64+ messages in thread
From: Sakari Ailus @ 2011-12-20  9:03 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Daniel Vetter, Semwal, Sumit, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, linux-media, linux-arm-kernel

Hi Arnd,

On Fri, Dec 09, 2011 at 02:13:03PM +0000, Arnd Bergmann wrote:
> On Thursday 08 December 2011, Daniel Vetter wrote:
> > > c) only allowing streaming mappings, even if those are non-coherent
> > > (requiring strict serialization between CPU (in-kernel) and dma users of
> > > the buffer)
> > 
> > I think only allowing streaming access makes the most sense:
> > - I don't see much (if any need) for the kernel to access a dma_buf -
> > in all current usecases it just contains pixel data and no hw-specific
> > things (like sg tables, command buffers, ..). At most I see the need
> > for the kernel to access the buffer for dma bounce buffers, but that
> > is internal to the dma subsystem (and hence does not need to be
> > exposed).
> > - Userspace can still access the contents through the exporting
> > subsystem (e.g. use some gem mmap support). For efficiency reason gpu
> > drivers are already messing around with cache coherency in a platform
> > specific way (and hence violated the dma api a bit), so we could stuff
> > the mmap coherency in there, too. When we later on extend dma_buf
> > support so that other drivers than the gpu can export dma_bufs, we can
> > then extend the official dma api with already a few drivers with
> > use-patterns around.
> > 
> > But I still think that the kernel must not be required to enforce
> > correct access ordering for the reasons outlined in my other mail.
> 
> I still don't think that's possible. Please explain how you expect
> to change the semantics of the streaming mapping API to allow multiple
> mappers without having explicit serialization points that are visible
> to all users. For simplicity, let's assume a cache coherent system
> with bounce buffers where map() copies the buffer to a dma area
> and unmap() copies it back to regular kernel memory. How does a driver
> know if it can touch the buffer in memory or from DMA at any given
> point in time? Note that this problem is the same as the cache coherency
> problem but may be easier to grasp.

(I'm jumping into the discussion in the middle, and might miss something
that has already been talked about. I still hope what I'm about to say is
relevant. :-))

In subsystems such as V4L2 where drivers deal with such large buffers, the
buffers stay mapped all the time. The user explicitly gives the control of
the buffers to the driver and eventually gets them back. This is already
part of those APIs, whether they're using dma_buf or not. The user could
have, and often has, the same buffers mapped elsewhere.

When it comes to passing these buffers between different hardware devices,
either V4L2 or not, the user might not want to perform extra cache flush
when the buffer memory itself is not being touched by the CPU in the process
at all. I'd consider it impossible for the driver to know how the user space
intends to user the buffer.

Flushing the cache is quite expensive: typically it's the best to flush the
whole data cache when one needs to flush buffers. The V4L2 DQBUF and QBUF
IOCTLs already have flags to suggest special cache handling for buffers.

Kind regards,

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi	jabber/XMPP/Gmail: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-20  2:05             ` Robert Morell
@ 2011-12-20 14:29               ` Anca Emanuel
  0 siblings, 0 replies; 64+ messages in thread
From: Anca Emanuel @ 2011-12-20 14:29 UTC (permalink / raw)
  To: Robert Morell
  Cc: Arnd Bergmann, linux, Mauro Carvalho Chehab, Sumit Semwal,
	linux-mm, linux-kernel, dri-devel, linaro-mm-sig, jesse.barker,
	daniel, linux-arm-kernel, linux-media

On Tue, Dec 20, 2011 at 4:05 AM, Robert Morell <rmorell@nvidia.com> wrote:
>
> One of the goals of this project is to unify the fragmented space of the
> ARM SoC memory managers so that each vendor doesn't implement their own,
> and they can all be closer to mainline.

That is a very good objective.

> I fear that restricting the use of this buffer sharing mechanism to GPL
> drivers only will prevent that goal from being achieved, if an SoC
> driver has to interact with modules that use a non-GPL license.

If nobody from nvidia have any experience with this kind of work...
Look at Intel. Why are you afraid of ?

> As a hypothetical example, consider laptops that have multiple GPUs.
> Today, these ship with onboard graphics (integrated to the CPU or
> chipset) along with a discrete GPU, where in many cases only the onboard
> graphics can actually display to the screen.  In order for anything
> rendered by the discrete GPU to be displayed, it has to be copied to
> memory available for the smaller onboard graphics to texture from or
> display directly.  Obviously, that's best done by sharing dma buffers
> rather than bouncing them through the GPU.  It's not much of a stretch
> to imagine that we'll see such systems with a Tegra CPU/GPU plus a
> discrete GPU in the future; in that case, we'd want to be able to share
> memory between the discrete GPU and the Tegra system.  In that scenario,
> if this interface is GPL-only, we'd be unable to adopt the dma_buffer
> sharing mechanism for Tegra.
>
> (This isn't too pie-in-the-sky, either; people are already combining
> Tegra with discrete GPUs:
> http://blogs.nvidia.com/2011/11/world%e2%80%99s-first-arm-based-supercomputer-to-launch-in-barcelona/
> )
>
> Thanks,
> Robert

There are other problems ? Some secret agreements with Microsoft ?

I hope to see something open sourced. You can do it nVidia.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-20  9:03                   ` Sakari Ailus
@ 2011-12-20 15:36                     ` Arnd Bergmann
  2012-01-01 20:53                       ` Sakari Ailus
  0 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-20 15:36 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Daniel Vetter, Semwal, Sumit, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, linux-media, linux-arm-kernel

On Tuesday 20 December 2011, Sakari Ailus wrote:
> (I'm jumping into the discussion in the middle, and might miss something
> that has already been talked about. I still hope what I'm about to say is
> relevant. :-))

It certainly is relevant.

> In subsystems such as V4L2 where drivers deal with such large buffers, the
> buffers stay mapped all the time. The user explicitly gives the control of
> the buffers to the driver and eventually gets them back. This is already
> part of those APIs, whether they're using dma_buf or not. The user could
> have, and often has, the same buffers mapped elsewhere.

Do you normally use streaming (dma_{map,sync,unmap}_*) or consistent
(dma_{alloc,free}_*) mappings for this then?

> When it comes to passing these buffers between different hardware devices,
> either V4L2 or not, the user might not want to perform extra cache flush
> when the buffer memory itself is not being touched by the CPU in the process
> at all. I'd consider it impossible for the driver to know how the user space
> intends to user the buffer.

The easiest solution to this problem would be to only allow consistent mappings
to be shared using the dma_buf mechanism. That means we never have to flush.
If you don't need the CPU to touch the buffer, that would not have any cost
at all, we could even have no kernel mapping at all instead of an uncached
mapping on ARM.

> Flushing the cache is quite expensive: typically it's the best to flush the
> whole data cache when one needs to flush buffers. The V4L2 DQBUF and QBUF
> IOCTLs already have flags to suggest special cache handling for buffers.

[sidenote: whether it makes sense to flush individual cache lines or the entire
cache is a decision best left to the architectures. On systems with larger
caches than on ARM, e.g. 64MB instead of 512KB, you really want to keep
the cache intact.]

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-19  6:16                         ` Semwal, Sumit
@ 2011-12-20 15:41                           ` Arnd Bergmann
  2011-12-20 16:41                             ` Rob Clark
  0 siblings, 1 reply; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-20 15:41 UTC (permalink / raw)
  To: Semwal, Sumit
  Cc: Daniel Vetter, Alan Cox, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, linux-media, linux-arm-kernel

On Monday 19 December 2011, Semwal, Sumit wrote:
> I didn't see a consensus on whether dma_buf should enforce some form
> of serialization within the API - so atleast for v1 of dma-buf, I
> propose to 'not' impose a restriction, and we can tackle it (add new
> ops or enforce as design?) whenever we see the first need of it - will
> that be ok? [I am bending towards the thought that it is a problem to
> solve at a bigger platform than dma_buf.]

The problem is generally understood for streaming mappings with a 
single device using it: if you have a long-running mapping, you have
to use dma_sync_*. This obviously falls apart if you have multiple
devices and no serialization between the accesses.

If you don't want serialization, that implies that we cannot have
use the  dma_sync_* API on the buffer, which in turn implies that
we cannot have streaming mappings. I think that's ok, but then
you have to bring back the mmap API on the buffer if you want to
allow any driver to provide an mmap function for a shared buffer.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-20 15:41                           ` Arnd Bergmann
@ 2011-12-20 16:41                             ` Rob Clark
  2011-12-20 17:14                               ` Daniel Vetter
  0 siblings, 1 reply; 64+ messages in thread
From: Rob Clark @ 2011-12-20 16:41 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Semwal, Sumit, Daniel Vetter, Alan Cox, linux, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, linux-media,
	linux-arm-kernel

On Tue, Dec 20, 2011 at 9:41 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 19 December 2011, Semwal, Sumit wrote:
>> I didn't see a consensus on whether dma_buf should enforce some form
>> of serialization within the API - so atleast for v1 of dma-buf, I
>> propose to 'not' impose a restriction, and we can tackle it (add new
>> ops or enforce as design?) whenever we see the first need of it - will
>> that be ok? [I am bending towards the thought that it is a problem to
>> solve at a bigger platform than dma_buf.]
>
> The problem is generally understood for streaming mappings with a
> single device using it: if you have a long-running mapping, you have
> to use dma_sync_*. This obviously falls apart if you have multiple
> devices and no serialization between the accesses.
>
> If you don't want serialization, that implies that we cannot have
> use the  dma_sync_* API on the buffer, which in turn implies that
> we cannot have streaming mappings. I think that's ok, but then
> you have to bring back the mmap API on the buffer if you want to
> allow any driver to provide an mmap function for a shared buffer.

I'm thinking for a first version, we can get enough mileage out of it by saying:
1) only exporter can mmap to userspace
2) only importers that do not need CPU access to buffer..

This way we can get dmabuf into the kernel, maybe even for 3.3.  I
know there are a lot of interesting potential uses where this stripped
down version is good enough.  It probably isn't the final version,
maybe more features are added over time to deal with importers that
need CPU access to buffer, sync object, etc.  But we have to start
somewhere.

BR,
-R

>        Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-20 16:41                             ` Rob Clark
@ 2011-12-20 17:14                               ` Daniel Vetter
  2011-12-21 17:27                                 ` Arnd Bergmann
  0 siblings, 1 reply; 64+ messages in thread
From: Daniel Vetter @ 2011-12-20 17:14 UTC (permalink / raw)
  To: Rob Clark
  Cc: Arnd Bergmann, Semwal, Sumit, Daniel Vetter, Alan Cox, linux,
	linux-kernel, dri-devel, linaro-mm-sig, linux-mm, linux-media,
	linux-arm-kernel

On Tue, Dec 20, 2011 at 10:41:45AM -0600, Rob Clark wrote:
> On Tue, Dec 20, 2011 at 9:41 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Monday 19 December 2011, Semwal, Sumit wrote:
> >> I didn't see a consensus on whether dma_buf should enforce some form
> >> of serialization within the API - so atleast for v1 of dma-buf, I
> >> propose to 'not' impose a restriction, and we can tackle it (add new
> >> ops or enforce as design?) whenever we see the first need of it - will
> >> that be ok? [I am bending towards the thought that it is a problem to
> >> solve at a bigger platform than dma_buf.]
> >
> > The problem is generally understood for streaming mappings with a
> > single device using it: if you have a long-running mapping, you have
> > to use dma_sync_*. This obviously falls apart if you have multiple
> > devices and no serialization between the accesses.
> >
> > If you don't want serialization, that implies that we cannot have
> > use the  dma_sync_* API on the buffer, which in turn implies that
> > we cannot have streaming mappings. I think that's ok, but then
> > you have to bring back the mmap API on the buffer if you want to
> > allow any driver to provide an mmap function for a shared buffer.
> 
> I'm thinking for a first version, we can get enough mileage out of it by saying:
> 1) only exporter can mmap to userspace
> 2) only importers that do not need CPU access to buffer..
> 
> This way we can get dmabuf into the kernel, maybe even for 3.3.  I
> know there are a lot of interesting potential uses where this stripped
> down version is good enough.  It probably isn't the final version,
> maybe more features are added over time to deal with importers that
> need CPU access to buffer, sync object, etc.  But we have to start
> somewhere.

I agree with Rob here - I think especially for the coherency discussion
some actual users of dma_buf on a bunch of insane platforms (i915
qualifies here too, because we do some cacheline flushing behind everyones
back) would massively help in clarifying things.

It also sounds like that at least for proper userspace mmap support we'd
need some dma api extensions on at least arm, and that might take a while
...

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-20 17:14                               ` Daniel Vetter
@ 2011-12-21 17:27                                 ` Arnd Bergmann
  2011-12-21 19:04                                   ` Daniel Vetter
  2011-12-23 10:00                                   ` Semwal, Sumit
  0 siblings, 2 replies; 64+ messages in thread
From: Arnd Bergmann @ 2011-12-21 17:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Clark, Semwal, Sumit, Alan Cox, linux, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, linux-media,
	linux-arm-kernel

On Tuesday 20 December 2011, Daniel Vetter wrote:
> > I'm thinking for a first version, we can get enough mileage out of it by saying:
> > 1) only exporter can mmap to userspace
> > 2) only importers that do not need CPU access to buffer..

Ok, that sounds possible. The alternative to this would be:

1) The exporter has to use dma_alloc_coherent() or dma_alloc_writecombine()
to allocate the buffer
2. Every user space mapping has to go through dma_mmap_coherent()
or dma_mmap_writecombine()

We can extend the rules later to allow either one after we have gained
some experience using it.

> > This way we can get dmabuf into the kernel, maybe even for 3.3.  I
> > know there are a lot of interesting potential uses where this stripped
> > down version is good enough.  It probably isn't the final version,
> > maybe more features are added over time to deal with importers that
> > need CPU access to buffer, sync object, etc.  But we have to start
> > somewhere.
> 
> I agree with Rob here - I think especially for the coherency discussion
> some actual users of dma_buf on a bunch of insane platforms (i915
> qualifies here too, because we do some cacheline flushing behind everyones
> back) would massively help in clarifying things.

Yes, agreed.

> It also sounds like that at least for proper userspace mmap support we'd
> need some dma api extensions on at least arm, and that might take a while
> ...

I think it's actually the opposite -- you'd need dma api extensions on
everything else other than arm, which already has dma_mmap_coherent()
and dma_mmap_writecombine() for this purpose.

	Arnd

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-21 17:27                                 ` Arnd Bergmann
@ 2011-12-21 19:04                                   ` Daniel Vetter
  2011-12-23 10:00                                   ` Semwal, Sumit
  1 sibling, 0 replies; 64+ messages in thread
From: Daniel Vetter @ 2011-12-21 19:04 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Daniel Vetter, linux, Semwal, Sumit, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, Rob Clark, linux-arm-kernel, Alan Cox,
	linux-media

On Wed, Dec 21, 2011 at 05:27:16PM +0000, Arnd Bergmann wrote:
> On Tuesday 20 December 2011, Daniel Vetter wrote:
> > It also sounds like that at least for proper userspace mmap support we'd
> > need some dma api extensions on at least arm, and that might take a while
> > ...
> 
> I think it's actually the opposite -- you'd need dma api extensions on
> everything else other than arm, which already has dma_mmap_coherent()
> and dma_mmap_writecombine() for this purpose.

Yeah, that's actually what I wanted to say, but failed at ... Another
thing is that at least for i915, _writecombine isn't what we want actually
because:
- It won't work anyway cause i915 maps stuff cached and does the flushing
  itself and x86 PAT doesn't support mixed mappings (kinda like arm).
- It isn't actually enough, there's another hidden buffer between the
  memory controller interface and the gpu that i915 manually flushes
  (because even a readback on a wc mapping doesn't flush things in there).

So I assume we'll have plenty of funny beating out a good api for cpu
access ;-)

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-21 17:27                                 ` Arnd Bergmann
  2011-12-21 19:04                                   ` Daniel Vetter
@ 2011-12-23 10:00                                   ` Semwal, Sumit
  2011-12-23 17:10                                     ` Rob Clark
  1 sibling, 1 reply; 64+ messages in thread
From: Semwal, Sumit @ 2011-12-23 10:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Daniel Vetter, Rob Clark, Alan Cox, linux, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, linux-media,
	linux-arm-kernel

On Wed, Dec 21, 2011 at 10:57 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 20 December 2011, Daniel Vetter wrote:
>> > I'm thinking for a first version, we can get enough mileage out of it by saying:
>> > 1) only exporter can mmap to userspace
>> > 2) only importers that do not need CPU access to buffer..

Thanks Rob - and the exporter can do the mmap outside of dma-buf
usage, right? I mean, we don't need to provide an mmap to dma_buf()
and restrict it to exporter, when the exporter has more 'control' of
the buffer anyways.
>
BR,
~Sumit.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-23 10:00                                   ` Semwal, Sumit
@ 2011-12-23 17:10                                     ` Rob Clark
  0 siblings, 0 replies; 64+ messages in thread
From: Rob Clark @ 2011-12-23 17:10 UTC (permalink / raw)
  To: Semwal, Sumit
  Cc: Arnd Bergmann, Daniel Vetter, Alan Cox, linux, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, linux-media,
	linux-arm-kernel

On Fri, Dec 23, 2011 at 4:00 AM, Semwal, Sumit <sumit.semwal@ti.com> wrote:
> On Wed, Dec 21, 2011 at 10:57 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Tuesday 20 December 2011, Daniel Vetter wrote:
>>> > I'm thinking for a first version, we can get enough mileage out of it by saying:
>>> > 1) only exporter can mmap to userspace
>>> > 2) only importers that do not need CPU access to buffer..
>
> Thanks Rob - and the exporter can do the mmap outside of dma-buf
> usage, right?

Yes

> I mean, we don't need to provide an mmap to dma_buf()
> and restrict it to exporter, when the exporter has more 'control' of
> the buffer anyways.

No, if it is only for the exporter, it really doesn't need to be in
dmabuf (ie. the exporter already knows how he is)

BR,
-R

>>
> BR,
> ~Sumit.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-20 15:36                     ` Arnd Bergmann
@ 2012-01-01 20:53                       ` Sakari Ailus
  2012-01-01 23:12                         ` Rob Clark
  0 siblings, 1 reply; 64+ messages in thread
From: Sakari Ailus @ 2012-01-01 20:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Daniel Vetter, Semwal, Sumit, linux, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, linux-media, linux-arm-kernel,
	laurent.pinchart

Hi Arnd,

On Tue, Dec 20, 2011 at 03:36:49PM +0000, Arnd Bergmann wrote:
> On Tuesday 20 December 2011, Sakari Ailus wrote:
> > (I'm jumping into the discussion in the middle, and might miss something
> > that has already been talked about. I still hope what I'm about to say is
> > relevant. :-))
> 
> It certainly is relevant.
> 
> > In subsystems such as V4L2 where drivers deal with such large buffers, the
> > buffers stay mapped all the time. The user explicitly gives the control of
> > the buffers to the driver and eventually gets them back. This is already
> > part of those APIs, whether they're using dma_buf or not. The user could
> > have, and often has, the same buffers mapped elsewhere.
> 
> Do you normally use streaming (dma_{map,sync,unmap}_*) or consistent
> (dma_{alloc,free}_*) mappings for this then?

The OMAP 3 ISP driver I'm familiar with uses the OMAP 3 IOMMU / IOVMM API
which is to be replaced by dmabuf. I'm trying to understand how the dma
api / dma-buf should be used to achieve a superset of that functionality.

I think I'm interested in the DMA mapping API. I replied to Sumit's new
patchset, you're cc'd.

> > When it comes to passing these buffers between different hardware devices,
> > either V4L2 or not, the user might not want to perform extra cache flush
> > when the buffer memory itself is not being touched by the CPU in the process
> > at all. I'd consider it impossible for the driver to know how the user space
> > intends to user the buffer.
> 
> The easiest solution to this problem would be to only allow consistent mappings
> to be shared using the dma_buf mechanism. That means we never have to flush.

Do you mean the memory would be non-cacheable? Accessing memory w/o caching
is typically prohibitively expensive, so I don't think this could ever be
the primary means to do the above.

In some cases non-cacheable can perform better, taking into account the time
which is required for flusing the cache and the other consequences of that,
but I still think it's more of an exception than a rule.

> If you don't need the CPU to touch the buffer, that would not have any cost
> at all, we could even have no kernel mapping at all instead of an uncached
> mapping on ARM.

I think in general creating unused mappings should really be avoided.
Creating them consumes time, effort at creation time and possibly also in
cache related operations.

> > Flushing the cache is quite expensive: typically it's the best to flush the
> > whole data cache when one needs to flush buffers. The V4L2 DQBUF and QBUF
> > IOCTLs already have flags to suggest special cache handling for buffers.
> 
> [sidenote: whether it makes sense to flush individual cache lines or the entire
> cache is a decision best left to the architectures. On systems with larger
> caches than on ARM, e.g. 64MB instead of 512KB, you really want to keep
> the cache intact.]

That also depend on the buffer size and what the rest of the system is
doing. I could imagine buffer size, system memory data rate, CPU frequency,
cache line width and the properties of the cache all affect how fast both of
the operations are.

It would probably be possible to perform a heuristic analysis on this at
system startup similar to different software raid algorithm implementations
( e.g. to use MMX or SSE for sw raid).

Some additional complexity will arise from the fact that on some ARM machines
one must know all the CPU MMU mappings pointing to a piece of physical
memory to properly flush them, AFAIR. Naturally a good alternative on such
system is to pperform full dcache flush / cleaning.

Also, cache handling only affects systems without coherent cache. ARM CPUs
are finding their ways to servers as well, but I'd guess it'll still take a
while before we have ARM CPUs with 64 MiB of cache..

Kind regards,

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi	jabber/XMPP/Gmail: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-01 20:53                       ` Sakari Ailus
@ 2012-01-01 23:12                         ` Rob Clark
  0 siblings, 0 replies; 64+ messages in thread
From: Rob Clark @ 2012-01-01 23:12 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Arnd Bergmann, Daniel Vetter, Semwal, Sumit, linux, linux-kernel,
	dri-devel, linaro-mm-sig, linux-mm, linux-media,
	linux-arm-kernel, laurent.pinchart

On Sun, Jan 1, 2012 at 2:53 PM, Sakari Ailus <sakari.ailus@iki.fi> wrote:
> Hi Arnd,
>
> On Tue, Dec 20, 2011 at 03:36:49PM +0000, Arnd Bergmann wrote:
>> On Tuesday 20 December 2011, Sakari Ailus wrote:
>> > (I'm jumping into the discussion in the middle, and might miss something
>> > that has already been talked about. I still hope what I'm about to say is
>> > relevant. :-))
>>
>> It certainly is relevant.
>>
>> > In subsystems such as V4L2 where drivers deal with such large buffers, the
>> > buffers stay mapped all the time. The user explicitly gives the control of
>> > the buffers to the driver and eventually gets them back. This is already
>> > part of those APIs, whether they're using dma_buf or not. The user could
>> > have, and often has, the same buffers mapped elsewhere.
>>
>> Do you normally use streaming (dma_{map,sync,unmap}_*) or consistent
>> (dma_{alloc,free}_*) mappings for this then?
>
> The OMAP 3 ISP driver I'm familiar with uses the OMAP 3 IOMMU / IOVMM API
> which is to be replaced by dmabuf. I'm trying to understand how the dma
> api / dma-buf should be used to achieve a superset of that functionality.
>
> I think I'm interested in the DMA mapping API. I replied to Sumit's new
> patchset, you're cc'd.
>
>> > When it comes to passing these buffers between different hardware devices,
>> > either V4L2 or not, the user might not want to perform extra cache flush
>> > when the buffer memory itself is not being touched by the CPU in the process
>> > at all. I'd consider it impossible for the driver to know how the user space
>> > intends to user the buffer.
>>
>> The easiest solution to this problem would be to only allow consistent mappings
>> to be shared using the dma_buf mechanism. That means we never have to flush.
>
> Do you mean the memory would be non-cacheable? Accessing memory w/o caching
> is typically prohibitively expensive, so I don't think this could ever be
> the primary means to do the above.
>
> In some cases non-cacheable can perform better, taking into account the time
> which is required for flusing the cache and the other consequences of that,
> but I still think it's more of an exception than a rule.

I think we decided to completely leave cpu virtual mappings out of the
API to start with, because (a) we can already get significant value
out of the API without this, and (b) it is not quite clear how to
handle virtual mappings in a way that can deal with all cases.  For
now, userspace virtual mappings must come from the exporting device,
and kernel virtual mappings (if needed by the importing device) are
not supported.. although I think it is a smaller subset of devices
that might need a kernel virtual mapping.

This sidesteps the whole issue of cache handling, avoiding aliased
mappings, etc.  And leaves cpu access synchronization and cache
handling to be handled however the exporting device handles this
today.

BR,
-R

>> If you don't need the CPU to touch the buffer, that would not have any cost
>> at all, we could even have no kernel mapping at all instead of an uncached
>> mapping on ARM.
>
> I think in general creating unused mappings should really be avoided.
> Creating them consumes time, effort at creation time and possibly also in
> cache related operations.
>
>> > Flushing the cache is quite expensive: typically it's the best to flush the
>> > whole data cache when one needs to flush buffers. The V4L2 DQBUF and QBUF
>> > IOCTLs already have flags to suggest special cache handling for buffers.
>>
>> [sidenote: whether it makes sense to flush individual cache lines or the entire
>> cache is a decision best left to the architectures. On systems with larger
>> caches than on ARM, e.g. 64MB instead of 512KB, you really want to keep
>> the cache intact.]
>
> That also depend on the buffer size and what the rest of the system is
> doing. I could imagine buffer size, system memory data rate, CPU frequency,
> cache line width and the properties of the cache all affect how fast both of
> the operations are.
>
> It would probably be possible to perform a heuristic analysis on this at
> system startup similar to different software raid algorithm implementations
> ( e.g. to use MMX or SSE for sw raid).
>
> Some additional complexity will arise from the fact that on some ARM machines
> one must know all the CPU MMU mappings pointing to a piece of physical
> memory to properly flush them, AFAIR. Naturally a good alternative on such
> system is to pperform full dcache flush / cleaning.
>
> Also, cache handling only affects systems without coherent cache. ARM CPUs
> are finding their ways to servers as well, but I'd guess it'll still take a
> while before we have ARM CPUs with 64 MiB of cache..
>
> Kind regards,
>
> --
> Sakari Ailus
> e-mail: sakari.ailus@iki.fi     jabber/XMPP/Gmail: sailus@retiisi.org.uk
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2011-12-02  8:57 ` [RFC v2 1/2] dma-buf: Introduce dma " Sumit Semwal
  2011-12-02 17:11   ` Konrad Rzeszutek Wilk
  2011-12-05 17:18   ` Arnd Bergmann
@ 2012-01-09  6:20   ` InKi Dae
  2012-01-09  8:10     ` Daniel Vetter
  2 siblings, 1 reply; 64+ messages in thread
From: InKi Dae @ 2012-01-09  6:20 UTC (permalink / raw)
  To: Sumit Semwal
  Cc: linux-kernel, linux-arm-kernel, linux-mm, linaro-mm-sig,
	dri-devel, linux-media, linux, arnd, jesse.barker, m.szyprowski,
	rob, daniel, t.stanislaws, Sumit Semwal

2011/12/2 Sumit Semwal <sumit.semwal@ti.com>:
> This is the first step in defining a dma buffer sharing mechanism.
>
> A new buffer object dma_buf is added, with operations and API to allow easy
> sharing of this buffer object across devices.
>
> The framework allows:
> - different devices to 'attach' themselves to this buffer, to facilitate
>  backing storage negotiation, using dma_buf_attach() API.
> - association of a file pointer with each user-buffer and associated
>   allocator-defined operations on that buffer. This operation is called the
>   'export' operation.
> - this exported buffer-object to be shared with the other entity by asking for
>   its 'file-descriptor (fd)', and sharing the fd across.
> - a received fd to get the buffer object back, where it can be accessed using
>   the associated exporter-defined operations.
> - the exporter and user to share the scatterlist using map_dma_buf and
>   unmap_dma_buf operations.
>
> Atleast one 'attach()' call is required to be made prior to calling the
> map_dma_buf() operation.
>
> Couple of building blocks in map_dma_buf() are added to ease introduction
> of sync'ing across exporter and users, and late allocation by the exporter.
>
> *OPTIONALLY*: mmap() file operation is provided for the associated 'fd', as
> wrapper over the optional allocator defined mmap(), to be used by devices
> that might need one.
>
> More details are there in the documentation patch.
>
> This is based on design suggestions from many people at the mini-summits[1],
> most notably from Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
> Daniel Vetter <daniel@ffwll.ch>.
>
> The implementation is inspired from proof-of-concept patch-set from
> Tomasz Stanislawski <t.stanislaws@samsung.com>, who demonstrated buffer sharing
> between two v4l2 devices. [2]
>
> [1]: https://wiki.linaro.org/OfficeofCTO/MemoryManagement
> [2]: http://lwn.net/Articles/454389
>
> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
> Signed-off-by: Sumit Semwal <sumit.semwal@ti.com>
> ---
>  drivers/base/Kconfig    |   10 ++
>  drivers/base/Makefile   |    1 +
>  drivers/base/dma-buf.c  |  290 +++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-buf.h |  176 ++++++++++++++++++++++++++++
>  4 files changed, 477 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/base/dma-buf.c
>  create mode 100644 include/linux/dma-buf.h
>
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 21cf46f..07d8095 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -174,4 +174,14 @@ config SYS_HYPERVISOR
>
>  source "drivers/base/regmap/Kconfig"
>
> +config DMA_SHARED_BUFFER
> +       bool "Buffer framework to be shared between drivers"
> +       default n
> +       depends on ANON_INODES
> +       help
> +         This option enables the framework for buffer-sharing between
> +         multiple drivers. A buffer is associated with a file using driver
> +         APIs extension; the file's descriptor can then be passed on to other
> +         driver.
> +
>  endmenu
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 99a375a..d0df046 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -8,6 +8,7 @@ obj-$(CONFIG_DEVTMPFS)  += devtmpfs.o
>  obj-y                  += power/
>  obj-$(CONFIG_HAS_DMA)  += dma-mapping.o
>  obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o
> +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o
>  obj-$(CONFIG_ISA)      += isa.o
>  obj-$(CONFIG_FW_LOADER)        += firmware_class.o
>  obj-$(CONFIG_NUMA)     += node.o
> diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
> new file mode 100644
> index 0000000..4b9005e
> --- /dev/null
> +++ b/drivers/base/dma-buf.c
> @@ -0,0 +1,290 @@
> +/*
> + * Framework for buffer objects that can be shared across devices/subsystems.
> + *
> + * Copyright(C) 2011 Linaro Limited. All rights reserved.
> + * Author: Sumit Semwal <sumit.semwal@ti.com>
> + *
> + * Many thanks to linaro-mm-sig list, and specially
> + * Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
> + * Daniel Vetter <daniel@ffwll.ch> for their support in creation and
> + * refining of this idea.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include <linux/fs.h>
> +#include <linux/slab.h>
> +#include <linux/dma-buf.h>
> +#include <linux/anon_inodes.h>
> +#include <linux/export.h>
> +
> +static inline int is_dma_buf_file(struct file *);
> +
> +static int dma_buf_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +       struct dma_buf *dmabuf;
> +
> +       if (!is_dma_buf_file(file))
> +               return -EINVAL;
> +
> +       dmabuf = file->private_data;
> +
> +       if (!dmabuf->ops->mmap)
> +               return -EINVAL;
> +
> +       return dmabuf->ops->mmap(dmabuf, vma);
> +}
> +
> +static int dma_buf_release(struct inode *inode, struct file *file)
> +{
> +       struct dma_buf *dmabuf;
> +
> +       if (!is_dma_buf_file(file))
> +               return -EINVAL;
> +
> +       dmabuf = file->private_data;
> +
> +       dmabuf->ops->release(dmabuf);
> +       kfree(dmabuf);
> +       return 0;
> +}
> +
> +static const struct file_operations dma_buf_fops = {
> +       .mmap           = dma_buf_mmap,
> +       .release        = dma_buf_release,
> +};
> +
> +/*
> + * is_dma_buf_file - Check if struct file* is associated with dma_buf
> + */
> +static inline int is_dma_buf_file(struct file *file)
> +{
> +       return file->f_op == &dma_buf_fops;
> +}
> +
> +/**
> + * dma_buf_export - Creates a new dma_buf, and associates an anon file
> + * with this buffer,so it can be exported.
> + * Also connect the allocator specific data and ops to the buffer.
> + *
> + * @priv:      [in]    Attach private data of allocator to this buffer
> + * @ops:       [in]    Attach allocator-defined dma buf ops to the new buffer.
> + * @flags:     [in]    mode flags for the file.
> + *
> + * Returns, on success, a newly created dma_buf object, which wraps the
> + * supplied private data and operations for dma_buf_ops. On failure to
> + * allocate the dma_buf object, it can return NULL.
> + *
> + */
> +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
> +                               int flags)
> +{
> +       struct dma_buf *dmabuf;
> +       struct file *file;
> +
> +       BUG_ON(!priv || !ops);
> +
> +       dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
> +       if (dmabuf == NULL)
> +               return dmabuf;
> +
> +       dmabuf->priv = priv;
> +       dmabuf->ops = ops;
> +
> +       file = anon_inode_getfile("dmabuf", &dma_buf_fops, dmabuf, flags);
> +
> +       dmabuf->file = file;
> +
> +       mutex_init(&dmabuf->lock);
> +       INIT_LIST_HEAD(&dmabuf->attachments);
> +
> +       return dmabuf;
> +}
> +EXPORT_SYMBOL(dma_buf_export);
> +
> +
> +/**
> + * dma_buf_fd - returns a file descriptor for the given dma_buf
> + * @dmabuf:    [in]    pointer to dma_buf for which fd is required.
> + *
> + * On success, returns an associated 'fd'. Else, returns error.
> + */
> +int dma_buf_fd(struct dma_buf *dmabuf)
> +{
> +       int error, fd;
> +
> +       if (!dmabuf->file)
> +               return -EINVAL;
> +
> +       error = get_unused_fd_flags(0);
> +       if (error < 0)
> +               return error;
> +       fd = error;
> +
> +       fd_install(fd, dmabuf->file);
> +
> +       return fd;
> +}
> +EXPORT_SYMBOL(dma_buf_fd);
> +
> +/**
> + * dma_buf_get - returns the dma_buf structure related to an fd
> + * @fd:        [in]    fd associated with the dma_buf to be returned
> + *
> + * On success, returns the dma_buf structure associated with an fd; uses
> + * file's refcounting done by fget to increase refcount. returns ERR_PTR
> + * otherwise.
> + */
> +struct dma_buf *dma_buf_get(int fd)
> +{
> +       struct file *file;
> +
> +       file = fget(fd);
> +
> +       if (!file)
> +               return ERR_PTR(-EBADF);
> +
> +       if (!is_dma_buf_file(file)) {
> +               fput(file);
> +               return ERR_PTR(-EINVAL);
> +       }
> +
> +       return file->private_data;
> +}
> +EXPORT_SYMBOL(dma_buf_get);
> +
> +/**
> + * dma_buf_put - decreases refcount of the buffer
> + * @dmabuf:    [in]    buffer to reduce refcount of
> + *
> + * Uses file's refcounting done implicitly by fput()
> + */
> +void dma_buf_put(struct dma_buf *dmabuf)
> +{
> +       BUG_ON(!dmabuf->file);
> +
> +       fput(dmabuf->file);
> +}
> +EXPORT_SYMBOL(dma_buf_put);
> +
> +/**
> + * dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
> + * calls attach() of dma_buf_ops to allow device-specific attach functionality
> + * @dmabuf:    [in]    buffer to attach device to.
> + * @dev:       [in]    device to be attached.
> + *
> + * Returns struct dma_buf_attachment * for this attachment; may return NULL.
> + *
> + */
> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> +                                               struct device *dev)
> +{
> +       struct dma_buf_attachment *attach;
> +       int ret;
> +
> +       BUG_ON(!dmabuf || !dev);
> +
> +       attach = kzalloc(sizeof(struct dma_buf_attachment), GFP_KERNEL);
> +       if (attach == NULL)
> +               goto err_alloc;
> +
> +       mutex_lock(&dmabuf->lock);
> +
> +       attach->dev = dev;
> +       attach->dmabuf = dmabuf;
> +       if (dmabuf->ops->attach) {
> +               ret = dmabuf->ops->attach(dmabuf, dev, attach);
> +               if (!ret)
> +                       goto err_attach;
> +       }
> +       list_add(&attach->node, &dmabuf->attachments);
> +
> +       mutex_unlock(&dmabuf->lock);
> +
> +err_alloc:
> +       return attach;
> +err_attach:
> +       kfree(attach);
> +       mutex_unlock(&dmabuf->lock);
> +       return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL(dma_buf_attach);
> +
> +/**
> + * dma_buf_detach - Remove the given attachment from dmabuf's attachments list;
> + * optionally calls detach() of dma_buf_ops for device-specific detach
> + * @dmabuf:    [in]    buffer to detach from.
> + * @attach:    [in]    attachment to be detached; is free'd after this call.
> + *
> + */
> +void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
> +{
> +       BUG_ON(!dmabuf || !attach);
> +
> +       mutex_lock(&dmabuf->lock);
> +       list_del(&attach->node);
> +       if (dmabuf->ops->detach)
> +               dmabuf->ops->detach(dmabuf, attach);
> +
> +       mutex_unlock(&dmabuf->lock);
> +       kfree(attach);
> +}
> +EXPORT_SYMBOL(dma_buf_detach);
> +
> +/**
> + * dma_buf_map_attachment - Returns the scatterlist table of the attachment;
> + * mapped into _device_ address space. Is a wrapper for map_dma_buf() of the
> + * dma_buf_ops.
> + * @attach:    [in]    attachment whose scatterlist is to be returned
> + * @direction: [in]    direction of DMA transfer
> + *
> + * Returns sg_table containing the scatterlist to be returned; may return NULL
> + * or ERR_PTR.
> + *
> + */
> +struct sg_table * dma_buf_map_attachment(struct dma_buf_attachment *attach,
> +                                         enum dma_data_direction direction)
> +{
> +       struct sg_table *sg_table = ERR_PTR(-EINVAL);
> +
> +       BUG_ON(!attach || !attach->dmabuf);
> +
> +       mutex_lock(&attach->dmabuf->lock);
> +       if (attach->dmabuf->ops->map_dma_buf)
> +               sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
> +       mutex_unlock(&attach->dmabuf->lock);
> +
> +       return sg_table;
> +}
> +EXPORT_SYMBOL(dma_buf_map_attachment);
> +
> +/**
> + * dma_buf_unmap_attachment - unmaps and decreases usecount of the buffer;might
> + * deallocate the scatterlist associated. Is a wrapper for unmap_dma_buf() of
> + * dma_buf_ops.
> + * @attach:    [in]    attachment to unmap buffer from
> + * @sg_table:  [in]    scatterlist info of the buffer to unmap
> + *
> + */
> +void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
> +                               struct sg_table *sg_table)
> +{
> +       BUG_ON(!attach || !attach->dmabuf || !sg_table);
> +
> +       mutex_lock(&attach->dmabuf->lock);
> +       if (attach->dmabuf->ops->unmap_dma_buf)
> +               attach->dmabuf->ops->unmap_dma_buf(attach, sg_table);
> +       mutex_unlock(&attach->dmabuf->lock);
> +
> +}
> +EXPORT_SYMBOL(dma_buf_unmap_attachment);
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> new file mode 100644
> index 0000000..db4b384
> --- /dev/null
> +++ b/include/linux/dma-buf.h
> @@ -0,0 +1,176 @@
> +/*
> + * Header file for dma buffer sharing framework.
> + *
> + * Copyright(C) 2011 Linaro Limited. All rights reserved.
> + * Author: Sumit Semwal <sumit.semwal@ti.com>
> + *
> + * Many thanks to linaro-mm-sig list, and specially
> + * Arnd Bergmann <arnd@arndb.de>, Rob Clark <rob@ti.com> and
> + * Daniel Vetter <daniel@ffwll.ch> for their support in creation and
> + * refining of this idea.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program.  If not, see <http://www.gnu.org/licenses/>.
> + */
> +#ifndef __DMA_BUF_H__
> +#define __DMA_BUF_H__
> +
> +#include <linux/file.h>
> +#include <linux/err.h>
> +#include <linux/device.h>
> +#include <linux/scatterlist.h>
> +#include <linux/list.h>
> +#include <linux/dma-mapping.h>
> +
> +struct dma_buf;
> +
> +/**
> + * struct dma_buf_attachment - holds device-buffer attachment data
> + * @dmabuf: buffer for this attachment.
> + * @dev: device attached to the buffer.
> + * @node: list_head to allow manipulation of list of dma_buf_attachment.
> + * @priv: exporter-specific attachment data.
> + */
> +struct dma_buf_attachment {
> +       struct dma_buf *dmabuf;
> +       struct device *dev;
> +       struct list_head node;
> +       void *priv;
> +};
> +
> +/**
> + * struct dma_buf_ops - operations possible on struct dma_buf
> + * @attach: allows different devices to 'attach' themselves to the given
> + *         buffer. It might return -EBUSY to signal that backing storage
> + *         is already allocated and incompatible with the requirements
> + *         of requesting device. [optional]
> + * @detach: detach a given device from this buffer. [optional]
> + * @map_dma_buf: returns list of scatter pages allocated, increases usecount
> + *              of the buffer. Requires atleast one attach to be called
> + *              before. Returned sg list should already be mapped into
> + *              _device_ address space. This call may sleep.
> + * @unmap_dma_buf: decreases usecount of buffer, might deallocate scatter
> + *                pages.
> + * @mmap: memory map this buffer - optional.
> + * @release: release this buffer; to be called after the last dma_buf_put.
> + * @sync_sg_for_cpu: sync the sg list for cpu.
> + * @sync_sg_for_device: synch the sg list for device.
> + */
> +struct dma_buf_ops {
> +       int (*attach)(struct dma_buf *, struct device *,
> +                       struct dma_buf_attachment *);
> +
> +       void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
> +
> +       /* For {map,unmap}_dma_buf below, any specific buffer attributes
> +        * required should get added to device_dma_parameters accessible
> +        * via dev->dma_params.
> +        */
> +       struct sg_table * (*map_dma_buf)(struct dma_buf_attachment *,
> +                                               enum dma_data_direction);
> +       void (*unmap_dma_buf)(struct dma_buf_attachment *,
> +                                               struct sg_table *);
> +       /* TODO: Add try_map_dma_buf version, to return immed with -EBUSY
> +        * if the call would block.
> +        */

I has test dmabuf based drm gem module for exynos and I found one problem.
you can refer to this test repository:
http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf

at this repository, I added some exception codes for resource release
in addition to Dave's patch sets.

let's suppose we use dmabuf based vb2 and drm gem with physically
continuous memory(no IOMMU) and we try to share allocated buffer
between them(v4l2 and drm driver).

1. request memory allocation through drm gem interface.
2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
gem object.
- internally, private gem based dmabuf moudle calls drm_buf_export()
to register allocated gem object to fd.
3. request qbuf with the fd(got from 2) and DMABUF type to set the
buffer to v4l2 based device.
- internally, vb2 plug in module gets a buffer to the fd and then
calls dmabuf->ops->map_dmabuf() callback to get the sg table
containing physical memory info to the gem object. and then the
physical memory info would be copied to vb2_xx_buf object.
for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
this repository:
git://github.com/robclark/kernel-omap4.git drmplane-dmabuf

after that, if v4l2 driver want to release vb2_xx_buf object with
allocated memory region by user request, how should we do?. refcount
to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
object is released videobuf2 framework don't know who is using the
physical memory region. so this physical memory region is released and
when drm driver tries to access the region or to release it also, a
problem would be induced.

for this problem, I added get_shared_cnt() callback to dma-buf.h but
I'm not sure that this is good way. maybe there may be better way.
if there is any missing point, please let me know.

Thanks.

> +
> +       /* allow mmap optionally for devices that need it */
> +       int (*mmap)(struct dma_buf *, struct vm_area_struct *);
> +       /* after final dma_buf_put() */
> +       void (*release)(struct dma_buf *);
> +
> +       /* allow allocator to take care of cache ops */
> +       void (*sync_sg_for_cpu) (struct dma_buf *, struct device *);
> +       void (*sync_sg_for_device)(struct dma_buf *, struct device *);
> +};
> +
> +/**
> + * struct dma_buf - shared buffer object
> + * @file: file pointer used for sharing buffers across, and for refcounting.
> + * @attachments: list of dma_buf_attachment that denotes all devices attached.
> + * @ops: dma_buf_ops associated with this buffer object
> + * @priv: user specific private data
> + */
> +struct dma_buf {
> +       size_t size;
> +       struct file *file;
> +       struct list_head attachments;
> +       const struct dma_buf_ops *ops;
> +       /* mutex to serialize list manipulation and other ops */
> +       struct mutex lock;
> +       void *priv;
> +};
> +
> +#ifdef CONFIG_DMA_SHARED_BUFFER
> +struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> +                                                       struct device *dev);
> +void dma_buf_detach(struct dma_buf *dmabuf,
> +                               struct dma_buf_attachment *dmabuf_attach);
> +struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops, int flags);
> +int dma_buf_fd(struct dma_buf *dmabuf);
> +struct dma_buf *dma_buf_get(int fd);
> +void dma_buf_put(struct dma_buf *dmabuf);
> +
> +struct sg_table * dma_buf_map_attachment(struct dma_buf_attachment *,
> +                                        enum dma_data_direction);
> +void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *);
> +#else
> +
> +static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
> +                                                       struct device *dev)
> +{
> +       return ERR_PTR(-ENODEV);
> +}
> +
> +static inline void dma_buf_detach(struct dma_buf *dmabuf,
> +                                 struct dma_buf_attachment *dmabuf_attach)
> +{
> +       return;
> +}
> +
> +static inline struct dma_buf *dma_buf_export(void *priv,
> +                                               struct dma_buf_ops *ops,
> +                                               int flags)
> +{
> +       return ERR_PTR(-ENODEV);
> +}
> +
> +static inline int dma_buf_fd(struct dma_buf *dmabuf)
> +{
> +       return -ENODEV;
> +}
> +
> +static inline struct dma_buf *dma_buf_get(int fd)
> +{
> +       return ERR_PTR(-ENODEV);
> +}
> +
> +static inline void dma_buf_put(struct dma_buf *dmabuf)
> +{
> +       return;
> +}
> +
> +static inline struct sg_table * dma_buf_map_attachment(
> +       struct dma_buf_attachment *, enum dma_data_direction)
> +{
> +       return ERR_PTR(-ENODEV);
> +}
> +
> +static inline void dma_buf_unmap_attachment(struct dma_buf_attachment *,
> +                                               struct sg_table *)
> +{
> +       return;
> +}
> +
> +#endif /* CONFIG_DMA_SHARED_BUFFER */
> +
> +#endif /* __DMA_BUF_H__ */
> --
> 1.7.4.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-09  6:20   ` InKi Dae
@ 2012-01-09  8:10     ` Daniel Vetter
  2012-01-09  8:11       ` [Linaro-mm-sig] " Dave Airlie
  2012-01-09 10:10       ` InKi Dae
  0 siblings, 2 replies; 64+ messages in thread
From: Daniel Vetter @ 2012-01-09  8:10 UTC (permalink / raw)
  To: InKi Dae
  Cc: Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, arnd, jesse.barker,
	m.szyprowski, rob, daniel, t.stanislaws, Sumit Semwal

On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> I has test dmabuf based drm gem module for exynos and I found one problem.
> you can refer to this test repository:
> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> 
> at this repository, I added some exception codes for resource release
> in addition to Dave's patch sets.
> 
> let's suppose we use dmabuf based vb2 and drm gem with physically
> continuous memory(no IOMMU) and we try to share allocated buffer
> between them(v4l2 and drm driver).
> 
> 1. request memory allocation through drm gem interface.
> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
> gem object.
> - internally, private gem based dmabuf moudle calls drm_buf_export()
> to register allocated gem object to fd.
> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
> buffer to v4l2 based device.
> - internally, vb2 plug in module gets a buffer to the fd and then
> calls dmabuf->ops->map_dmabuf() callback to get the sg table
> containing physical memory info to the gem object. and then the
> physical memory info would be copied to vb2_xx_buf object.
> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
> this repository:
> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> 
> after that, if v4l2 driver want to release vb2_xx_buf object with
> allocated memory region by user request, how should we do?. refcount
> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
> object is released videobuf2 framework don't know who is using the
> physical memory region. so this physical memory region is released and
> when drm driver tries to access the region or to release it also, a
> problem would be induced.
> 
> for this problem, I added get_shared_cnt() callback to dma-buf.h but
> I'm not sure that this is good way. maybe there may be better way.
> if there is any missing point, please let me know.

The dma_buf object needs to hold a reference on the underlying
(necessarily reference-counted) buffer object when the exporter creates
the dma_buf handle. This reference should then get dropped in the
exporters dma_buf->ops->release() function, which is only getting called
when the last reference to the dma_buf disappears.

If this doesn't work like that currently, we have a bug, and exporting the
reference count or something similar can't fix that.

Yours, Daniel

PS: Please cut down the original mail when replying, otherwise it's pretty
hard to find your response ;-)
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Linaro-mm-sig] [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-09  8:10     ` Daniel Vetter
@ 2012-01-09  8:11       ` Dave Airlie
  2012-01-09 10:10       ` InKi Dae
  1 sibling, 0 replies; 64+ messages in thread
From: Dave Airlie @ 2012-01-09  8:11 UTC (permalink / raw)
  To: InKi Dae, Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, arnd, jesse.barker,
	m.szyprowski, rob, t.stanislaws, Sumit Semwal
  Cc: daniel

On Mon, Jan 9, 2012 at 8:10 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> I has test dmabuf based drm gem module for exynos and I found one problem.
>> you can refer to this test repository:
>> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
>>
>> at this repository, I added some exception codes for resource release
>> in addition to Dave's patch sets.
>>
>> let's suppose we use dmabuf based vb2 and drm gem with physically
>> continuous memory(no IOMMU) and we try to share allocated buffer
>> between them(v4l2 and drm driver).
>>
>> 1. request memory allocation through drm gem interface.
>> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
>> gem object.
>> - internally, private gem based dmabuf moudle calls drm_buf_export()
>> to register allocated gem object to fd.
>> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
>> buffer to v4l2 based device.
>> - internally, vb2 plug in module gets a buffer to the fd and then
>> calls dmabuf->ops->map_dmabuf() callback to get the sg table
>> containing physical memory info to the gem object. and then the
>> physical memory info would be copied to vb2_xx_buf object.
>> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
>> this repository:
>> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
>>
>> after that, if v4l2 driver want to release vb2_xx_buf object with
>> allocated memory region by user request, how should we do?. refcount
>> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
>> object is released videobuf2 framework don't know who is using the
>> physical memory region. so this physical memory region is released and
>> when drm driver tries to access the region or to release it also, a
>> problem would be induced.
>>
>> for this problem, I added get_shared_cnt() callback to dma-buf.h but
>> I'm not sure that this is good way. maybe there may be better way.
>> if there is any missing point, please let me know.
>
> The dma_buf object needs to hold a reference on the underlying
> (necessarily reference-counted) buffer object when the exporter creates
> the dma_buf handle. This reference should then get dropped in the
> exporters dma_buf->ops->release() function, which is only getting called
> when the last reference to the dma_buf disappears.
>
> If this doesn't work like that currently, we have a bug, and exporting the
> reference count or something similar can't fix that.
>
> Yours, Daniel
>
> PS: Please cut down the original mail when replying, otherwise it's pretty
> hard to find your response ;-)

And also the importer needs to realise it doesn't own the pages in the
sg_table and when its freeing its backing memory it shouldn't free
those pages. So for GEM objects we have to keep track if we allocated
the pages or we got them from an dma buf.

Dave.

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-09  8:10     ` Daniel Vetter
  2012-01-09  8:11       ` [Linaro-mm-sig] " Dave Airlie
@ 2012-01-09 10:10       ` InKi Dae
  2012-01-09 10:27         ` Daniel Vetter
  2012-01-09 15:17         ` Rob Clark
  1 sibling, 2 replies; 64+ messages in thread
From: InKi Dae @ 2012-01-09 10:10 UTC (permalink / raw)
  To: InKi Dae, Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, arnd, jesse.barker,
	m.szyprowski, rob, t.stanislaws, Sumit Semwal
  Cc: daniel

2012/1/9 Daniel Vetter <daniel@ffwll.ch>:
> On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> I has test dmabuf based drm gem module for exynos and I found one problem.
>> you can refer to this test repository:
>> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
>>
>> at this repository, I added some exception codes for resource release
>> in addition to Dave's patch sets.
>>
>> let's suppose we use dmabuf based vb2 and drm gem with physically
>> continuous memory(no IOMMU) and we try to share allocated buffer
>> between them(v4l2 and drm driver).
>>
>> 1. request memory allocation through drm gem interface.
>> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
>> gem object.
>> - internally, private gem based dmabuf moudle calls drm_buf_export()
>> to register allocated gem object to fd.
>> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
>> buffer to v4l2 based device.
>> - internally, vb2 plug in module gets a buffer to the fd and then
>> calls dmabuf->ops->map_dmabuf() callback to get the sg table
>> containing physical memory info to the gem object. and then the
>> physical memory info would be copied to vb2_xx_buf object.
>> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
>> this repository:
>> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
>>
>> after that, if v4l2 driver want to release vb2_xx_buf object with
>> allocated memory region by user request, how should we do?. refcount
>> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
>> object is released videobuf2 framework don't know who is using the
>> physical memory region. so this physical memory region is released and
>> when drm driver tries to access the region or to release it also, a
>> problem would be induced.
>>
>> for this problem, I added get_shared_cnt() callback to dma-buf.h but
>> I'm not sure that this is good way. maybe there may be better way.
>> if there is any missing point, please let me know.
>
> The dma_buf object needs to hold a reference on the underlying
> (necessarily reference-counted) buffer object when the exporter creates
> the dma_buf handle. This reference should then get dropped in the
> exporters dma_buf->ops->release() function, which is only getting called
> when the last reference to the dma_buf disappears.
>

when the exporter creates the dma_buf handle(for example, gem -> fd),
I think the refcount of gem object should be increased at this point,
and decreased by dma_buf->ops->release() again because when the
dma_buf is created and dma_buf_export() is called, this dma_buf refers
to the gem object one time. and in case of inporter(fd -> gem),
file->f_count of the dma_buf is increased and then when this gem
object is released by user request such as drm close or
drn_gem_close_ioctl, dma_buf_put() should be called by
dma_buf->ops->detach() to decrease file->f_count again because the gem
object refers to the dma_buf. for this, you can refer to my test
repository I mentioned above. but the problem is that when a buffer is
released by one side, another can't know whether the buffer already
was released or not.
note : in case of sharing a buffer between v4l2 and drm driver, the
memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
vb2_xx_buf through sg table. in this case, only memory info is used to
share, not some objects.

> If this doesn't work like that currently, we have a bug, and exporting the
> reference count or something similar can't fix that.
>
> Yours, Daniel
>
> PS: Please cut down the original mail when replying, otherwise it's pretty
> hard to find your response ;-)

Ok, got it. thanks. :)

> --
> Daniel Vetter
> Mail: daniel@ffwll.ch
> Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-09 10:10       ` InKi Dae
@ 2012-01-09 10:27         ` Daniel Vetter
  2012-01-09 12:06           ` InKi Dae
  2012-01-09 15:17         ` Rob Clark
  1 sibling, 1 reply; 64+ messages in thread
From: Daniel Vetter @ 2012-01-09 10:27 UTC (permalink / raw)
  To: InKi Dae
  Cc: Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, arnd, jesse.barker,
	m.szyprowski, rob, t.stanislaws, Sumit Semwal, daniel

On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
> 2012/1/9 Daniel Vetter <daniel@ffwll.ch>:
> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> >> I has test dmabuf based drm gem module for exynos and I found one problem.
> >> you can refer to this test repository:
> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> >>
> >> at this repository, I added some exception codes for resource release
> >> in addition to Dave's patch sets.
> >>
> >> let's suppose we use dmabuf based vb2 and drm gem with physically
> >> continuous memory(no IOMMU) and we try to share allocated buffer
> >> between them(v4l2 and drm driver).
> >>
> >> 1. request memory allocation through drm gem interface.
> >> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
> >> gem object.
> >> - internally, private gem based dmabuf moudle calls drm_buf_export()
> >> to register allocated gem object to fd.
> >> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
> >> buffer to v4l2 based device.
> >> - internally, vb2 plug in module gets a buffer to the fd and then
> >> calls dmabuf->ops->map_dmabuf() callback to get the sg table
> >> containing physical memory info to the gem object. and then the
> >> physical memory info would be copied to vb2_xx_buf object.
> >> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
> >> this repository:
> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> >>
> >> after that, if v4l2 driver want to release vb2_xx_buf object with
> >> allocated memory region by user request, how should we do?. refcount
> >> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
> >> object is released videobuf2 framework don't know who is using the
> >> physical memory region. so this physical memory region is released and
> >> when drm driver tries to access the region or to release it also, a
> >> problem would be induced.
> >>
> >> for this problem, I added get_shared_cnt() callback to dma-buf.h but
> >> I'm not sure that this is good way. maybe there may be better way.
> >> if there is any missing point, please let me know.
> >
> > The dma_buf object needs to hold a reference on the underlying
> > (necessarily reference-counted) buffer object when the exporter creates
> > the dma_buf handle. This reference should then get dropped in the
> > exporters dma_buf->ops->release() function, which is only getting called
> > when the last reference to the dma_buf disappears.
> >
> 
> when the exporter creates the dma_buf handle(for example, gem -> fd),
> I think the refcount of gem object should be increased at this point,
> and decreased by dma_buf->ops->release() again because when the
> dma_buf is created and dma_buf_export() is called, this dma_buf refers
> to the gem object one time. and in case of inporter(fd -> gem),
> file->f_count of the dma_buf is increased and then when this gem
> object is released by user request such as drm close or
> drn_gem_close_ioctl, dma_buf_put() should be called by
> dma_buf->ops->detach() to decrease file->f_count again because the gem
> object refers to the dma_buf. for this, you can refer to my test
> repository I mentioned above. but the problem is that when a buffer is
> released by one side, another can't know whether the buffer already
> was released or not.

Nope, dma_buf_put should not be called by ->detach. The importer gets his
reference when importing the dma_buf and needs to drop that reference
himself when it's done using the buffer by calling dma_buf_put (i.e. after
the last ->detach call). I think adding separate reference counting to
->attach and ->detach is a waste of time and only papers over buggy
importers.

Additionally the importer does _not_ control the lifetime of an dma_buf
object and it's underlying backing storage. It hence may _never_ free the
backing storage itself, that's the job of the exporter.

With that cleared up, referencing the exporters underlying buffer object
from the dma_buf will just do the right thing.

> note : in case of sharing a buffer between v4l2 and drm driver, the
> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
> vb2_xx_buf through sg table. in this case, only memory info is used to
> share, not some objects.

Hm, maybe I need to take a look at the currently proposed v4l dma_buf
patches ;-) atm I don't have an idea what exactly you're talking about.

Yours, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-09 10:27         ` Daniel Vetter
@ 2012-01-09 12:06           ` InKi Dae
  2012-01-09 16:02             ` Daniel Vetter
  0 siblings, 1 reply; 64+ messages in thread
From: InKi Dae @ 2012-01-09 12:06 UTC (permalink / raw)
  To: InKi Dae, Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, arnd, jesse.barker,
	m.szyprowski, rob, t.stanislaws, Sumit Semwal
  Cc: daniel

2012/1/9 Daniel Vetter <daniel@ffwll.ch>:
> On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
>> 2012/1/9 Daniel Vetter <daniel@ffwll.ch>:
>> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
>> >> I has test dmabuf based drm gem module for exynos and I found one problem.
>> >> you can refer to this test repository:
>> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
>> >>
>> >> at this repository, I added some exception codes for resource release
>> >> in addition to Dave's patch sets.
>> >>
>> >> let's suppose we use dmabuf based vb2 and drm gem with physically
>> >> continuous memory(no IOMMU) and we try to share allocated buffer
>> >> between them(v4l2 and drm driver).
>> >>
>> >> 1. request memory allocation through drm gem interface.
>> >> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
>> >> gem object.
>> >> - internally, private gem based dmabuf moudle calls drm_buf_export()
>> >> to register allocated gem object to fd.
>> >> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
>> >> buffer to v4l2 based device.
>> >> - internally, vb2 plug in module gets a buffer to the fd and then
>> >> calls dmabuf->ops->map_dmabuf() callback to get the sg table
>> >> containing physical memory info to the gem object. and then the
>> >> physical memory info would be copied to vb2_xx_buf object.
>> >> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
>> >> this repository:
>> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
>> >>
>> >> after that, if v4l2 driver want to release vb2_xx_buf object with
>> >> allocated memory region by user request, how should we do?. refcount
>> >> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
>> >> object is released videobuf2 framework don't know who is using the
>> >> physical memory region. so this physical memory region is released and
>> >> when drm driver tries to access the region or to release it also, a
>> >> problem would be induced.
>> >>
>> >> for this problem, I added get_shared_cnt() callback to dma-buf.h but
>> >> I'm not sure that this is good way. maybe there may be better way.
>> >> if there is any missing point, please let me know.
>> >
>> > The dma_buf object needs to hold a reference on the underlying
>> > (necessarily reference-counted) buffer object when the exporter creates
>> > the dma_buf handle. This reference should then get dropped in the
>> > exporters dma_buf->ops->release() function, which is only getting called
>> > when the last reference to the dma_buf disappears.
>> >
>>
>> when the exporter creates the dma_buf handle(for example, gem -> fd),
>> I think the refcount of gem object should be increased at this point,
>> and decreased by dma_buf->ops->release() again because when the
>> dma_buf is created and dma_buf_export() is called, this dma_buf refers
>> to the gem object one time. and in case of inporter(fd -> gem),
>> file->f_count of the dma_buf is increased and then when this gem
>> object is released by user request such as drm close or
>> drn_gem_close_ioctl, dma_buf_put() should be called by
>> dma_buf->ops->detach() to decrease file->f_count again because the gem
>> object refers to the dma_buf. for this, you can refer to my test
>> repository I mentioned above. but the problem is that when a buffer is
>> released by one side, another can't know whether the buffer already
>> was released or not.
>
> Nope, dma_buf_put should not be called by ->detach. The importer gets his
> reference when importing the dma_buf and needs to drop that reference
> himself when it's done using the buffer by calling dma_buf_put (i.e. after
> the last ->detach call).

I'm afraid that there may be my missing points. I'm confusing. who is
Importer and who is Exporter you think? Importer is fd goes to private
buffer and Exporter is private buffer goes to fd? if so, yes, when the
importer needs to drop that reference(the importer want to release
that buffer), dma_buf_put() should be called somewhere and in my case,
that function is called by drm_prime_gem_destory(). this function is
included at Dave's patch sets and also dma_buf_detatch() is called
there. and I just thought that here is right place. I didn't find the
place dma_buf_put() is called anywhere. could you please tell me where
dma_buf_put() should be called at you think?.

for this, you can refer to Dave's repository:
http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

> I think adding separate reference counting to
> ->attach and ->detach is a waste of time and only papers over buggy
> importers.
>

 I mean  when fd goes to private buffer, that reference(file->f_count)
would be increased by dma_buf_get() and  only ->detach is used to drop
that reference.

> Additionally the importer does _not_ control the lifetime of an dma_buf
> object and it's underlying backing storage. It hence may _never_ free the
> backing storage itself, that's the job of the exporter.
>

yes, my understanding is that if user app requested close(fd) call to
the Importer, after close(fd), file->f_count would be decreased but
would still has 1. because file->f_count already was 1 by the
Exporter.

> With that cleared up, referencing the exporters underlying buffer object
> from the dma_buf will just do the right thing.
>
>> note : in case of sharing a buffer between v4l2 and drm driver, the
>> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
>> vb2_xx_buf through sg table. in this case, only memory info is used to
>> share, not some objects.
>
> Hm, maybe I need to take a look at the currently proposed v4l dma_buf
> patches ;-) atm I don't have an idea what exactly you're talking about.
>

I need to stop the confusing. thank you for your answer. :)

> Yours, Daniel
> --
> Daniel Vetter
> Mail: daniel@ffwll.ch
> Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-09 10:10       ` InKi Dae
  2012-01-09 10:27         ` Daniel Vetter
@ 2012-01-09 15:17         ` Rob Clark
  2012-01-10  1:34           ` InKi Dae
  1 sibling, 1 reply; 64+ messages in thread
From: Rob Clark @ 2012-01-09 15:17 UTC (permalink / raw)
  To: InKi Dae
  Cc: Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, arnd, jesse.barker,
	m.szyprowski, t.stanislaws, Sumit Semwal, daniel

On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae <daeinki@gmail.com> wrote:
> note : in case of sharing a buffer between v4l2 and drm driver, the
> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
> vb2_xx_buf through sg table. in this case, only memory info is used to
> share, not some objects.

which v4l2/vb2 patches are you looking at?  The patches I was using,
vb2 holds a reference to the 'struct dma_buf *' internally, not just
keeping the sg_table

BR,
-R

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-09 12:06           ` InKi Dae
@ 2012-01-09 16:02             ` Daniel Vetter
  0 siblings, 0 replies; 64+ messages in thread
From: Daniel Vetter @ 2012-01-09 16:02 UTC (permalink / raw)
  To: InKi Dae
  Cc: Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, arnd, jesse.barker,
	m.szyprowski, rob, t.stanislaws, Sumit Semwal, daniel

On Mon, Jan 09, 2012 at 09:06:56PM +0900, InKi Dae wrote:
> 2012/1/9 Daniel Vetter <daniel@ffwll.ch>:
> > On Mon, Jan 09, 2012 at 07:10:25PM +0900, InKi Dae wrote:
> >> 2012/1/9 Daniel Vetter <daniel@ffwll.ch>:
> >> > On Mon, Jan 09, 2012 at 03:20:48PM +0900, InKi Dae wrote:
> >> >> I has test dmabuf based drm gem module for exynos and I found one problem.
> >> >> you can refer to this test repository:
> >> >> http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/exynos-drm-dmabuf
> >> >>
> >> >> at this repository, I added some exception codes for resource release
> >> >> in addition to Dave's patch sets.
> >> >>
> >> >> let's suppose we use dmabuf based vb2 and drm gem with physically
> >> >> continuous memory(no IOMMU) and we try to share allocated buffer
> >> >> between them(v4l2 and drm driver).
> >> >>
> >> >> 1. request memory allocation through drm gem interface.
> >> >> 2. request DRM_SET_PRIME ioctl with the gem handle to get a fd to the
> >> >> gem object.
> >> >> - internally, private gem based dmabuf moudle calls drm_buf_export()
> >> >> to register allocated gem object to fd.
> >> >> 3. request qbuf with the fd(got from 2) and DMABUF type to set the
> >> >> buffer to v4l2 based device.
> >> >> - internally, vb2 plug in module gets a buffer to the fd and then
> >> >> calls dmabuf->ops->map_dmabuf() callback to get the sg table
> >> >> containing physical memory info to the gem object. and then the
> >> >> physical memory info would be copied to vb2_xx_buf object.
> >> >> for DMABUF feature for v4l2 and videobuf2 framework, you can refer to
> >> >> this repository:
> >> >> git://github.com/robclark/kernel-omap4.git drmplane-dmabuf
> >> >>
> >> >> after that, if v4l2 driver want to release vb2_xx_buf object with
> >> >> allocated memory region by user request, how should we do?. refcount
> >> >> to vb2_xx_buf is dependent on videobuf2 framework. so when vb2_xx_buf
> >> >> object is released videobuf2 framework don't know who is using the
> >> >> physical memory region. so this physical memory region is released and
> >> >> when drm driver tries to access the region or to release it also, a
> >> >> problem would be induced.
> >> >>
> >> >> for this problem, I added get_shared_cnt() callback to dma-buf.h but
> >> >> I'm not sure that this is good way. maybe there may be better way.
> >> >> if there is any missing point, please let me know.
> >> >
> >> > The dma_buf object needs to hold a reference on the underlying
> >> > (necessarily reference-counted) buffer object when the exporter creates
> >> > the dma_buf handle. This reference should then get dropped in the
> >> > exporters dma_buf->ops->release() function, which is only getting called
> >> > when the last reference to the dma_buf disappears.
> >> >
> >>
> >> when the exporter creates the dma_buf handle(for example, gem -> fd),
> >> I think the refcount of gem object should be increased at this point,
> >> and decreased by dma_buf->ops->release() again because when the
> >> dma_buf is created and dma_buf_export() is called, this dma_buf refers
> >> to the gem object one time. and in case of inporter(fd -> gem),
> >> file->f_count of the dma_buf is increased and then when this gem
> >> object is released by user request such as drm close or
> >> drn_gem_close_ioctl, dma_buf_put() should be called by
> >> dma_buf->ops->detach() to decrease file->f_count again because the gem
> >> object refers to the dma_buf. for this, you can refer to my test
> >> repository I mentioned above. but the problem is that when a buffer is
> >> released by one side, another can't know whether the buffer already
> >> was released or not.
> >
> > Nope, dma_buf_put should not be called by ->detach. The importer gets his
> > reference when importing the dma_buf and needs to drop that reference
> > himself when it's done using the buffer by calling dma_buf_put (i.e. after
> > the last ->detach call).
> 
> I'm afraid that there may be my missing points. I'm confusing. who is
> Importer and who is Exporter you think? Importer is fd goes to private
> buffer and Exporter is private buffer goes to fd? if so, yes, when the
> importer needs to drop that reference(the importer want to release
> that buffer), dma_buf_put() should be called somewhere and in my case,
> that function is called by drm_prime_gem_destory(). this function is
> included at Dave's patch sets and also dma_buf_detatch() is called
> there. and I just thought that here is right place. I didn't find the
> place dma_buf_put() is called anywhere. could you please tell me where
> dma_buf_put() should be called at you think?.
> 
> for this, you can refer to Dave's repository:
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-prime-dmabuf

I haven't really looked at Dave's latest prime patches, but he reported
some reference counting issues last time around we chatted about it on
irc. So maybe you're just right and the dma_buf_put is indeed missing from
drm_prime_gem_destroy ;-) But as I've said, haven't really reviewed the
code, so I'm likely completely wrong.

Cheers, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-09 15:17         ` Rob Clark
@ 2012-01-10  1:34           ` InKi Dae
  2012-01-10  2:14             ` Rob Clark
  0 siblings, 1 reply; 64+ messages in thread
From: InKi Dae @ 2012-01-10  1:34 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sumit Semwal, linux-kernel, linux-arm-kernel, linux-mm,
	linaro-mm-sig, dri-devel, linux-media, linux, arnd, jesse.barker,
	m.szyprowski, t.stanislaws, Sumit Semwal, daniel

2012/1/10 Rob Clark <rob@ti.com>:
> On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae <daeinki@gmail.com> wrote:
>> note : in case of sharing a buffer between v4l2 and drm driver, the
>> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
>> vb2_xx_buf through sg table. in this case, only memory info is used to
>> share, not some objects.
>
> which v4l2/vb2 patches are you looking at?  The patches I was using,
> vb2 holds a reference to the 'struct dma_buf *' internally, not just
> keeping the sg_table
>

yes, not keeping the sg_table. I mean... see a example below please.

static void vb2_dma_contig_map_dmabuf(void *mem_priv)
{
    struct sg_table *sg;
     ...
     sg = dma_buf_map_attachment(buf->db_attach, dir);
     ...
     buf->dma_addr = sg_dma_address(sg->sgl);
     ...
}

at least with no IOMMU, the memory information(containing physical
memory address) would be copied to vb2_xx_buf object if drm gem
exported its own buffer and vb2 wants to use that buffer at this time,
sg table is used to share that buffer. and the problem I pointed out
is that this buffer(also physical memory region) could be released by
vb2 framework(as you know, vb2_xx_buf object and the memory region for
buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
so some problems would be induced once drm gem tries to release or
access that buffer. and I have tried to resolve this issue adding
get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
good way. maybe there would be better way.

Thanks.

> BR,
> -R

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-10  1:34           ` InKi Dae
@ 2012-01-10  2:14             ` Rob Clark
  2012-01-10  6:09               ` Semwal, Sumit
  2012-01-11  1:08               ` InKi Dae
  0 siblings, 2 replies; 64+ messages in thread
From: Rob Clark @ 2012-01-10  2:14 UTC (permalink / raw)
  To: InKi Dae
  Cc: t.stanislaws, linux, arnd, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae <daeinki@gmail.com> wrote:
> 2012/1/10 Rob Clark <rob@ti.com>:
>> On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae <daeinki@gmail.com> wrote:
>>> note : in case of sharing a buffer between v4l2 and drm driver, the
>>> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
>>> vb2_xx_buf through sg table. in this case, only memory info is used to
>>> share, not some objects.
>>
>> which v4l2/vb2 patches are you looking at?  The patches I was using,
>> vb2 holds a reference to the 'struct dma_buf *' internally, not just
>> keeping the sg_table
>>
>
> yes, not keeping the sg_table. I mean... see a example below please.
>
> static void vb2_dma_contig_map_dmabuf(void *mem_priv)
> {
>    struct sg_table *sg;
>     ...
>     sg = dma_buf_map_attachment(buf->db_attach, dir);
>     ...
>     buf->dma_addr = sg_dma_address(sg->sgl);
>     ...
> }
>
> at least with no IOMMU, the memory information(containing physical
> memory address) would be copied to vb2_xx_buf object if drm gem
> exported its own buffer and vb2 wants to use that buffer at this time,
> sg table is used to share that buffer. and the problem I pointed out
> is that this buffer(also physical memory region) could be released by
> vb2 framework(as you know, vb2_xx_buf object and the memory region for
> buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
> so some problems would be induced once drm gem tries to release or
> access that buffer. and I have tried to resolve this issue adding
> get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
> good way. maybe there would be better way.

the exporter (in this case your driver's drm/gem bits) shouldn't
release that mapping / sgtable until the importer (in this case v4l2)
calls dma_buf_unmap fxn..

It would be an error if the importer did a dma_buf_put() without first
calling dma_buf_unmap_attachment() (if currently mapped) and then
dma_buf_detach() (if currently attached).  Perhaps somewhere there
should be some sanity checking debug code which could be enabled to do
a WARN_ON() if the importer does the wrong thing.  It shouldn't really
be part of the API, I don't think, but it actually does seem like a
good thing, esp. as new drivers start trying to use dmabuf, to have
some debug options which could be enabled.

It is entirely possible that something was missed on the vb2 patches,
but the way it is intended to work is like this:
https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92

where it does a detach() before the dma_buf_put(), and the vb2-contig
backend checks here that it is also unmapped():
https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251

BR,
-R

> Thanks.
>
>> BR,
>> -R
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-10  2:14             ` Rob Clark
@ 2012-01-10  6:09               ` Semwal, Sumit
  2012-01-10  7:28                 ` InKi Dae
  2012-01-11  1:08               ` InKi Dae
  1 sibling, 1 reply; 64+ messages in thread
From: Semwal, Sumit @ 2012-01-10  6:09 UTC (permalink / raw)
  To: Rob Clark
  Cc: InKi Dae, t.stanislaws, linux, arnd, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

On Tue, Jan 10, 2012 at 7:44 AM, Rob Clark <rob@ti.com> wrote:
> On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae <daeinki@gmail.com> wrote:
>> 2012/1/10 Rob Clark <rob@ti.com>:
>> at least with no IOMMU, the memory information(containing physical
>> memory address) would be copied to vb2_xx_buf object if drm gem
>> exported its own buffer and vb2 wants to use that buffer at this time,
>> sg table is used to share that buffer. and the problem I pointed out
>> is that this buffer(also physical memory region) could be released by
>> vb2 framework(as you know, vb2_xx_buf object and the memory region for
>> buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
>> so some problems would be induced once drm gem tries to release or
>> access that buffer. and I have tried to resolve this issue adding
>> get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
>> good way. maybe there would be better way.
Hi Inki,
As also mentioned in the documentation patch, importer (the user of
the buffer) - in this case for current RFC patches on
v4l2-as-a-user[1] vb2 framework - shouldn't release the backing memory
of the buffer directly - it should only use the dma-buf callbacks in
the right sequence to let the exporter know that it is done using this
buffer, so the exporter can release it if allowed and needed.
>
> the exporter (in this case your driver's drm/gem bits) shouldn't
> release that mapping / sgtable until the importer (in this case v4l2)
> calls dma_buf_unmap fxn..
>
> It would be an error if the importer did a dma_buf_put() without first
> calling dma_buf_unmap_attachment() (if currently mapped) and then
> dma_buf_detach() (if currently attached).  Perhaps somewhere there
> should be some sanity checking debug code which could be enabled to do
> a WARN_ON() if the importer does the wrong thing.  It shouldn't really
> be part of the API, I don't think, but it actually does seem like a
> good thing, esp. as new drivers start trying to use dmabuf, to have
> some debug options which could be enabled.
>
> It is entirely possible that something was missed on the vb2 patches,
> but the way it is intended to work is like this:
> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92
>
> where it does a detach() before the dma_buf_put(), and the vb2-contig
> backend checks here that it is also unmapped():
> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251

The proposed RFC for V4L2 adaptation at [1] does exactly the same
thing; detach() before dma_buf_put(), and check in vb2-contig backend
for unmapped() as mentioned above.

>
> BR,
> -R
>
BR,
Sumit.

[1]: V4l2 as a dma-buf user RFC:
http://comments.gmane.org/gmane.linux.drivers.video-input-infrastructure/42966

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-10  6:09               ` Semwal, Sumit
@ 2012-01-10  7:28                 ` InKi Dae
  2012-01-10  9:19                   ` InKi Dae
  0 siblings, 1 reply; 64+ messages in thread
From: InKi Dae @ 2012-01-10  7:28 UTC (permalink / raw)
  To: Semwal, Sumit
  Cc: Rob Clark, t.stanislaws, linux, arnd, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

2012/1/10 Semwal, Sumit <sumit.semwal@ti.com>:
> On Tue, Jan 10, 2012 at 7:44 AM, Rob Clark <rob@ti.com> wrote:
>> On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae <daeinki@gmail.com> wrote:
>>> 2012/1/10 Rob Clark <rob@ti.com>:
>>> at least with no IOMMU, the memory information(containing physical
>>> memory address) would be copied to vb2_xx_buf object if drm gem
>>> exported its own buffer and vb2 wants to use that buffer at this time,
>>> sg table is used to share that buffer. and the problem I pointed out
>>> is that this buffer(also physical memory region) could be released by
>>> vb2 framework(as you know, vb2_xx_buf object and the memory region for
>>> buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
>>> so some problems would be induced once drm gem tries to release or
>>> access that buffer. and I have tried to resolve this issue adding
>>> get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
>>> good way. maybe there would be better way.
> Hi Inki,
> As also mentioned in the documentation patch, importer (the user of
> the buffer) - in this case for current RFC patches on
> v4l2-as-a-user[1] vb2 framework - shouldn't release the backing memory
> of the buffer directly - it should only use the dma-buf callbacks in
> the right sequence to let the exporter know that it is done using this
> buffer, so the exporter can release it if allowed and needed.

thank you for your comments.:) and below are some tables about dmabuf
operations with ideal use and these tables indicate reference count of
when buffer is created, shared and released. so if there are any
problems, please let me know. P.S. these are just simple cases so
there would be others.


in case of using only drm gem and dmabuf,

operations                       gem refcount    file f_count   buf refcount
------------------------------------------------------------------------------------------------
1. gem create                   A:1                                   A:0
2. export(handle A -> fd)    A:2                A:1              A:0
3. import(fd -> handle B)    A:2, B:1         A:2              A:1
4. file close(A)                  A:2, B:1         A:1              A:1
5. gem close(A)                A:1, B:1         A:1              A:1
6. gem close(B)                A:1, B:0         A:1              A:0
7. file close(A)                  A:0                A:0
-----------------------------------------------------------------------------------------------
3. handle B shares the buf of handle A.
6. release handle B but its buf.
7. release gem handle A and dmabuf of file A and also physical memory region.


and in case of using drm gem, vb2 and dmabuf,

operations                  gem, vb2 refcount    file f_count   buf refcount
------------------------------------------------------------------------------------------------
1. gem create                   A:1                 A:0
   (GEM side)
2. export(handle A -> fd)    A:2                 A:1              A:0
   (GEM side)
3. import(fd -> handle B)    A:2, B:1          A:2              A:1
   (VB2 side)
4. file close(A)                  A:2, B:1          A:1              A:1
   (VB2 side)
5. vb2 close(B)                 A:2, B:0          A:1              A:0
   (VB2 side)
6. gem close(A)                A:1                A:1              A:0
   (GEM side)
7. file close(A)                  A:0                A:0
   (GEM side)
------------------------------------------------------------------------------------------------
3. vb2 handle B is shared with the buf of gem handle A.
5. release vb2 handle B and decrease refcount of the buf pointed by it.
7. release gem handle A and dmabuf of file A and also physical memory region.

>>
>> the exporter (in this case your driver's drm/gem bits) shouldn't
>> release that mapping / sgtable until the importer (in this case v4l2)
>> calls dma_buf_unmap fxn..
>>
>> It would be an error if the importer did a dma_buf_put() without first
>> calling dma_buf_unmap_attachment() (if currently mapped) and then
>> dma_buf_detach() (if currently attached).  Perhaps somewhere there
>> should be some sanity checking debug code which could be enabled to do
>> a WARN_ON() if the importer does the wrong thing.  It shouldn't really
>> be part of the API, I don't think, but it actually does seem like a
>> good thing, esp. as new drivers start trying to use dmabuf, to have
>> some debug options which could be enabled.
>>
>> It is entirely possible that something was missed on the vb2 patches,
>> but the way it is intended to work is like this:
>> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92
>>
>> where it does a detach() before the dma_buf_put(), and the vb2-contig
>> backend checks here that it is also unmapped():
>> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251
>
> The proposed RFC for V4L2 adaptation at [1] does exactly the same
> thing; detach() before dma_buf_put(), and check in vb2-contig backend
> for unmapped() as mentioned above.
>
>>
>> BR,
>> -R
>>
> BR,
> Sumit.
>
> [1]: V4l2 as a dma-buf user RFC:
> http://comments.gmane.org/gmane.linux.drivers.video-input-infrastructure/42966

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-10  7:28                 ` InKi Dae
@ 2012-01-10  9:19                   ` InKi Dae
  0 siblings, 0 replies; 64+ messages in thread
From: InKi Dae @ 2012-01-10  9:19 UTC (permalink / raw)
  To: Semwal, Sumit
  Cc: Rob Clark, t.stanislaws, linux, arnd, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

2012/1/10 InKi Dae <daeinki@gmail.com>:
> 2012/1/10 Semwal, Sumit <sumit.semwal@ti.com>:
>> On Tue, Jan 10, 2012 at 7:44 AM, Rob Clark <rob@ti.com> wrote:
>>> On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae <daeinki@gmail.com> wrote:
>>>> 2012/1/10 Rob Clark <rob@ti.com>:
>>>> at least with no IOMMU, the memory information(containing physical
>>>> memory address) would be copied to vb2_xx_buf object if drm gem
>>>> exported its own buffer and vb2 wants to use that buffer at this time,
>>>> sg table is used to share that buffer. and the problem I pointed out
>>>> is that this buffer(also physical memory region) could be released by
>>>> vb2 framework(as you know, vb2_xx_buf object and the memory region for
>>>> buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
>>>> so some problems would be induced once drm gem tries to release or
>>>> access that buffer. and I have tried to resolve this issue adding
>>>> get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
>>>> good way. maybe there would be better way.
>> Hi Inki,
>> As also mentioned in the documentation patch, importer (the user of
>> the buffer) - in this case for current RFC patches on
>> v4l2-as-a-user[1] vb2 framework - shouldn't release the backing memory
>> of the buffer directly - it should only use the dma-buf callbacks in
>> the right sequence to let the exporter know that it is done using this
>> buffer, so the exporter can release it if allowed and needed.
>
> thank you for your comments.:) and below are some tables about dmabuf
> operations with ideal use and these tables indicate reference count of
> when buffer is created, shared and released. so if there are any
> problems, please let me know. P.S. these are just simple cases so
> there would be others.
>
>
> in case of using only drm gem and dmabuf,
>
> operations                       gem refcount    file f_count   buf refcount
> ------------------------------------------------------------------------------------------------
> 1. gem create                   A:1                                   A:0
> 2. export(handle A -> fd)    A:2                A:1              A:0
> 3. import(fd -> handle B)    A:2, B:1         A:2              A:1
> 4. file close(A)                  A:2, B:1         A:1              A:1
> 5. gem close(A)                A:1, B:1         A:1              A:1
> 6. gem close(B)                A:1, B:0         A:1              A:0
> 7. file close(A)                  A:0                A:0
> -----------------------------------------------------------------------------------------------
> 3. handle B shares the buf of handle A.
> 6. release handle B but its buf.
> 7. release gem handle A and dmabuf of file A and also physical memory region.
>
>
> and in case of using drm gem, vb2 and dmabuf,
>
> operations                  gem, vb2 refcount    file f_count   buf refcount
> ------------------------------------------------------------------------------------------------
> 1. gem create                   A:1                 A:0
>   (GEM side)
> 2. export(handle A -> fd)    A:2                 A:1              A:0
>   (GEM side)
> 3. import(fd -> handle B)    A:2, B:1          A:2              A:1
>   (VB2 side)
> 4. file close(A)                  A:2, B:1          A:1              A:1
>   (VB2 side)
> 5. vb2 close(B)                 A:2, B:0          A:1              A:0
>   (VB2 side)
> 6. gem close(A)                A:1                A:1              A:0
>   (GEM side)
> 7. file close(A)                  A:0                A:0
>   (GEM side)
> ------------------------------------------------------------------------------------------------
> 3. vb2 handle B is shared with the buf of gem handle A.
> 5. release vb2 handle B and decrease refcount of the buf pointed by it.
> 7. release gem handle A and dmabuf of file A and also physical memory region.
>

Ah, sorry, it seems that file close shouldn't be called because
file->f_count of the file would be dropped by Importer and if f_count
is 0 then the file would be released by fput() so I'm not sure but
again:

in case of using only drm gem and dmabuf,

operations                       gem refcount      file f_count   buf refcount
--------------------------------------------------------------------------------------------------
1. gem create(A)               A:1                                     A:0
2. export(handle A -> fd)    A:2                  A:1              A:0
3. import(fd -> handle B)    A:2, B:1           A:2              A:1
4. gem close(B)                A:2, B:release  A:1              A:0
5. gem close(A)                A:1,                 A:1              A:0
6. gem close(A)                A:release         A:release     A:release
-------------------------------------------------------------------------------------------------

and in case of using drm gem, vb2 and dmabuf,

operations                  gem, vb2 refcount    file f_count   buf refcount
------------------------------------------------------------------------------------------------
1. gem create                   A:1                 A:0              A:0
  (GEM side)
2. export(handle A -> fd)    A:2                 A:1              A:0
  (GEM side)
3. import(fd -> handle B)    A:2, B:1          A:2              A:1
  (VB2 side)
5. vb2 close(B)                 A:2, B:release  A:1              A:0
  (VB2 side)
6. gem close(A)                A:1                A:1              A:0
  (GEM side)
6. gem close(A)                A:release       A:release      A:release
  (GEM side)
------------------------------------------------------------------------------------------------

>>>
>>> the exporter (in this case your driver's drm/gem bits) shouldn't
>>> release that mapping / sgtable until the importer (in this case v4l2)
>>> calls dma_buf_unmap fxn..
>>>
>>> It would be an error if the importer did a dma_buf_put() without first
>>> calling dma_buf_unmap_attachment() (if currently mapped) and then
>>> dma_buf_detach() (if currently attached).  Perhaps somewhere there
>>> should be some sanity checking debug code which could be enabled to do
>>> a WARN_ON() if the importer does the wrong thing.  It shouldn't really
>>> be part of the API, I don't think, but it actually does seem like a
>>> good thing, esp. as new drivers start trying to use dmabuf, to have
>>> some debug options which could be enabled.
>>>
>>> It is entirely possible that something was missed on the vb2 patches,
>>> but the way it is intended to work is like this:
>>> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92
>>>
>>> where it does a detach() before the dma_buf_put(), and the vb2-contig
>>> backend checks here that it is also unmapped():
>>> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251
>>
>> The proposed RFC for V4L2 adaptation at [1] does exactly the same
>> thing; detach() before dma_buf_put(), and check in vb2-contig backend
>> for unmapped() as mentioned above.
>>
>>>
>>> BR,
>>> -R
>>>
>> BR,
>> Sumit.
>>
>> [1]: V4l2 as a dma-buf user RFC:
>> http://comments.gmane.org/gmane.linux.drivers.video-input-infrastructure/42966

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [RFC v2 1/2] dma-buf: Introduce dma buffer sharing mechanism
  2012-01-10  2:14             ` Rob Clark
  2012-01-10  6:09               ` Semwal, Sumit
@ 2012-01-11  1:08               ` InKi Dae
  1 sibling, 0 replies; 64+ messages in thread
From: InKi Dae @ 2012-01-11  1:08 UTC (permalink / raw)
  To: Rob Clark
  Cc: t.stanislaws, linux, arnd, linux-kernel, dri-devel,
	linaro-mm-sig, linux-mm, m.szyprowski, Sumit Semwal,
	linux-arm-kernel, linux-media

2012/1/10 Rob Clark <rob@ti.com>:
> On Mon, Jan 9, 2012 at 7:34 PM, InKi Dae <daeinki@gmail.com> wrote:
>> 2012/1/10 Rob Clark <rob@ti.com>:
>>> On Mon, Jan 9, 2012 at 4:10 AM, InKi Dae <daeinki@gmail.com> wrote:
>>>> note : in case of sharing a buffer between v4l2 and drm driver, the
>>>> memory info would be copied vb2_xx_buf to xx_gem or xx_gem to
>>>> vb2_xx_buf through sg table. in this case, only memory info is used to
>>>> share, not some objects.
>>>
>>> which v4l2/vb2 patches are you looking at?  The patches I was using,
>>> vb2 holds a reference to the 'struct dma_buf *' internally, not just
>>> keeping the sg_table
>>>
>>
>> yes, not keeping the sg_table. I mean... see a example below please.
>>
>> static void vb2_dma_contig_map_dmabuf(void *mem_priv)
>> {
>>    struct sg_table *sg;
>>     ...
>>     sg = dma_buf_map_attachment(buf->db_attach, dir);
>>     ...
>>     buf->dma_addr = sg_dma_address(sg->sgl);
>>     ...
>> }
>>
>> at least with no IOMMU, the memory information(containing physical
>> memory address) would be copied to vb2_xx_buf object if drm gem
>> exported its own buffer and vb2 wants to use that buffer at this time,
>> sg table is used to share that buffer. and the problem I pointed out
>> is that this buffer(also physical memory region) could be released by
>> vb2 framework(as you know, vb2_xx_buf object and the memory region for
>> buf->dma_addr pointing) but the Exporter(drm gem) couldn't know that
>> so some problems would be induced once drm gem tries to release or
>> access that buffer. and I have tried to resolve this issue adding
>> get_shared_cnt() callback to dma-buf.h but I'm not sure that this is
>> good way. maybe there would be better way.
>
> the exporter (in this case your driver's drm/gem bits) shouldn't
> release that mapping / sgtable until the importer (in this case v4l2)
> calls dma_buf_unmap fxn..
>
> It would be an error if the importer did a dma_buf_put() without first
> calling dma_buf_unmap_attachment() (if currently mapped) and then
> dma_buf_detach() (if currently attached).  Perhaps somewhere there
> should be some sanity checking debug code which could be enabled to do
> a WARN_ON() if the importer does the wrong thing.  It shouldn't really
> be part of the API, I don't think, but it actually does seem like a
> good thing, esp. as new drivers start trying to use dmabuf, to have
> some debug options which could be enabled.
>
> It is entirely possible that something was missed on the vb2 patches,
> but the way it is intended to work is like this:
> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-core.c#L92
>
> where it does a detach() before the dma_buf_put(), and the vb2-contig
> backend checks here that it is also unmapped():
> https://github.com/robclark/kernel-omap4/blob/0961428143cd10269223e3d0f24bc3a66a96185f/drivers/media/video/videobuf2-dma-contig.c#L251
>

I think that we also used same concept as your. for this, you can
refer to Dave's repository below and see the drm_prime_gem_destroy
function.
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-prime-dmabuf&id=7cb374d6642e838e0e4836042e057e6d9139dcad

but when it comes to releasing resources, I mistakely understood some
parts of dmabuf concept so thank you for Rob and Sumit. that is very
useful.

> BR,
> -R
>
>> Thanks.
>>
>>> BR,
>>> -R
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2012-01-11  1:08 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-12-02  8:57 [RFC v2 0/2] Introduce DMA buffer sharing mechanism Sumit Semwal
2011-12-02  8:57 ` [RFC v2 1/2] dma-buf: Introduce dma " Sumit Semwal
2011-12-02 17:11   ` Konrad Rzeszutek Wilk
2011-12-05  9:48     ` Semwal, Sumit
2011-12-05 17:18   ` Arnd Bergmann
2011-12-05 18:55     ` Daniel Vetter
2011-12-05 19:29       ` Arnd Bergmann
2011-12-05 20:58         ` Daniel Vetter
2011-12-05 22:04           ` Arnd Bergmann
2011-12-05 22:33             ` Daniel Vetter
2011-12-05 20:46     ` Rob Clark
2011-12-05 21:23       ` Daniel Vetter
2011-12-05 22:11         ` Rob Clark
2011-12-05 22:33           ` Daniel Vetter
2011-12-06 13:16           ` Arnd Bergmann
2011-12-06 15:28             ` Daniel Vetter
2011-12-07 13:27           ` Semwal, Sumit
2011-12-07 13:40             ` Arnd Bergmann
2011-12-08 21:44               ` [Linaro-mm-sig] " Daniel Vetter
2011-12-09 14:13                 ` Arnd Bergmann
2011-12-09 14:24                   ` Alan Cox
2011-12-10  4:01                     ` Daniel Vetter
2011-12-12 16:48                       ` Arnd Bergmann
2011-12-19  6:16                         ` Semwal, Sumit
2011-12-20 15:41                           ` Arnd Bergmann
2011-12-20 16:41                             ` Rob Clark
2011-12-20 17:14                               ` Daniel Vetter
2011-12-21 17:27                                 ` Arnd Bergmann
2011-12-21 19:04                                   ` Daniel Vetter
2011-12-23 10:00                                   ` Semwal, Sumit
2011-12-23 17:10                                     ` Rob Clark
2011-12-20  9:03                   ` Sakari Ailus
2011-12-20 15:36                     ` Arnd Bergmann
2012-01-01 20:53                       ` Sakari Ailus
2012-01-01 23:12                         ` Rob Clark
2011-12-13 13:33                 ` Hans Verkuil
2011-12-05 22:09       ` Arnd Bergmann
2011-12-05 22:15         ` Rob Clark
2011-12-05 22:35         ` Rob Clark
2011-12-07  6:35     ` Semwal, Sumit
2011-12-07 10:11       ` Arnd Bergmann
2011-12-07 11:02         ` Semwal, Sumit
2011-12-07 11:34           ` Arnd Bergmann
2011-12-09 22:50     ` [Linaro-mm-sig] " Robert Morell
2011-12-10 11:13       ` Mauro Carvalho Chehab
2011-12-12 22:44         ` Robert Morell
2011-12-13 15:10           ` Arnd Bergmann
2011-12-20  2:05             ` Robert Morell
2011-12-20 14:29               ` Anca Emanuel
2012-01-09  6:20   ` InKi Dae
2012-01-09  8:10     ` Daniel Vetter
2012-01-09  8:11       ` [Linaro-mm-sig] " Dave Airlie
2012-01-09 10:10       ` InKi Dae
2012-01-09 10:27         ` Daniel Vetter
2012-01-09 12:06           ` InKi Dae
2012-01-09 16:02             ` Daniel Vetter
2012-01-09 15:17         ` Rob Clark
2012-01-10  1:34           ` InKi Dae
2012-01-10  2:14             ` Rob Clark
2012-01-10  6:09               ` Semwal, Sumit
2012-01-10  7:28                 ` InKi Dae
2012-01-10  9:19                   ` InKi Dae
2012-01-11  1:08               ` InKi Dae
2011-12-02  8:57 ` [RFC v2 2/2] dma-buf: Documentation for buffer sharing framework Sumit Semwal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).