linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Stevenson <dave.stevenson@raspberrypi.com>
To: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Cc: "Laurent Pinchart" <laurent.pinchart@ideasonboard.com>,
	"Linux Media Mailing List" <linux-media@vger.kernel.org>,
	"Kieran Bingham" <kieran.bingham@ideasonboard.com>,
	"Jacopo Mondi" <jacopo@jmondi.org>,
	"Niklas Söderlund" <niklas.soderlund@ragnatech.se>,
	"Naushir Patuck" <naush@raspberrypi.com>,
	"Dave Stevenson" <dave.stevenson@raspberrypi.org>
Subject: Re: [PATCH v2 06/34] staging: vc04_services: Add new vc-sm-cma driver
Date: Mon, 18 May 2020 16:48:07 +0100	[thread overview]
Message-ID: <CAPY8ntCZsFtko4LMUsfSEUV9LwtJ9bdjXK4ZVJ3KFd18vzRp5A@mail.gmail.com> (raw)
In-Reply-To: <9b42ad8c4c39ac3873e7c3ea2951bea1caef8bd1.camel@suse.de>

Hi Nicolas

On Mon, 11 May 2020 at 19:43, Nicolas Saenz Julienne
<nsaenzjulienne@suse.de> wrote:
>
> Hi Dave,
> some questions/comments.
>
> On Mon, 2020-05-04 at 12:25 +0300, Laurent Pinchart wrote:
> > From: Dave Stevenson <dave.stevenson@raspberrypi.org>
> >
> > Add Broadcom VideoCore Shared Memory support.
> >
> > This new driver allows contiguous memory blocks to be imported
> > into the VideoCore VPU memory map, and manages the lifetime of
> > those objects, only releasing the source dmabuf once the VPU has
> > confirmed it has finished with it.
> >
> > Driver upported from the RaspberryPi BSP at revision:
> > 890691d1c996 ("staging: vc04_services: Fix vcsm overflow bug when counting
> > transactions")
> > forward ported to recent mainline kernel version.
> >
> > Signed-off-by: Naushir Patuck <naush@raspberrypi.com>
> > Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
> > Signed-off-by: Jacopo Mondi <jacopo@jmondi.org>
> > ---
>
> [...]
>
> > +
> > +/* Import a dma_buf to be shared with VC. */
> > +int
> > +vc_sm_cma_import_dmabuf_internal(struct vc_sm_privdata_t *private,
> > +                              struct dma_buf *dma_buf,
> > +                              int fd,
> > +                              struct dma_buf **imported_buf)
> > +{
> > +     DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> > +     struct vc_sm_buffer *buffer = NULL;
> > +     struct vc_sm_import import = { };
> > +     struct vc_sm_import_result result = { };
> > +     struct dma_buf_attachment *attach = NULL;
> > +     struct sg_table *sgt = NULL;
> > +     dma_addr_t dma_addr;
> > +     int ret = 0;
> > +     int status;
> > +
> > +     /* Setup our allocation parameters */
> > +     pr_debug("%s: importing dma_buf %p/fd %d\n", __func__, dma_buf, fd);
> > +
> > +     if (fd < 0)
> > +             get_dma_buf(dma_buf);
> > +     else
> > +             dma_buf = dma_buf_get(fd);
> > +
> > +     if (!dma_buf)
> > +             return -EINVAL;
> > +
> > +     attach = dma_buf_attach(dma_buf, &sm_state->pdev->dev);
> > +     if (IS_ERR(attach)) {
> > +             ret = PTR_ERR(attach);
> > +             goto error;
> > +     }
> > +
> > +     sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
> > +     if (IS_ERR(sgt)) {
> > +             ret = PTR_ERR(sgt);
> > +             goto error;
> > +     }
> > +
> > +     /* Verify that the address block is contiguous */
> > +     if (sgt->nents != 1) {
> > +             ret = -ENOMEM;
> > +             goto error;
> > +     }
> > +
> > +     /* Allocate local buffer to track this allocation. */
> > +     buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
> > +     if (!buffer) {
> > +             ret = -ENOMEM;
> > +             goto error;
> > +     }
> > +
> > +     import.type = VC_SM_ALLOC_NON_CACHED;
> > +     dma_addr = sg_dma_address(sgt->sgl);
> > +     import.addr = (u32)dma_addr;
> > +     if ((import.addr & 0xC0000000) != 0xC0000000) {
> > +             pr_err("%s: Expecting an uncached alias for dma_addr %pad\n",
> > +                    __func__, &dma_addr);
> > +             import.addr |= 0xC0000000;
> > +     }
>
> Just so we don't forget about it, this shouldn't be needed once dma-ranges are
> fixed.

Indeed not, but we've had enough issues with dma-ranges going missing
that screaming and shouting about it is a good thing.

> > +     import.size = sg_dma_len(sgt->sgl);
> > +     import.allocator = current->tgid;
> > +     import.kernel_id = get_kernel_id(buffer);
> > +
> > +     memcpy(import.name, VC_SM_RESOURCE_NAME_DEFAULT,
> > +            sizeof(VC_SM_RESOURCE_NAME_DEFAULT));
> > +
> > +     pr_debug("[%s]: attempt to import \"%s\" data - type %u, addr %pad, size
> > %u.\n",
> > +              __func__, import.name, import.type, &dma_addr, import.size);
> > +
> > +     /* Allocate the videocore buffer. */
> > +     status = vc_sm_cma_vchi_import(sm_state->sm_handle, &import, &result,
> > +                                    &sm_state->int_trans_id);
> > +     if (status == -EINTR) {
> > +             pr_debug("[%s]: requesting import memory action restart
> > (trans_id: %u)\n",
> > +                      __func__, sm_state->int_trans_id);
> > +             ret = -ERESTARTSYS;
> > +             private->restart_sys = -EINTR;
> > +             private->int_action = VC_SM_MSG_TYPE_IMPORT;
> > +             goto error;
> > +     } else if (status || !result.res_handle) {
> > +             pr_debug("[%s]: failed to import memory on videocore (status:
> > %u, trans_id: %u)\n",
> > +                      __func__, status, sm_state->int_trans_id);
> > +             ret = -ENOMEM;
> > +             goto error;
> > +     }
> > +
> > +     mutex_init(&buffer->lock);
> > +     INIT_LIST_HEAD(&buffer->attachments);
> > +     memcpy(buffer->name, import.name,
> > +            min(sizeof(buffer->name), sizeof(import.name) - 1));
> > +
> > +     /* Keep track of the buffer we created. */
> > +     buffer->private = private;
> > +     buffer->vc_handle = result.res_handle;
> > +     buffer->size = import.size;
> > +     buffer->vpu_state = VPU_MAPPED;
> > +
> > +     buffer->imported = 1;
> > +     buffer->import.dma_buf = dma_buf;
> > +
> > +     buffer->import.attach = attach;
> > +     buffer->import.sgt = sgt;
> > +     buffer->dma_addr = dma_addr;
> > +     buffer->in_use = 1;
> > +     buffer->kernel_id = import.kernel_id;
> > +
> > +     /*
> > +      * We're done - we need to export a new dmabuf chaining through most
> > +      * functions, but enabling us to release our own internal references
> > +      * here.
> > +      */
> > +     exp_info.ops = &dma_buf_import_ops;
> > +     exp_info.size = import.size;
> > +     exp_info.flags = O_RDWR;
> > +     exp_info.priv = buffer;
> > +
> > +     buffer->dma_buf = dma_buf_export(&exp_info);
>
> Could you comment on the need for this second dma_buf? I've only reviewed code
> related to mmal-vchiq imports, but it seems to me that it would be saner to do
> the unmapping and unattaching explicitly as opposed to having this second
> buffer refcount hit 0. Although, I can imagine this being needed for the
> userspace interface.

Indeed, as it is needed for the userspace interface it seemed to make
more sense to have common handling rather than two code paths doing
nearly the same thing but in different ways.
Downstream we need a userspace import at least to allow MMAL to set up
zero copy, so unless it raises any real objections then it would be
useful to keep it.

> When you talk about moving to dmabuf heaps, I've pictured a specific dmabuf
> heap for vc4 that takes care of all the importing and unimporting (aside from
> cma allocations). Am I right? If so, I'm pretty confident we can do away with
> this.

(Note I'm talking about the VideoCore4 VPU and other blocks, and not
the vc4 DRM/KMS and V3D drivers)

No, I'm looking at using the existing cma_heap driver to do the
allocations, and then this driver will import them and handle the
lifetime on behalf of the VPU. There's no need for VPU allocations to
be split off into yet another heap.
One of the things we are trying to get away from is having the gpu_mem
reserved lump that Linux can't get access to at all, so allocating
from the CMA heap and importing to the VPU avoids that.


I'll give some history here, which also hopefully covers your query
over switching mmal-vchiq to zero copy.

Almost all the VC4 blocks need contiguous memory, so fragmentation was
an issue. To resolve that we (back in Broadcom days) had the
"relocatable heap" - allocations that needed to be locked before
access and unlocked after. Unlocked blocks could be copied and moved
around to free up larger contiguous blocks. These allocations use a
handle instead of a pointer, and have internal refcounting etc.
Basically providing some of the features of an MMU when you don't have
one.

The original VCSM driver allowed userspace to make a relocatable heap
allocation, lock it, and the kernel to map the relevant pages into the
ARM memory space. Now you have a shared buffer, and VCHIQ no longer
has to copy the data back and forth. (Cache flushing was also
handled).
So MMAL in zero copy mode passes the VPU relocatable heap handle
across in the VCHIQ message, not a pointer to the actual data. VCSM
did the allocation on behalf of the MMAL client, and provides the
mapping and VPU handle to the buffer. This still leaves the allocation
being made from gpu_mem though.

The rewrite (vc-sm-cma) was to make use of an import feature into the
relocatable heap, termed internally as mem wrapping. Take a CMA
allocation made by something, pass the DMA address and size across to
the VPU, and it can insert it as a relocatable heap object that can be
used in exactly the same way gpu_mem allocations. gpu_mem can now be
shrunk in size :-) It was using a dma-buf as a convenient object to
manage the allocation, and handle importing buffers allocated by other
subsystems
Note that we still have refcounting internally to the relocatable
heap, so at the point the client says it has finished with it, the VPU
may not have done. When the last relocatable heap reference is
released, the kernel gets a callback (VC_SM_MSG_TYPE_RELEASED), and it
is only at that point that it is safe to drop the reference to the
imported dmabuf.

V4L2 can do the relevant import and wrapping to a relocatable heap
handle as part of the buffer passing. MMAL needs to do it manually
from userspace as VCHIQ is the only in-kernel service that it uses,
hence we need an import ioctl and free mechanism (if the handle is a
dmabuf, then that's close).


From a platform level it would be nice to have the userspace ioctl for
importing a dmabuf in mainline, however it isn't necessary for the
V4L2 use cases that we're trying to upstream here. The driver without
userspace API would look pretty much like the one in [1]. I'll try and
update that to include the basic import userspace API to give a
comparison.
I don't mind which way this goes as to whether the userspace ioctl
remains as downstream patches, but losing the dmabuf as the handle
within vc-sm-cma will make that patch huge, and they're almost
guaranteed to diverge.
Ignore the caching ioctls - they're irrelevant.

I hope that makes the situation a little clearer.

  Dave

[1] https://github.com/6by9/linux/tree/staging_next_upstreaming_apr20/drivers/staging/vc04_services/vc-sm-cma

  reply	other threads:[~2020-05-18 15:48 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-04  9:25 [PATCH v2 00/34] Drivers for the BCM283x CSI-2/CCP2 receiver and ISP Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 01/34] media: uapi: v4l2-core: Add sensor ancillary data V4L2 fourcc type Laurent Pinchart
2020-05-04 13:48   ` Hans Verkuil
2020-05-04 14:39     ` Dave Stevenson
2020-05-04 15:32       ` Hans Verkuil
2020-05-04 16:08         ` Laurent Pinchart
2020-05-05 11:20           ` Dave Stevenson
2020-05-04  9:25 ` [PATCH v2 02/34] media: uapi: Add MEDIA_BUS_FMT_SENSOR_DATA media bus format Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 03/34] dt-bindings: media: Document BCM283x CSI2/CCP2 receiver Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 04/34] media: bcm2835-unicam: Driver for CCP2/CSI2 camera interface Laurent Pinchart
2020-05-04 15:21   ` Hans Verkuil
2020-05-05  1:26   ` kbuild test robot
2020-05-06 18:01   ` Nicolas Saenz Julienne
2020-08-29 11:20   ` Jacopo Mondi
2020-08-29 18:32     ` Laurent Pinchart
2020-08-31  7:38       ` Jacopo Mondi
2020-08-31 14:17         ` Laurent Pinchart
2020-08-31 14:46           ` Jacopo Mondi
2020-08-31 14:56             ` Laurent Pinchart
2020-09-01  8:41               ` Dave Stevenson
2020-09-01 10:22                 ` Jacopo Mondi
2020-09-01 16:37                   ` Dave Stevenson
2020-09-01 17:11                     ` Laurent Pinchart
2020-09-15  7:03   ` Sakari Ailus
2020-09-15  9:32     ` Laurent Pinchart
2020-09-15 13:28       ` Dave Stevenson
2020-10-30 17:53         ` Jacopo Mondi
2020-09-15 17:30     ` Dave Stevenson
2020-05-04  9:25 ` [PATCH v2 05/34] ARM: dts: bcm2711: Add Unicam DT nodes Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 06/34] staging: vc04_services: Add new vc-sm-cma driver Laurent Pinchart
2020-05-05  2:38   ` kbuild test robot
2020-05-05 12:17   ` kbuild test robot
2020-05-06  3:05   ` kbuild test robot
2020-05-06 18:04   ` Nicolas Saenz Julienne
2020-05-06 19:24     ` Dave Stevenson
2020-05-08  0:11       ` Laurent Pinchart
2020-05-18 12:06         ` Hans Verkuil
2020-08-24 16:39       ` Jacopo Mondi
2020-08-25 17:52         ` Dave Stevenson
2020-08-27 10:38           ` Jacopo Mondi
2020-08-27 12:51             ` Dave Stevenson
2020-08-27 16:46               ` Jacopo Mondi
2020-08-27 17:19                 ` Dave Stevenson
2020-05-11 18:42   ` Nicolas Saenz Julienne
2020-05-18 15:48     ` Dave Stevenson [this message]
2020-05-20 14:41       ` Nicolas Saenz Julienne
2020-05-21 11:01         ` Dave Stevenson
2020-05-04  9:25 ` [PATCH v2 07/34] staging: bcm2835: Break MMAL support out from camera Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 08/34] staging: mmal-vchiq: Allocate and free components as required Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 09/34] staging: mmal-vchiq: Avoid use of bool in structures Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 10/34] staging: mmal-vchiq: Make timeout a defined parameter Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 11/34] staging: mmal-vchiq: Make a mmal_buf struct for passing parameters Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 12/34] staging: mmal-vchiq: Add support for event callbacks Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 13/34] staging: mmal-vchiq: Support sending data to MMAL ports Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 14/34] staging: mmal-vchiq: Fixup vchiq-mmal include ordering Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 15/34] staging: mmal-vchiq: Use vc-sm-cma to support zero copy Laurent Pinchart
2020-05-04 16:30   ` Nicolas Saenz Julienne
2020-05-05 16:07   ` kbuild test robot
2020-05-11 19:15   ` Nicolas Saenz Julienne
2020-05-04  9:25 ` [PATCH v2 16/34] staging: mmal-vchiq: Fix client_component for 64 bit kernel Laurent Pinchart
2020-05-18 10:19   ` Hans Verkuil
2020-05-04  9:25 ` [PATCH v2 17/34] staging: mmal_vchiq: Add in the Bayer encoding formats Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 18/34] staging: mmal-vchiq: Always return the param size from param_get Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 19/34] staging: mmal-vchiq: If the VPU returns an error, don't negate it Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 20/34] staging: mmal-vchiq: Fix handling of VB2_MEMORY_DMABUF buffers Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 21/34] staging: mmal-vchiq: Update mmal_parameters.h with recently defined params Laurent Pinchart
2020-05-04  9:25 ` [PATCH v2 22/34] staging: mmal-vchiq: Free the event context for control ports Laurent Pinchart
2020-05-04  9:26 ` [PATCH v2 23/34] staging: mmal-vchiq: Fix memory leak in error path Laurent Pinchart
2020-05-04  9:26 ` [PATCH v2 24/34] staging: mmal-vchiq: Fix formatting errors in mmal_parameters.h Laurent Pinchart
2020-05-04  9:26 ` [PATCH v2 25/34] staging: vchiq_arm: Register vcsm-cma as a platform driver Laurent Pinchart
2020-05-04  9:26 ` [PATCH v2 26/34] staging: vchiq_arm: Set up dma ranges on child devices Laurent Pinchart
2020-05-04 16:54   ` Nicolas Saenz Julienne
2020-08-25 16:57     ` Jacopo Mondi
2020-05-04  9:26 ` [PATCH v2 27/34] staging: vchiq: Use the old dma controller for OF config on platform devices Laurent Pinchart
2020-05-04 15:44   ` Nicolas Saenz Julienne
2020-05-04  9:26 ` [PATCH v2 28/34] staging: vchiq_2835_arm: Implement a DMA pool for small bulk transfers Laurent Pinchart
2020-05-04  9:26 ` [PATCH v2 29/34] staging: vchiq: Add 36-bit address support Laurent Pinchart
2020-05-04 17:40   ` Nicolas Saenz Julienne
2020-05-04 20:46     ` Phil Elwell
2020-05-05 10:13       ` Nicolas Saenz Julienne
2020-05-05 10:57         ` Phil Elwell
2020-05-05 13:22   ` kbuild test robot
2020-05-04  9:26 ` [PATCH v2 30/34] staging: vchiq_arm: Give vchiq children DT nodes Laurent Pinchart
2020-05-04 17:12   ` Nicolas Saenz Julienne
2020-05-04 19:42     ` Phil Elwell
2020-05-05 10:37       ` Nicolas Saenz Julienne
2020-05-05 10:50         ` Phil Elwell
2020-05-04  9:26 ` [PATCH v2 31/34] staging: vchiq_arm: Add a matching unregister call Laurent Pinchart
2020-05-04  9:26 ` [PATCH v2 32/34] media: videobuf2: Allow exporting of a struct dmabuf Laurent Pinchart
2020-05-04 13:36   ` Hans Verkuil
2020-05-04  9:26 ` [PATCH v2 33/34] staging: bcm2835-isp: Add support for BC2835 ISP Laurent Pinchart
2020-05-11 19:19   ` Nicolas Saenz Julienne
2020-05-18 13:38     ` Dave Stevenson
2020-05-20 13:46       ` Nicolas Saenz Julienne
2020-05-18 12:02   ` Hans Verkuil
2020-05-18 14:36     ` Dave Stevenson
2020-05-18 15:07       ` Hans Verkuil
2020-05-19 12:47     ` Naushir Patuck
2020-05-19 13:03       ` Jacopo Mondi
2020-06-24 11:28       ` Hans Verkuil
2020-05-04  9:26 ` [PATCH v2 34/34] staging: vchiq: Load bcm2835_isp driver from vchiq Laurent Pinchart
2020-05-04 15:15 ` [PATCH v2 00/34] Drivers for the BCM283x CSI-2/CCP2 receiver and ISP Nicolas Saenz Julienne
2020-05-04 15:38   ` Laurent Pinchart
2020-05-04 16:15     ` Nicolas Saenz Julienne

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAPY8ntCZsFtko4LMUsfSEUV9LwtJ9bdjXK4ZVJ3KFd18vzRp5A@mail.gmail.com \
    --to=dave.stevenson@raspberrypi.com \
    --cc=dave.stevenson@raspberrypi.org \
    --cc=jacopo@jmondi.org \
    --cc=kieran.bingham@ideasonboard.com \
    --cc=laurent.pinchart@ideasonboard.com \
    --cc=linux-media@vger.kernel.org \
    --cc=naush@raspberrypi.com \
    --cc=niklas.soderlund@ragnatech.se \
    --cc=nsaenzjulienne@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).