From: Rob Clark <robdclark@gmail.com>
To: Lucas Stach <l.stach@pengutronix.de>
Cc: Tom Cooksey <tom.cooksey@arm.com>,
	linux-fbdev@vger.kernel.org, Pawel Moll <Pawel.Moll@arm.com>,
	linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org,
	linux-arm-kernel@lists.infradead.org,
	linux-media@vger.kernel.org
Subject: Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
Date: Tue, 6 Aug 2013 10:59:46 -0400	[thread overview]
Message-ID: <CAF6AEGsHB0day3mMFpp_bHWyZSKa7K_3DBaBrvqYGeia4iR7TA@mail.gmail.com> (raw)
In-Reply-To: <1375799770.4276.47.camel@weser.hi.pengutronix.de>

On Tue, Aug 6, 2013 at 10:36 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> Am Dienstag, den 06.08.2013, 10:14 -0400 schrieb Rob Clark:
>> On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
>> > Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
>> >> Hi Rob,
>> >>
>> >> +lkml
>> >>
>> >> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
>> >> > >> wrote:
>> >> > >> >> >  * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
>> >> > >> >> >    also allocate buffers for the GPU. Still not sure how to
>> >> > >> >> >    resolve this as we don't use DRM for our GPU driver.
>> >> > >> >>
>> >> > >> >> any thoughts/plans about a DRM GPU driver?  Ideally long term
>> >> > >> >> (esp. once the dma-fence stuff is in place), we'd have a
>> >> > >> >> gpu-specific drm (gpu-only, no kms) driver and a SoC/display-
>> >> > >> >> specific drm/kms driver, using prime/dmabuf to share between
>> >> > >> >> the two.
>> >> > >> >
>> >> > >> > The "extra" buffers we were allocating from armsoc DDX were really
>> >> > >> > being allocated through DRM/GEM so we could get a flink name
>> >> > >> > for them and pass a reference to them back to our GPU driver on
>> >> > >> > the client side. If it weren't for our need to access those
>> >> > >> > extra off-screen buffers with the GPU we wouldn't need to
>> >> > >> > allocate them with DRM at all. So, given they are really "GPU"
>> >> > >> > buffers, it does absolutely make sense to allocate them in a
>> >> > >> > different driver to the display driver.
>> >> > >> >
>> >> > >> > However, to avoid unnecessary memcpys & related cache
>> >> > >> > maintenance ops, we'd also like the GPU to render into buffers
>> >> > >> > which are scanned out by the display controller. So let's say
>> >> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
>> >> > >> > out buffers with the display's DRM driver but a custom ioctl
>> >> > >> > on the GPU's DRM driver to allocate non-scanout, off-screen
>> >> > >> > buffers. Sounds great, but I don't think that really works
>> >> > >> > with DRI2. If we used two drivers to allocate buffers, which
>> >> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
>> >> > >> > solve that somehow, GEM flink names are name-spaced to a
>> >> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
>> >> > >> > how does the EGL in the client know which DRM device owns GEM
>> >> > >> > flink name "1234"? We'd need some pretty dirty hacks.
>> >> > >>
>> >> > >> You would return the name of the display driver allocating the
>> >> > >> buffers.  On the client side you can use generic ioctls to go from
>> >> > >> flink -> handle -> dmabuf.  So the client side would end up opening
>> >> > >> both the display drm device and the gpu, but without needing to know
>> >> > >> too much about the display.
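
To make that concrete: the client-side dance is only two generic
ioctls.  A rough, untested sketch (error handling elided, and assuming
the client already has the display device node open):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <drm/drm.h>

    /* flink name -> GEM handle -> dma-buf fd, display device only */
    int flink_to_dmabuf(int display_fd, uint32_t flink_name)
    {
            struct drm_gem_open op = { .name = flink_name };
            struct drm_prime_handle ph = { .flags = DRM_CLOEXEC, .fd = -1 };

            /* resolve the flink name to a handle on the display device */
            ioctl(display_fd, DRM_IOCTL_GEM_OPEN, &op);

            /* export that handle as a dma-buf fd */
            ph.handle = op.handle;
            ioctl(display_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &ph);

            return ph.fd;   /* importable into the gpu device */
    }
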
>> >> > >
>> >> > > I think the bit I was missing was that a GEM bo for a buffer imported
>> >> > > using dma_buf/PRIME can still be flink'd. So the display controller's
>> >> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
>> >> > > ioctl. Those scan-out buffers can then be exported from the
>> >> > > display's DRM driver and imported into the GPU's DRM driver using
>> >> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
>> >> > > name for that buffer within the GPU DRM driver's name-space to return
>> >> > > to the DRI2 client. That same namespace is also what DRI2 back-
>> >> > > buffers are allocated from, so I think that could work... Except...
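
right.. fwiw that whole round trip is just four generic ioctls.  A
rough, untested sketch (display_fd and gpu_fd assumed already open):

    /* 1. allocate a scanout buffer from the display device */
    struct drm_mode_create_dumb creq = {
            .width = 1920, .height = 1080, .bpp = 32,
    };
    ioctl(display_fd, DRM_IOCTL_MODE_CREATE_DUMB, &creq);

    /* 2. export it as a dma-buf fd */
    struct drm_prime_handle exp = {
            .handle = creq.handle, .flags = DRM_CLOEXEC, .fd = -1,
    };
    ioctl(display_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &exp);

    /* 3. import into the gpu device -> handle in *its* namespace */
    struct drm_prime_handle imp = { .fd = exp.fd };
    ioctl(gpu_fd, DRM_IOCTL_PRIME_FD_TO_HANDLE, &imp);

    /* 4. flink the imported bo; this is the name DRI2 hands out */
    struct drm_gem_flink flink = { .handle = imp.handle };
    ioctl(gpu_fd, DRM_IOCTL_GEM_FLINK, &flink);
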
>> >> >
>> >> > (and.. the general direction is that things will move more to just use
>> >> > dmabuf directly, ie. wayland or dri3)
>> >>
>> >> I agree, DRI2 is the only reason why we need a system-wide ID. I also
>> >> prefer buffers to be passed around by dma_buf fd, but we still need to
>> >> support DRI2 and will do for some time I expect.
>> >>
>> >>
>> >>
>> >> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
>> >> > >> > DRM driver would need to know the constraints of the display
>> >> > >> > controller when allocating buffers intended to be scanned out.
>> >> > >> > For example, pl111 typically isn't behind an IOMMU and so
>> >> > >> > requires physically contiguous memory. We'd have to teach the
>> >> > >> > GPU's DRM driver about the constraints of the display HW. Not
>> >> > >> > exactly a clean driver model. :-(
>> >> > >> >
>> >> > >> > I'm still a little stuck on how to proceed, so any ideas
>> >> > >> > would be greatly appreciated! My current train of thought is
>> >> > >> > having a kind of SoC-specific DRM driver which allocates
>> >> > >> > buffers for both display and GPU within a single GEM
>> >> > >> > namespace. That SoC-specific DRM driver could then know the
>> >> > >> > constraints of both the GPU and the display HW. We could then
>> >> > >> > use PRIME to export buffers allocated with the SoC DRM driver
>> >> > >> > and import them into the GPU and/or display DRM driver.
>> >> > >>
>> >> > >> Usually if the display drm driver is allocating the buffers that
>> >> > >> might be scanned out, it just needs to have minimal knowledge of
>> >> > >> the GPU (pitch alignment constraints).  I don't think we need a
>> >> > >> 3rd device just to allocate buffers.
>> >> > >
>> >> > > While Mali can render to pretty much any buffer, there is a mild
>> >> > > performance improvement to be had if the buffer stride is aligned to
>> >> > > the AXI bus's max burst length when drawing to the buffer.
>> >> >
>> >> > I suspect the display controllers might frequently benefit if the
>> >> > pitch is aligned to AXI burst length too..
>> >>
>> >> If the display controller is going to be reading from linear memory
>> >> I don't think it will make much difference - you'll just get an extra
>> >> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
>> >> you get those extra transactions per _tile_ scan-line and as such,
>> >> the overhead is more pronounced.
>> >>
>> >>
>> >>
>> >> > > So in some respects, there is a constraint on how buffers which will
>> >> > > be drawn to using the GPU are allocated. I don't really like the idea
>> >> > > of teaching the display controller DRM driver about the GPU buffer
>> >> > > constraints, even if they are fairly trivial like this. If the same
>> >> > > display HW IP is being used on several SoCs, it seems wrong somehow
>> >> > > to enforce those GPU constraints if some of those SoCs don't have a
>> >> > > GPU.
>> >> >
>> >> > Well, I suppose you could get min_pitch_alignment from devicetree, or
>> >> > something like this..
>> >> >
>> >> > In the end, the easy solution is just to make the display allocate to
>> >> > the worst-case pitch alignment.  In the early days of dma-buf
>> >> > discussions, we kicked around the idea of negotiating or
>> >> > programmatically describing the constraints, but that didn't really
>> >> > seem like a bounded problem.
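
(concretely, "worst-case pitch alignment" is just the display driver's
.dumb_create() hook doing something like the sketch below.. the 64 is
a made-up value, it would really be whatever the largest AXI burst /
gpu requirement on the SoC is:

    /* pessimistic pitch so gpu and display are both satisfied */
    #define WORST_CASE_PITCH_ALIGN 64   /* bytes -- made-up number */

    args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8),
                        WORST_CASE_PITCH_ALIGN);
    args->size  = args->pitch * args->height;

where args is the struct drm_mode_create_dumb passed in from userspace)
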
>> >>
>> >> Yeah - I was around for some of those discussions and agree it's not
>> >> really an easy problem to solve.
>> >>
>> >>
>> >>
>> >> > > We may also then have additional constraints when sharing buffers
>> >> > > between the display HW and video decode or even camera ISP HW.
>> >> > > Programmatically describing buffer allocation constraints is very
>> >> > > difficult and I'm not sure you can actually do it - there's some
>> >> > > pretty complex constraints out there! E.g. I believe there's a
>> >> > > platform where Y and UV planes of the reference frame need to be in
>> >> > > separate DRAM banks for real-time 1080p decode, or something like
>> >> > > that?
>> >> >
>> >> > yes, this was discussed.  This is different from pitch/format/size
>> >> > constraints.. it is really just a placement constraint (ie. where do
>> >> > the physical pages go).  IIRC the conclusion was to use a dummy
>> >> > device with its own CMA pool for attaching the Y vs UV buffers.
>> >> >
>> >> > > Anyway, I guess my point is that even if we solve how to allocate
>> >> > > buffers which will be shared between the GPU and display HW such that
>> >> > > both sets of constraints are satisfied, that may not be the end of
>> >> > > the story.
>> >> > >
>> >> >
>> >> > that was part of the reason to punt this problem to userspace ;-)
>> >> >
>> >> > In practice, the kernel drivers don't usually know too much about
>> >> > the dimensions/format/etc.. that is really userspace level knowledge.
>> >> > There are a few exceptions when the kernel needs to know how to setup
>> >> > GTT/etc for tiled buffers, but normally this sort of information is
>> >> > kept at the next level up (userspace, and drm_framebuffer in case of
>> >> > scanout).  Userspace media frameworks like GStreamer already have a
>> >> > concept of format/caps negotiation.  For non-display<->gpu sharing, I
>> >> > think this is probably where this sort of constraint negotiation
>> >> > should be handled.
>> >>
>> >> I agree that user-space will know which devices will access the buffer
>> >> and thus can figure out at least a common pixel format. Though I'm not
>> >> so sure userspace can figure out more low-level details like alignment
>> >> and placement in physical memory, etc.
>> >>
>> >> Anyway, assuming user-space can figure out how a buffer should be
>> >> stored in memory, how does it indicate this to a kernel driver and
>> >> actually allocate it? Which ioctl on which device does user-space
>> >> call, with what parameters? Are you suggesting using something like
>> >> ION which exposes the low-level details of how buffers are laid out in
>> >> physical memory to userspace? If not, what?
>> >>
>> >
>> > I strongly disagree with exposing low-level hardware details like tiling
>> > to userspace. If we have to do the negotiation of those things in
>> > userspace we will end up having to pipe that information through
>> > things like the wayland protocol. I don't see how this could ever be
>> > considered a good idea.
>>
>> well, unless userspace mmaps via a de-tiling gart type thing, I don't
>> think tiling can be invisible to userspace.
>>
> Why is mmap considered to be such a strong use-case for DMABUFs? After
> all we are trying to _avoid_ mmapping shared buffers wherever
> possible.

I don't know if I'd call it a strong use-case, as much as a good
worst-case to consider.

>> But if two GPUs have some overlap in supportable tiled formats, and
>> you have one gpu doing app and one doing compositor, and you want to
>> use tiling for the shared buffers, you need something in the wayland
>> protocol to figure out what the common supported formats are between
>> the two sides.  I suppose it shouldn't be too hard to add a
>> standardized (cross-driver) format-negotiation protocol, and in the
>> absence of that fall back to a non-tiled format for shared buffers.
>>
> I don't see how tiling format negotiation would be easier in userspace
> than in the kernel. If we can come up with a scheme for that, we may as
> well do it in the kernel.

well.. I guess I don't see how it would be easier in the kernel than
in userspace ;-)

But if you can come up with a good way to add it to dev->dma_params
that everyone can agree on, I don't mind.  I suppose I'm starting with
the assumption that it will be difficult/impossible to get everyone to
agree on this, but I don't mind being proved wrong.
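
To make the shape of it concrete, I'm picturing something along these
lines.. purely hypothetical fields, nothing like this exists today:

    /* invented extension of struct device_dma_parameters: */
    struct device_dma_parameters {
            unsigned int max_segment_size;
            unsigned long segment_boundary_mask;
            /* new, made-up fields: */
            unsigned int min_pitch_align;   /* bytes */
            u64 tile_formats;               /* supported-format bitmask */
    };

and then attach time could take the max of the pitch alignments and
AND together the tile_formats masks across everything attached.  The
hard part isn't the struct, it is getting every driver to describe its
constraints in those few fields.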

>> > I would rather see kernel drivers negotiating those things at dmabuf
>> > attach time in a way invisible to userspace. I agree that this negotiation
>> > thing isn't easy to get right for the plethora of different hardware
>> > constraints we see today, but I would rather see this in-kernel, where
>> > we have the chance to fix things up if needed, than in a fixed userspace
>> > interface.
>>
>> Well, if you can think of a sane way to add that to dev->dma_params,
>> > and if it isn't visible if userspace mmaps a buffer, then we could
>> handle that in the kernel.  But I don't think that will be the case.
>>
> I'm sure we can come up with something sane to put in there. The
> userspace mmap thing is a bit complicated to deal with, but I have the
> feeling that going through a slow path if you really need mmap is a
> reasonable thing to do.
> For example we could just make mmap some form of attach that forces
> linear layout if the exporter isn't able to give userspace a linear
> mapping from a tiled buffer by using GART or VM.

I guess I don't object to mmap being a slow path (as long as it
doesn't end up requiring a slow path on hw where this otherwise
wouldn't be needed).
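
On the exporter side I'd picture the dma-buf mmap hook looking roughly
like this.. hand-wavy sketch, all the my_* names are invented:

    static int my_exporter_mmap(struct dma_buf *dmabuf,
                                struct vm_area_struct *vma)
    {
            struct my_bo *bo = dmabuf->priv;

            if (!bo->tiled)
                    return my_bo_mmap_linear(bo, vma);  /* fast path */

            /* de-tiled linear view through the gart, if we have one */
            if (my_dev_has_gart(bo->dev))
                    return my_bo_mmap_via_gart(bo, vma);

            /* slow path: blit to a linear shadow copy and map that */
            return my_bo_mmap_linear_shadow(bo, vma);
    }

ie. hw with a gart never hits the slow path, and hw without one only
pays for it when userspace actually mmaps a tiled buffer.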

BR,
-R

> Regards,
> Lucas
> --
> Pengutronix e.K.                           | Lucas Stach                 |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
>
