All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Noralf Trønnes" <noralf@tronnes.org>
To: Felipe Balbi <balbi@kernel.org>, dri-devel@lists.freedesktop.org
Cc: linux-usb@vger.kernel.org, sam@ravnborg.org
Subject: Re: [PATCH v3 6/6] usb: gadget: function: Add Generic USB Display support
Date: Tue, 2 Jun 2020 21:14:59 +0200	[thread overview]
Message-ID: <0d0ec3f5-4b9a-e128-5d50-9e7096b3f984@tronnes.org> (raw)
In-Reply-To: <87k10p1jr6.fsf@kernel.org>

Hi,

Den 02.06.2020 19.05, skrev Felipe Balbi:
> 
> Hi,
> 
> I missed this completely until now.
> Noralf Trønnes <noralf@tronnes.org> writes:
>> This adds the gadget side support for the Generic USB Display. It presents
>> a DRM display device as a USB Display configured through configfs.
>>
>> The display is implemented as a vendor type USB interface with one bulk
>> out endpoint. The protocol is implemented using control requests.
>> lz4 compressed framebuffer data/pixels are sent over the bulk endpoint.
>>
>> The DRM part of the gadget is placed in the DRM subsystem since it reaches
>> into the DRM internals.
> 
> First and foremost, could this be done in userspace using the raw gadget
> or f_fs?
> 

An uncompressed 1920x1080-RGB565 frame is ~4MB. All frames can be
compressed (lz4) even if just a little, so this is decompressed into the
framebuffer of the attached DRM device. AFAIU I would need to be able to
mmap the received bulk buffer if I were to do this from userspace
without killing performance. So it doesn't look like I can use raw
gadget or f_fs.

>> diff --git a/drivers/usb/gadget/function/f_gud_drm.c b/drivers/usb/gadget/function/f_gud_drm.c
>> new file mode 100644
>> index 000000000000..9a2d6bb9739f
>> --- /dev/null
>> +++ b/drivers/usb/gadget/function/f_gud_drm.c
>> @@ -0,0 +1,678 @@
>> +struct f_gud_drm {
>> +	struct usb_function func;
>> +	struct work_struct worker;
> 
> why do you need a worker?
> 

The gadget runs in interrupt context and I need to call into the DRM
subsystem which can sleep.

>> +	size_t max_buffer_size;
>> +	void *ctrl_req_buf;
>> +
>> +	u8 interface_id;
>> +	struct usb_request *ctrl_req;
>> +
>> +	struct usb_ep *bulk_ep;
>> +	struct usb_request *bulk_req;
> 
> single request? Don't you want to amortize the latency of
> queue->complete by using a series of requests?
> 

I use only one request per update or partial update.
I kmalloc the biggest buffer I can get (default 4MB) and tell the host
about this size. If a frame doesn't fit, the host splits it up into
partial updates. I already support partial updates so this is built in.
Userspace can tell the graphics driver which portion of the framebuffer
it has touched to avoid sending the full frame each time.
Having one continous buffer simplifies decompression.

There's a control request preceding the bulk request which tells the
area the update is for and whether it is compressed or not.
I did some testing to see if I should avoid the control request overhead
for split updates, but it turns out that the bottleneck is the
decompression. The control request is just 400-500us, while the total
time from control request to buffer is decompressed is 50-100ms
(depending on how well the frame compresses).

>> +	struct gud_drm_gadget *gdg;
>> +
>> +	spinlock_t lock; /* Protects the following members: */
>> +	bool ctrl_pending;
>> +	bool status_pending;
>> +	bool bulk_pending;
>> +	bool disable_pending;
> 
> could this be a single u32 with #define'd flags? That would be atomic,
> right?
> 

I have never grasped all the concurrency issues, but wouldn't I need
memory barriers as well?

>> +	u8 errno;
> 
> a global per-function error? Why?
> 

This is the result of the previously request operation. The host will
request this value to see how it went. I might switch to using a bulk
endpoint for status following a discussion with Alan Stern in the host
driver thread in this patch series. If so I might not need this.

>> +	u16 request;
>> +	u16 value;
> 
> also why? Looks odd
> 

These values contains the operation (in addition to the payload) that
the worker shall perform following the control-OUT requests.

control-IN requests can run in interrupt context so in that case the
payload is queued up immediately.

<snip>

>> +static void f_gud_drm_bulk_complete(struct usb_ep *ep, struct usb_request *req)
>> +{
>> +	struct f_gud_drm *fgd = req->context;
>> +	unsigned long flags;
>> +
>> +	if (req->status || req->actual != req->length)
>> +		return;
> 
> so, if we complete with an erroneous status or a short packet, you'll
> ignore it?
> 

Hmm yeah. When I wrote this I thought that the bottleneck was the USB
transfers, so I didn't want the host to slow down performance by
requesting this status. Now I know it's the decompression that takes
time, so I could actually do a status check and retry the frame if the
device detects an error.

>> +	spin_lock_irqsave(&fgd->lock, flags);
>> +	fgd->bulk_pending = true;
>> +	spin_unlock_irqrestore(&fgd->lock, flags);
>> +
>> +	queue_work(system_long_wq, &fgd->worker);
> 
> Do you need to offset this to a worker?
> 

Yes, long running (one operation can be >100ms) and can sleep.

>> +static int f_gud_drm_ctrl_req_set_buffer(struct f_gud_drm *fgd, void *buf, size_t len)
>> +{
>> +	int ret;
>> +
>> +	if (len != sizeof(struct gud_drm_req_set_buffer))
>> +		return -EINVAL;
>> +
>> +	ret = gud_drm_gadget_set_buffer(fgd->gdg, buf);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (ret > fgd->max_buffer_size)
>> +		return -EOVERFLOW;
>> +
>> +	fgd->bulk_req->length = ret;
>> +
>> +	return usb_ep_queue(fgd->bulk_ep, fgd->bulk_req, GFP_KERNEL);
>> +}
>> +
>> +static void f_gud_drm_worker(struct work_struct *work)
>> +{
>> +	struct f_gud_drm *fgd = container_of(work, struct f_gud_drm, worker);
>> +	bool ctrl_pending, bulk_pending, disable_pending;
>> +	struct gud_drm_gadget *gdg = fgd->gdg;
>> +	unsigned long flags;
>> +	u16 request, value;
>> +	int ret;
>> +
>> +	spin_lock_irqsave(&fgd->lock, flags);
>> +	request = fgd->request;
>> +	value = fgd->value;
>> +	ctrl_pending = fgd->ctrl_pending;
>> +	bulk_pending = fgd->bulk_pending;
>> +	disable_pending = fgd->disable_pending;
>> +	spin_unlock_irqrestore(&fgd->lock, flags);
>> +
>> +	pr_debug("%s: bulk_pending=%u ctrl_pending=%u disable_pending=%u\n",
>> +		 __func__, bulk_pending, ctrl_pending, disable_pending);
> 
> could we use dev_dbg() at least?
> 

Sure.

Noralf.


WARNING: multiple messages have this Message-ID (diff)
From: "Noralf Trønnes" <noralf@tronnes.org>
To: Felipe Balbi <balbi@kernel.org>, dri-devel@lists.freedesktop.org
Cc: linux-usb@vger.kernel.org, sam@ravnborg.org
Subject: Re: [PATCH v3 6/6] usb: gadget: function: Add Generic USB Display support
Date: Tue, 2 Jun 2020 21:14:59 +0200	[thread overview]
Message-ID: <0d0ec3f5-4b9a-e128-5d50-9e7096b3f984@tronnes.org> (raw)
In-Reply-To: <87k10p1jr6.fsf@kernel.org>

Hi,

Den 02.06.2020 19.05, skrev Felipe Balbi:
> 
> Hi,
> 
> I missed this completely until now.
> Noralf Trønnes <noralf@tronnes.org> writes:
>> This adds the gadget side support for the Generic USB Display. It presents
>> a DRM display device as a USB Display configured through configfs.
>>
>> The display is implemented as a vendor type USB interface with one bulk
>> out endpoint. The protocol is implemented using control requests.
>> lz4 compressed framebuffer data/pixels are sent over the bulk endpoint.
>>
>> The DRM part of the gadget is placed in the DRM subsystem since it reaches
>> into the DRM internals.
> 
> First and foremost, could this be done in userspace using the raw gadget
> or f_fs?
> 

An uncompressed 1920x1080-RGB565 frame is ~4MB. All frames can be
compressed (lz4) even if just a little, so this is decompressed into the
framebuffer of the attached DRM device. AFAIU I would need to be able to
mmap the received bulk buffer if I were to do this from userspace
without killing performance. So it doesn't look like I can use raw
gadget or f_fs.

>> diff --git a/drivers/usb/gadget/function/f_gud_drm.c b/drivers/usb/gadget/function/f_gud_drm.c
>> new file mode 100644
>> index 000000000000..9a2d6bb9739f
>> --- /dev/null
>> +++ b/drivers/usb/gadget/function/f_gud_drm.c
>> @@ -0,0 +1,678 @@
>> +struct f_gud_drm {
>> +	struct usb_function func;
>> +	struct work_struct worker;
> 
> why do you need a worker?
> 

The gadget runs in interrupt context and I need to call into the DRM
subsystem which can sleep.

>> +	size_t max_buffer_size;
>> +	void *ctrl_req_buf;
>> +
>> +	u8 interface_id;
>> +	struct usb_request *ctrl_req;
>> +
>> +	struct usb_ep *bulk_ep;
>> +	struct usb_request *bulk_req;
> 
> single request? Don't you want to amortize the latency of
> queue->complete by using a series of requests?
> 

I use only one request per update or partial update.
I kmalloc the biggest buffer I can get (default 4MB) and tell the host
about this size. If a frame doesn't fit, the host splits it up into
partial updates. I already support partial updates so this is built in.
Userspace can tell the graphics driver which portion of the framebuffer
it has touched to avoid sending the full frame each time.
Having one continous buffer simplifies decompression.

There's a control request preceding the bulk request which tells the
area the update is for and whether it is compressed or not.
I did some testing to see if I should avoid the control request overhead
for split updates, but it turns out that the bottleneck is the
decompression. The control request is just 400-500us, while the total
time from control request to buffer is decompressed is 50-100ms
(depending on how well the frame compresses).

>> +	struct gud_drm_gadget *gdg;
>> +
>> +	spinlock_t lock; /* Protects the following members: */
>> +	bool ctrl_pending;
>> +	bool status_pending;
>> +	bool bulk_pending;
>> +	bool disable_pending;
> 
> could this be a single u32 with #define'd flags? That would be atomic,
> right?
> 

I have never grasped all the concurrency issues, but wouldn't I need
memory barriers as well?

>> +	u8 errno;
> 
> a global per-function error? Why?
> 

This is the result of the previously request operation. The host will
request this value to see how it went. I might switch to using a bulk
endpoint for status following a discussion with Alan Stern in the host
driver thread in this patch series. If so I might not need this.

>> +	u16 request;
>> +	u16 value;
> 
> also why? Looks odd
> 

These values contains the operation (in addition to the payload) that
the worker shall perform following the control-OUT requests.

control-IN requests can run in interrupt context so in that case the
payload is queued up immediately.

<snip>

>> +static void f_gud_drm_bulk_complete(struct usb_ep *ep, struct usb_request *req)
>> +{
>> +	struct f_gud_drm *fgd = req->context;
>> +	unsigned long flags;
>> +
>> +	if (req->status || req->actual != req->length)
>> +		return;
> 
> so, if we complete with an erroneous status or a short packet, you'll
> ignore it?
> 

Hmm yeah. When I wrote this I thought that the bottleneck was the USB
transfers, so I didn't want the host to slow down performance by
requesting this status. Now I know it's the decompression that takes
time, so I could actually do a status check and retry the frame if the
device detects an error.

>> +	spin_lock_irqsave(&fgd->lock, flags);
>> +	fgd->bulk_pending = true;
>> +	spin_unlock_irqrestore(&fgd->lock, flags);
>> +
>> +	queue_work(system_long_wq, &fgd->worker);
> 
> Do you need to offset this to a worker?
> 

Yes, long running (one operation can be >100ms) and can sleep.

>> +static int f_gud_drm_ctrl_req_set_buffer(struct f_gud_drm *fgd, void *buf, size_t len)
>> +{
>> +	int ret;
>> +
>> +	if (len != sizeof(struct gud_drm_req_set_buffer))
>> +		return -EINVAL;
>> +
>> +	ret = gud_drm_gadget_set_buffer(fgd->gdg, buf);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (ret > fgd->max_buffer_size)
>> +		return -EOVERFLOW;
>> +
>> +	fgd->bulk_req->length = ret;
>> +
>> +	return usb_ep_queue(fgd->bulk_ep, fgd->bulk_req, GFP_KERNEL);
>> +}
>> +
>> +static void f_gud_drm_worker(struct work_struct *work)
>> +{
>> +	struct f_gud_drm *fgd = container_of(work, struct f_gud_drm, worker);
>> +	bool ctrl_pending, bulk_pending, disable_pending;
>> +	struct gud_drm_gadget *gdg = fgd->gdg;
>> +	unsigned long flags;
>> +	u16 request, value;
>> +	int ret;
>> +
>> +	spin_lock_irqsave(&fgd->lock, flags);
>> +	request = fgd->request;
>> +	value = fgd->value;
>> +	ctrl_pending = fgd->ctrl_pending;
>> +	bulk_pending = fgd->bulk_pending;
>> +	disable_pending = fgd->disable_pending;
>> +	spin_unlock_irqrestore(&fgd->lock, flags);
>> +
>> +	pr_debug("%s: bulk_pending=%u ctrl_pending=%u disable_pending=%u\n",
>> +		 __func__, bulk_pending, ctrl_pending, disable_pending);
> 
> could we use dev_dbg() at least?
> 

Sure.

Noralf.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2020-06-02 19:15 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-29 17:56 [PATCH v3 0/6] Generic USB Display driver Noralf Trønnes
2020-05-29 17:56 ` Noralf Trønnes
2020-05-29 17:56 ` [PATCH v3 1/6] drm/client: Add drm_client_init_from_id() Noralf Trønnes
2020-05-29 17:56   ` Noralf Trønnes
2020-05-29 17:56 ` [PATCH v3 2/6] drm/client: Add drm_client_modeset_disable() Noralf Trønnes
2020-05-29 17:56   ` Noralf Trønnes
2020-05-29 17:56 ` [PATCH v3 3/6] drm/client: Add a way to set modeset, properties and rotation Noralf Trønnes
2020-05-29 17:56   ` Noralf Trønnes
2020-05-29 17:56 ` [PATCH v3 4/6] drm: Add Generic USB Display driver Noralf Trønnes
2020-05-29 17:56   ` Noralf Trønnes
2020-05-29 22:45   ` Peter Stuge
2020-05-29 22:45     ` Peter Stuge
2020-06-01 16:57     ` Noralf Trønnes
2020-06-01 16:57       ` Noralf Trønnes
2020-06-02  0:12       ` Peter Stuge
2020-06-02  0:12         ` Peter Stuge
2020-06-02  2:32         ` Alan Stern
2020-06-02  2:32           ` Alan Stern
2020-06-02  5:21           ` Peter Stuge
2020-06-02  5:21             ` Peter Stuge
2020-06-02 15:27             ` Alan Stern
2020-06-02 15:27               ` Alan Stern
2020-06-02 18:38               ` Peter Stuge
2020-06-02 18:38                 ` Peter Stuge
2020-06-05 12:03                 ` Noralf Trønnes
2020-06-05 12:03                   ` Noralf Trønnes
2020-06-02 11:46           ` Noralf Trønnes
2020-06-02 11:46             ` Noralf Trønnes
2020-07-14 15:30             ` Noralf Trønnes
2020-07-14 15:30               ` Noralf Trønnes
2020-06-03 19:17         ` Noralf Trønnes
2020-06-03 19:17           ` Noralf Trønnes
2020-05-29 17:56 ` [PATCH v3 5/6] drm/gud: Add functionality for the USB gadget side Noralf Trønnes
2020-05-29 17:56   ` Noralf Trønnes
2020-05-29 17:56 ` [PATCH v3 6/6] usb: gadget: function: Add Generic USB Display support Noralf Trønnes
2020-05-29 17:56   ` Noralf Trønnes
2020-06-02 17:05   ` Felipe Balbi
2020-06-02 17:05     ` Felipe Balbi
2020-06-02 19:14     ` Noralf Trønnes [this message]
2020-06-02 19:14       ` Noralf Trønnes
2020-06-03  7:10       ` Felipe Balbi
2020-06-03  7:10         ` Felipe Balbi
2020-07-09 16:32 ` [PATCH v3 0/6] Generic USB Display driver Lubomir Rintel
2020-07-09 16:32   ` Lubomir Rintel
2020-07-14 13:33   ` Noralf Trønnes
2020-07-14 13:33     ` Noralf Trønnes
2020-07-14 17:40     ` Peter Stuge
2020-07-14 17:40       ` Peter Stuge
2020-07-14 19:03       ` Noralf Trønnes
2020-07-14 19:03         ` Noralf Trønnes
2020-07-14 19:38         ` Peter Stuge
2020-07-14 19:38           ` Peter Stuge
2020-07-16 17:43           ` Noralf Trønnes
2020-07-16 17:43             ` Noralf Trønnes
2020-07-15  7:30         ` Lubomir Rintel
2020-07-15  7:30           ` Lubomir Rintel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0d0ec3f5-4b9a-e128-5d50-9e7096b3f984@tronnes.org \
    --to=noralf@tronnes.org \
    --cc=balbi@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=sam@ravnborg.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.