* Re: the side effect of using copy skb instead of skb_clone in cdc ncm/mbim driver
       [not found] <CAOa2BBy4t_17J7G5bBpAZZpBWZYh+Cpc6Fn4u+9cV98M8CZ5gw@mail.gmail.com>
@ 2014-07-08  7:03 ` Bjørn Mork
       [not found]   ` <CAOa2BBwVxcC_WoAqExRgEX57BwkikOL_sOBnjus+udNzVB0zqw@mail.gmail.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Bjørn Mork @ 2014-07-08  7:03 UTC (permalink / raw)
  To: joey ming; +Cc: jim_baxter, alexey.orishko, netdev, zhao.ming9

joey ming <joey.zming@gmail.com> writes:

> Hi, I was hoping you could help me with some questions.
>
> Some Qualcomm LTE modems have two modes: ECM and QC_NCM.

Yes, and worse than that: You can even mix and match these on UL and DL.

> QC_NCM is a
> protocol defined by Qualcomm. The differences between qc-ncm and cdc-mbim
> are:
> data pipe:     the NDP signature differs;
>                qc-ncm packets are composed of NTH+datagram+NTB, but
>                mbim/ncm packets are NTH+NTB+datagram

Both these schemes are allowed in NCM (and therefore MBIM). There is no
required datagram and NTB order. Proper alignment is the only requirement.
So the only real difference is the signature, which AFAIK is
configurable on the QC_NCM devices.
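A minimal userspace C sketch of the point above: a receiver that follows the pointers in the NCM headers never needs to know whether the NDP comes before or after the datagrams. The NTH16/NDP16 field offsets and the "NCMH"/"NCM0" signatures follow the CDC NCM 1.0 layout as I understand it; `ncm_parse_ntb16()` and `struct ncm_dgram` are hypothetical names, only the first NDP is walked (wNextNdpIndex is ignored), and the expected NDP signature is a parameter because that is exactly the field that differs between NCM, MBIM and QC_NCM (the QC_NCM value is not given here).

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit NCM signatures, as little-endian 32-bit values. */
#define NTH16_SIGNATURE       0x484D434Eu /* "NCMH" */
#define NDP16_NOCRC_SIGNATURE 0x304D434Eu /* "NCM0" */

/* Position of one datagram inside the NTB (hypothetical type). */
struct ncm_dgram {
    uint16_t offset;
    uint16_t length;
};

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk the first NDP16 of an NTB16 and store up to `max` datagram positions
 * in `out`. The NDP is located through wNdpIndex, so it may sit before or
 * after the datagrams. Returns the number of datagrams, or -1 if the block
 * is malformed, has the wrong NDP signature, or holds more than `max`.
 */
int ncm_parse_ntb16(const uint8_t *ntb, size_t len, uint32_t ndp_sig,
                    struct ncm_dgram *out, int max)
{
    if (len < 12 || get_le32(ntb) != NTH16_SIGNATURE)
        return -1;
    uint16_t block_len = get_le16(ntb + 8);   /* wBlockLength */
    uint16_t ndp_index = get_le16(ntb + 10);  /* wNdpIndex */
    if (block_len > len || ndp_index < 12 || (size_t)ndp_index + 8 > block_len)
        return -1;

    const uint8_t *ndp = ntb + ndp_index;
    uint16_t ndp_len = get_le16(ndp + 4);     /* wLength */
    if (get_le32(ndp) != ndp_sig || ndp_len < 16 ||
        (size_t)ndp_index + ndp_len > block_len)
        return -1;

    int n = 0;
    /* (wDatagramIndex, wDatagramLength) pairs follow the 8-byte NDP header. */
    for (size_t off = 8; off + 4 <= ndp_len; off += 4) {
        uint16_t d_index = get_le16(ndp + off);
        uint16_t d_len = get_le16(ndp + off + 2);
        if (d_index == 0 || d_len == 0)
            break;                            /* null entry ends the table */
        if ((size_t)d_index + d_len > block_len || n >= max)
            return -1;
        out[n].offset = d_index;
        out[n].length = d_len;
        n++;
    }
    return n;
}
```

Passing a different `ndp_sig` is all this parser needs to accept a device whose only deviation is the signature; everything else is found through the indices.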

> control pipe: this is the biggest difference. QC-NCM uses QMI commands
> to control the device, but mbim/ncm use CDC commands.

Yes, but this usage is actually also within the NCM spec.  The only
issues making a typical QC_NCM capable device fail as a CDC NCM class
device are
 - the default signature
 - the USB descriptors

Both are of course configurable by the modem vendors using these
chips/firmware toolkits, so it's possible to create standard CDC NCM
class devices out of these.  But you probably don't want to do that,
because there aren't any NCM class drivers which will let you access the
QMI control channel (except the one you have made :-), and you need QMI
for modem management.

> So for a QC-NCM driver, we can borrow the data pipe code from
> cdc-mbim/ncm. For a real LTE modem (DL speed: 150Mbps), the qc-ncm driver,
> which used skb_clone in qc_ncm_rxfixup, can reach 126Mbps, while in the
> same environment the ecm driver can reach 144Mbps with the same
> modem.
> Thanks to Google, I found Bjørn Mork's discussion about the buffering
> restrictions of the cdc-ncm host driver.
> The qc-ncm driver speed can reach 145Mbps if I copy the skb instead of
> using skb_clone.
> As we know, the major purpose of the aggregation protocols (such as cdc-ncm,
> cdc-mbim, qc-ncm) is to improve CPU efficiency by reducing interrupts.
> Comparing the ecm and qc-ncm test results, I think we didn't reach that goal:
> the interrupts decreased by 20%, BUT the driver load did not change.
> I used an Intel Core Duo laptop and profiled with oprofile on a 3.12 kernel:
>
>   driver    CPU0      CPU1      image     symbol
>   QC-NCM    0.3146%   0.2899%   vmlinux   memcpy
>             0.2198%   0.1864%   usbnet    usbnet
>             0.0423%   0.0385%   qmi_wwan  /qmi_wwan
>
>   CDC-ECM   0.2470%   0.1735%   usbnet    /usbnet
>             0.1341%   0.0132%   vmlinux   memcpy
>             0.0335%   0.0387%   qc_ncm    /qc_ncm
>
> For a high-speed modem (DL > 100Mbps), is copying the skb an appropriate
> solution? In the test results above, the memcpy load was higher than usbnet's.

Copy is the only solution.  With aggregation on the USB link we can
choose either
 - USB buffers which are much bigger than one IP packet, or
 - splitting some IP packets over two USB buffers

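Either way we'll have to copy the IP packets before handing them over to
the upper layers.  Cloning oversized skbs is not a good idea: every clone
shares the whole aggregate buffer, keeps it alive until the last packet
is freed, and carries the truesize of that full buffer rather than that
of the single IP packet it holds.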
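Roughly what "copy" means here, as a userspace C sketch with hypothetical names: each datagram found in the aggregate buffer gets its own right-sized allocation, after which the big USB buffer can be resubmitted immediately. In the kernel the equivalent would be allocating a small skb per datagram instead of calling skb_clone(), trading one memcpy per packet for not pinning the whole aggregate buffer; this sketch only models the buffer handling, not skb accounting.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Byte range of one datagram inside a received aggregate buffer (hypothetical). */
struct dgram_ref {
    size_t offset;
    size_t length;
};

/*
 * Copy every datagram out of the aggregate buffer into its own allocation,
 * so the aggregate buffer can be reused right away. On success returns n and
 * fills out[0..n-1]; on failure frees whatever it allocated and returns -1.
 */
int copy_out_datagrams(const uint8_t *agg, size_t agg_len,
                       const struct dgram_ref *refs, int n, uint8_t **out)
{
    int i;
    for (i = 0; i < n; i++) {
        const struct dgram_ref *r = &refs[i];
        /* Reject references that fall outside the aggregate buffer. */
        if (r->length == 0 || r->offset > agg_len ||
            r->length > agg_len - r->offset)
            break;
        out[i] = malloc(r->length);
        if (out[i] == NULL)
            break;
        memcpy(out[i], agg + r->offset, r->length);
    }
    if (i == n)
        return n;
    /* Undo the partial copy so the caller never sees half a batch. */
    while (i-- > 0) {
        free(out[i]);
        out[i] = NULL;
    }
    return -1;
}
```

The per-packet cost is one small allocation and one memcpy; in exchange, memory use tracks the packets actually queued rather than the size of the USB buffers they arrived in.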

The problem you found is that there is no real value in this USB layer
aggregation on Linux.  It's just pointless complexity. A plain ECM
driver transmitting ethernet frames back to back will achieve the exact
same utilisation of the USB link, without all the aggregation complexity
and overhead.
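A back-of-the-envelope check of the ECM claim, counting 512-byte high-speed bulk packets. The assumptions are mine, not from the thread: ten 1514-byte Ethernet frames, each ECM frame sent as its own transfer (ended by a short packet or a ZLP), and one NTB16 with 4-byte datagram alignment and a single NDP after the datagrams. Microframe scheduling, padding to wNtbOutMaxSize and device-side costs are ignored.

```c
#include <stddef.h>

#define HS_BULK_MAXPACKET 512u /* high-speed bulk wMaxPacketSize */

static size_t align_up(size_t v, size_t a)
{
    return (v + a - 1) / a * a;
}

/* Bulk packets for one transfer of len > 0 bytes: the full packets plus either
 * a final short packet or, when len is an exact multiple, a zero-length packet. */
static size_t usb_packets(size_t len)
{
    return len / HS_BULK_MAXPACKET + 1;
}

/* ECM: every Ethernet frame is a separate transfer. */
size_t ecm_packets(size_t frame_len, size_t nframes)
{
    return nframes * usb_packets(frame_len);
}

/* Size of one NTB16 holding nframes frames (layout assumed as described above). */
size_t ncm_ntb_bytes(size_t frame_len, size_t nframes)
{
    size_t off = 12;                          /* NTH16 header */
    for (size_t i = 0; i < nframes; i++)
        off = align_up(off, 4) + frame_len;   /* 4-byte datagram alignment */
    off = align_up(off, 4);                   /* NDP16 starts 4-byte aligned */
    return off + 8 + 4 * (nframes + 1);       /* NDP header, entries, null entry */
}

/* NCM: the whole NTB goes out as one transfer. */
size_t ncm_packets(size_t frame_len, size_t nframes)
{
    return usb_packets(ncm_ntb_bytes(frame_len, nframes));
}
```

With these numbers both variants need 30 bulk packets, and the NTB is 15224 bytes against 15140 bytes for the ECM frames, so in this example the aggregation buys nothing on the wire.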

I don't know why the USB-IF found it necessary to create these
aggregation protocols, but I assume there are OSes other than Linux out
there (running on either hosts or devices) which are unable to use the
full USB bandwidth with ECM.

But the results you show above are the primary reason why I chose to
completely ignore the QC_NCM bastard protocol.  AFAIK all devices
capable of using this protocol can also do plain ECM, and will do so
by default.  So they are supported by the qmi_wwan driver with no more
fuss, at the maximum speed the devices are capable of.

Thanks a lot for testing this and providing the results!


Bjørn

* Re: the side effect of using copy skb instead of skb_clone in cdc ncm/mbim driver
       [not found]   ` <CAOa2BBwVxcC_WoAqExRgEX57BwkikOL_sOBnjus+udNzVB0zqw@mail.gmail.com>
@ 2014-07-09 16:01     ` Bjørn Mork
  2014-07-09 23:41       ` Alexey Orishko
  0 siblings, 1 reply; 4+ messages in thread
From: Bjørn Mork @ 2014-07-09 16:01 UTC (permalink / raw)
  To: joey ming; +Cc: jim_baxter, Alexey Orishko, netdev, zhao.ming9

[didn't notice earlier, but Alexey's address was wrong - I fixed it on
this reply]

joey ming <joey.zming@gmail.com> writes:

> Thanks for your reply.
> From my test results, perhaps the ncm protocol is as efficient as
> cdc-ecm. But Alexey (alexey.orishko@stericsson.com) described his experiment
> two years ago: "One real-world example was a modem for 21+6Mbit/s that used
> 100% CPU with ECM responsible for approx. 40% of the MIPS used. Using NCM
> instead, the CPU was only at approx. 65% utilization. This allowed multiple
> other functions to be added and significantly increased the usability and
> value of the modem". I don't know why the test results differ so much.
> Is it correct that cdc-ncm is more efficient than cdc-ecm for low-speed
> devices but not for high-speed devices?

Alexey's results were on modem hardware, and I am guessing the OS wasn't
Linux.  I have no doubt that you can increase efficiency if you can take
a fixed size big NCM buffer, and make the radio interface write packets
directly into it using the alignment of your choice, before you just
give the whole buffer to a USB controller.  You mostly don't have to
involve the CPU at all.  So NCM is probably a great win for the modems,
and you are right: That is likely why this aggregating protocol was
invented.
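On the device side, the "fixed size big NCM buffer" approach can be sketched like this: the producer reserves an aligned slot and writes the datagram in place (for example by DMA from the radio interface), and the headers are filled in once at the end, so no payload is copied. The builder below is hypothetical; it assumes a buffer of at most 65535 bytes (16-bit NTB), a non-zero datagram alignment, at least one datagram per block, and a single NDP16 placed after the datagrams.

```c
#include <stddef.h>
#include <stdint.h>

#define NTB_MAX_DGRAMS 32

/* Hypothetical builder for one fixed-size NTB16 buffer. */
struct ntb_builder {
    uint8_t *buf;   /* caller-owned buffer, at most 65535 bytes */
    size_t size;    /* capacity of buf */
    size_t align;   /* datagram alignment, must be >= 1 */
    size_t used;    /* first byte not yet handed out */
    int n;          /* datagrams reserved so far */
    uint16_t idx[NTB_MAX_DGRAMS], len[NTB_MAX_DGRAMS];
};

static void put_le16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put_le32(uint8_t *p, uint32_t v) { put_le16(p, (uint16_t)v); put_le16(p + 2, (uint16_t)(v >> 16)); }
static size_t align_up(size_t v, size_t a) { return (v + a - 1) / a * a; }

/* NDP16 size: 8-byte header, one entry per datagram, one null entry. */
static size_t ndp_bytes(int ndgrams) { return 8 + 4 * (size_t)(ndgrams + 1); }

void ntb_init(struct ntb_builder *b, uint8_t *buf, size_t size, size_t align)
{
    b->buf = buf;
    b->size = size;
    b->align = align;
    b->used = 12; /* room for the NTH16 header */
    b->n = 0;
}

/* Reserve an aligned slot for `len` bytes and return where to write them, or
 * NULL if the datagram would not fit together with the final NDP. */
uint8_t *ntb_reserve(struct ntb_builder *b, size_t len)
{
    size_t off = align_up(b->used, b->align);
    if (b->n >= NTB_MAX_DGRAMS || len == 0 || len > 0xFFFF)
        return NULL;
    if (align_up(off + len, 4) + ndp_bytes(b->n + 1) > b->size)
        return NULL;
    b->idx[b->n] = (uint16_t)off;
    b->len[b->n] = (uint16_t)len;
    b->n++;
    b->used = off + len;
    return b->buf + off;
}

/* Write the NTH16 and a trailing NDP16; returns the block length. */
size_t ntb_finish(struct ntb_builder *b, uint16_t seq)
{
    size_t ndp = align_up(b->used, 4);
    size_t total = ndp + ndp_bytes(b->n);
    put_le32(b->buf, 0x484D434Eu);                          /* "NCMH" */
    put_le16(b->buf + 4, 12);                               /* wHeaderLength */
    put_le16(b->buf + 6, seq);                              /* wSequence */
    put_le16(b->buf + 8, (uint16_t)total);                  /* wBlockLength */
    put_le16(b->buf + 10, (uint16_t)ndp);                   /* wNdpIndex */
    put_le32(b->buf + ndp, 0x304D434Eu);                    /* "NCM0" */
    put_le16(b->buf + ndp + 4, (uint16_t)ndp_bytes(b->n));  /* wLength */
    put_le16(b->buf + ndp + 6, 0);                          /* wNextNdpIndex: none */
    for (int i = 0; i < b->n; i++) {
        put_le16(b->buf + ndp + 8 + 4 * i, b->idx[i]);
        put_le16(b->buf + ndp + 10 + 4 * i, b->len[i]);
    }
    put_le16(b->buf + ndp + 8 + 4 * b->n, 0);               /* null entry */
    put_le16(b->buf + ndp + 10 + 4 * b->n, 0);
    return total;
}
```

The CPU only touches the few header bytes per block; the payload lands where the producer wrote it, which is the kind of saving that matters on a weak modem CPU and much less on a host.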

But little of this is applicable to the typical Linux implementation,
whether it runs on a host or a device.  Big USB buffers do not help much
on the USB controllers, and they are just a hassle other places because
it's difficult to pass partial buffers around.

And I think modem hardware now has become so much more powerful that the
same applies to it as well.


Bjørn

* Re: the side effect of using copy skb instead of skb_clone in cdc ncm/mbim driver
  2014-07-09 16:01     ` Bjørn Mork
@ 2014-07-09 23:41       ` Alexey Orishko
  2014-07-10  8:50         ` David Laight
  0 siblings, 1 reply; 4+ messages in thread
From: Alexey Orishko @ 2014-07-09 23:41 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: joey ming, jim_baxter, netdev, zhao.ming9

On Wed, Jul 9, 2014 at 6:01 PM, Bjørn Mork <bjorn@mork.no> wrote:
> Alexey's results were on modem hardware, and I am guessing the OS wasn't
> Linux.
The NCM driver was deployed on a Unix-like real-time OS and on various
embedded application CPUs running Android.

> So NCM is probably a great win for the modems,
> and you are right: That is likely why this aggregating protocol was
> invented.

One comment on this and on earlier passages about the USB-IF: during
specification development there was no intention "to make life easier"
on the host side; the goal was to make mobile devices with weak CPUs
more efficient (since 2007 the situation has changed significantly in
mobile computing power).

One important comment, though, about the main idea of the aggregation protocol:
- in order to get a *real* benefit from the protocol, the sender shall
initially send only a table containing pointers in a separate 512-byte
packet, and only then send the rest of the NTB.
- the receiving side shall allocate a DMA job for receiving the first 512
bytes and, after parsing them, set up a job for receiving all IP packets
into separate data buffers. After that the DMA engine will handle the data
without involving the CPU.
As a result: minimum CPU usage, and all IP packets are placed into
separate skbs.
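A sketch of the second phase described above, under assumptions of my own: the first 512-byte packet has already been parsed into a table of (wDatagramIndex, wDatagramLength) entries sorted by offset, and the rest of the NTB arrives as one contiguous bulk stream. Because the stream must be consumed in order, the job list has to alternate between "receive into this datagram's own buffer" and "discard alignment padding". `plan_rx_jobs()` and its types are hypothetical, and bytes after the last datagram are not planned here.

```c
#include <stddef.h>

enum rx_job_kind { RX_DISCARD, RX_DATAGRAM };

/* One step of the (hypothetical) second-phase receive: consume `len` stream
 * bytes, either into datagram buffer `dgram` or into a scratch area. */
struct rx_job {
    enum rx_job_kind kind;
    size_t len;
    int dgram; /* index into the NDP table, or -1 for RX_DISCARD */
};

struct ndp_entry {
    size_t index;  /* wDatagramIndex: offset from the start of the NTB */
    size_t length; /* wDatagramLength */
};

/*
 * Turn the datagram table from the header packet into an ordered job list for
 * the stream that starts at byte `hdr_len`. Returns the number of jobs, or -1
 * if an entry overlaps the header or the previous datagram, is empty, or
 * `max_jobs` is too small.
 */
int plan_rx_jobs(size_t hdr_len, const struct ndp_entry *e, int n,
                 struct rx_job *jobs, int max_jobs)
{
    size_t pos = hdr_len; /* next stream byte not yet assigned to a job */
    int j = 0;
    for (int i = 0; i < n; i++) {
        if (e[i].index < pos || e[i].length == 0)
            return -1;
        if (e[i].index > pos) { /* padding before this datagram */
            if (j >= max_jobs)
                return -1;
            jobs[j++] = (struct rx_job){ RX_DISCARD, e[i].index - pos, -1 };
        }
        if (j >= max_jobs)
            return -1;
        jobs[j++] = (struct rx_job){ RX_DATAGRAM, e[i].length, i };
        pos = e[i].index + e[i].length;
    }
    return j;
}
```

For a 1514-byte datagram right after the 512-byte header, followed by a 60-byte datagram at offset 2028, this gives three jobs: the first datagram, a 2-byte discard for alignment padding, and the second datagram.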

However, I don't believe the usbnet infrastructure is capable of doing
that, but it can be done in proprietary code in a USB modem. I don't
have any info on whether someone actually managed to build such a system,
which requires a lot of effort; it is much easier to do the data copying on
an Intel Quad Core 3+GHz CPU with 8+GB RAM: the cost is hardly noticeable
there, in contrast to embedded systems...

Regards,
Alexey

* RE: the side effect of using copy skb instead of skb_clone in cdc ncm/mbim driver
  2014-07-09 23:41       ` Alexey Orishko
@ 2014-07-10  8:50         ` David Laight
  0 siblings, 0 replies; 4+ messages in thread
From: David Laight @ 2014-07-10  8:50 UTC (permalink / raw)
  To: 'Alexey Orishko', Bjørn Mork
  Cc: joey ming, jim_baxter, netdev, zhao.ming9

From: Alexey Orishko
> On Wed, Jul 9, 2014 at 6:01 PM, Bjørn Mork <bjorn@mork.no> wrote:
> > Alexey's results were on modem hardware, and I am guessing the OS wasn't
> > Linux.
> The NCM driver was deployed on a Unix-like real-time OS and on various
> embedded application CPUs running Android.
> 
> > So NCM is probably a great win for the modems,
> > and you are right: That is likely why this aggregating protocol was
> > invented.
> 
> One comment on this and on earlier passages about the USB-IF: during
> specification development there was no intention "to make life easier"
> on the host side; the goal was to make mobile devices with weak CPUs
> more efficient (since 2007 the situation has changed significantly in
> mobile computing power).
> 
> One important comment, though, about the main idea of the aggregation protocol:
> - in order to get a *real* benefit from the protocol, the sender shall
> initially send only a table containing pointers in a separate 512-byte
> packet, and only then send the rest of the NTB.
> - the receiving side shall allocate a DMA job for receiving the first 512
> bytes and, after parsing them, set up a job for receiving all IP packets
> into separate data buffers. After that the DMA engine will handle the data
> without involving the CPU.
> As a result: minimum CPU usage, and all IP packets are placed into
> separate skbs.
> 
> However, I don't believe the usbnet infrastructure is capable of doing
> that, but it can be done in proprietary code in a USB modem. I don't
> have any info on whether someone actually managed to build such a system,
> which requires a lot of effort; it is much easier to do the data copying on
> an Intel Quad Core 3+GHz CPU with 8+GB RAM: the cost is hardly noticeable
> there, in contrast to embedded systems...

If the subsequent data is packed into a single USB bulk data transfer,
then only the xHCI controller is capable of performing the required
DMA transfers, since they need arbitrary scatter-gather support.
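A simplified model of this point, written as C: when the datagrams of one bulk transfer are scattered into separate buffers, the segment lengths follow the datagram sizes, not the USB packet size. The model assumes (my simplification, not a statement about any particular driver) that a controller without arbitrary scatter-gather support can only handle lists in which every segment except the last is a whole number of wMaxPacketSize packets; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if a scatter-gather list for one bulk IN transfer needs
 * arbitrary scatter-gather support, i.e. some segment other than the last
 * is not a multiple of the endpoint's wMaxPacketSize.
 */
bool sg_needs_arbitrary_lengths(const size_t *seg_len, int nsegs,
                                size_t maxpacket)
{
    for (int i = 0; i < nsegs - 1; i++)
        if (seg_len[i] % maxpacket != 0)
            return true;
    return false;
}
```

A 1514-byte datagram anywhere but at the end of the list already breaks the rule, so a header-then-datagrams receive path like the one Alexey describes would, under this model, depend on a controller that accepts arbitrary segment lengths.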

The usbnet infrastructure would also need changing.

In practice I suspect that a data copy on the host is unlikely to
be significant for anything running at USB2 speeds or 100M ethernet.
There is probably more scope for reducing CPU usage by optimising
the USB stack itself.

	David


