All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: xiaohui.xin@intel.com
Cc: netdev@vger.kernel.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, mingo@elte.hu, davem@davemloft.net,
	herbert@gondor.hengli.com.au, jdike@linux.intel.com
Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
Date: Mon, 6 Sep 2010 14:11:26 +0300	[thread overview]
Message-ID: <20100906111126.GA15608@redhat.com> (raw)
In-Reply-To: <1281086624-5765-14-git-send-email-xiaohui.xin@intel.com>

So - does this driver help reduce service demand signifiantly?
Some comments from looking at the code:

On Fri, Aug 06, 2010 at 05:23:41PM +0800, xiaohui.xin@intel.com wrote:
> +static struct page_info *alloc_page_info(struct page_ctor *ctor,
> +		struct kiocb *iocb, struct iovec *iov,
> +		int count, struct frag *frags,
> +		int npages, int total)
> +{
> +	int rc;
> +	int i, j, n = 0;
> +	int len;
> +	unsigned long base, lock_limit;
> +	struct page_info *info = NULL;
> +
> +	lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> +	lock_limit >>= PAGE_SHIFT;

Playing with rlimit on data path, transparently to the application in this way
looks strange to me, I suspect this has unexpected security implications.
Further, applications may have other uses for locked memory
besides mpassthru - you should not just take it because it's there.

Can we have an ioctl that lets userspace configure how much
memory to lock? This ioctl will decrement the rlimit and store
the data in the device structure so we can do accounting
internally. Put it back on close or on another ioctl.
Need to be careful for when this operation gets called
again with 0 or another small value while we have locked memory -
maybe just fail with EBUSY?  or wait until it gets unlocked?
Maybe 0 can be special-cased and deactivate zero-copy?.


> +
> +	if (ctor->lock_pages + count > lock_limit && npages) {
> +		printk(KERN_INFO "exceed the locked memory rlimit.");
> +		return NULL;
> +	}
> +
> +	info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL);

You seem to fill in all memory, why zalloc? this is data path ...

> +
> +	if (!info)
> +		return NULL;
> +
> +	for (i = j = 0; i < count; i++) {
> +		base = (unsigned long)iov[i].iov_base;
> +		len = iov[i].iov_len;
> +
> +		if (!len)
> +			continue;
> +		n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
> +
> +		rc = get_user_pages_fast(base, n, npages ? 1 : 0,

npages controls whether this is a write? Why?

> +				&info->pages[j]);
> +		if (rc != n)
> +			goto failed;
> +
> +		while (n--) {
> +			frags[j].offset = base & ~PAGE_MASK;
> +			frags[j].size = min_t(int, len,
> +					PAGE_SIZE - frags[j].offset);
> +			len -= frags[j].size;
> +			base += frags[j].size;
> +			j++;
> +		}
> +	}
> +
> +#ifdef CONFIG_HIGHMEM
> +	if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
> +		for (i = 0; i < j; i++) {
> +			if (PageHighMem(info->pages[i]))
> +				goto failed;
> +		}
> +	}
> +#endif

Are non-highdma devices worth bothering with? If yes -
are there other limitations devices might have that we need to handle?
E.g. what about non-s/g devices or no checksum offloading?

> +		skb_push(skb, ETH_HLEN);
> +
> +		if (skb_is_gso(skb)) {
> +			hdr.hdr.hdr_len = skb_headlen(skb);
> +			hdr.hdr.gso_size = shinfo->gso_size;
> +			if (shinfo->gso_type & SKB_GSO_TCPV4)
> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
> +			else if (shinfo->gso_type & SKB_GSO_TCPV6)
> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
> +			else if (shinfo->gso_type & SKB_GSO_UDP)
> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
> +			else
> +				BUG();
> +			if (shinfo->gso_type & SKB_GSO_TCP_ECN)
> +				hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
> +
> +		} else
> +			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
> +
> +		if (skb->ip_summed == CHECKSUM_PARTIAL) {
> +			hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
> +			hdr.hdr.csum_start =
> +				skb->csum_start - skb_headroom(skb);
> +			hdr.hdr.csum_offset = skb->csum_offset;
> +		}

We have this code in tun, macvtap and packet socket already.
Could this be a good time to move these into networking core?
I'm not asking you to do this right now, but could this generic
virtio-net to skb stuff be encapsulated in functions?

-- 
MST

  parent reply	other threads:[~2010-09-06 11:17 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-06  9:23 [RFC PATCH v9 00/16] Provide a zero-copy method on KVM virtio-net xiaohui.xin
2010-08-06  9:23 ` xiaohui.xin
2010-08-06  9:23 ` [RFC PATCH v9 01/16] Add a new structure for skb buffer from external xiaohui.xin
2010-08-06  9:23   ` xiaohui.xin
2010-08-06  9:23   ` [RFC PATCH v9 02/16] Add a new struct for device to manipulate external buffer xiaohui.xin
2010-08-06  9:23     ` xiaohui.xin
2010-08-06  9:23     ` [RFC PATCH v9 03/16] Add a ndo_mp_port_prep func to net_device_ops xiaohui.xin
2010-08-06  9:23       ` xiaohui.xin
2010-08-06  9:23       ` [RFC PATCH v9 04/16] Add a function make external buffer owner to query capability xiaohui.xin
2010-08-06  9:23         ` xiaohui.xin
2010-08-06  9:23         ` [RFC PATCH v9 05/16] Add a function to indicate if device use external buffer xiaohui.xin
2010-08-06  9:23           ` xiaohui.xin
2010-08-06  9:23           ` [RFC PATCH v9 06/16] Use callback to deal with skb_release_data() specially xiaohui.xin
2010-08-06  9:23             ` xiaohui.xin
2010-08-06  9:23             ` [RFC PATCH v9 07/16] Modify netdev_alloc_page() to get external buffer xiaohui.xin
2010-08-06  9:23               ` xiaohui.xin
2010-08-06  9:23               ` [RFC PATCH v9 08/16] Modify netdev_free_page() to release " xiaohui.xin
2010-08-06  9:23                 ` xiaohui.xin
2010-08-06  9:23                 ` [RFC PATCH v9 09/16] Don't do skb recycle, if device use " xiaohui.xin
2010-08-06  9:23                   ` xiaohui.xin
2010-08-06  9:23                   ` [RFC PATCH v9 10/16] Add a hook to intercept external buffers from NIC driver xiaohui.xin
2010-08-06  9:23                     ` xiaohui.xin
2010-08-06  9:23                     ` [RFC PATCH v9 11/16] Add header file for mp device xiaohui.xin
2010-08-06  9:23                       ` xiaohui.xin
2010-08-06  9:23                       ` [RFC PATCH v9 13/16] Add a kconfig entry and make entry " xiaohui.xin
2010-08-06  9:23                         ` xiaohui.xin
2010-08-06  9:23                         ` [RFC PATCH v9 12/16] Add mp(mediate passthru) device xiaohui.xin
2010-08-06  9:23                           ` xiaohui.xin
2010-08-06  9:23                           ` [RFC PATCH v9 14/16] Provides multiple submits and asynchronous notifications xiaohui.xin
2010-08-06  9:23                             ` xiaohui.xin
2010-08-06  9:23                             ` [RFC PATCH v9 15/16] An example how to modifiy NIC driver to use napi_gro_frags() interface xiaohui.xin
2010-08-06  9:23                               ` xiaohui.xin
2010-08-06  9:23                               ` [RFC PATCH v9 16/16] An example how to alloc user buffer based on " xiaohui.xin
2010-08-06  9:23                                 ` xiaohui.xin
2010-09-06 11:11                           ` Michael S. Tsirkin [this message]
2010-09-10 13:40                             ` [RFC PATCH v9 12/16] Add mp(mediate passthru) device Xin, Xiaohui
2010-09-11  7:41                               ` Xin, Xiaohui
2010-09-12 13:37                                 ` Michael S. Tsirkin
2010-09-15  3:13                                   ` Xin, Xiaohui
2010-09-15 11:28                                     ` Michael S. Tsirkin
2010-09-17  3:16                                       ` Xin, Xiaohui
2010-09-20  8:08                                       ` xiaohui.xin
2010-09-20 11:36                                         ` Michael S. Tsirkin
2010-09-21  1:39                                           ` Xin, Xiaohui
2010-09-21 13:14                                             ` Michael S. Tsirkin
2010-09-22 11:41                                               ` Xin, Xiaohui
2010-09-22 11:55                                                 ` Michael S. Tsirkin
2010-09-23 12:56                                                   ` Xin, Xiaohui
2010-09-26 11:50                                                     ` Michael S. Tsirkin
2010-09-27  0:42                                                       ` Xin, Xiaohui
2010-09-11  9:42                               ` Xin, Xiaohui
2010-08-11  1:23 ` [RFC PATCH v9 00/16] Provide a zero-copy method on KVM virtio-net Shirley Ma
2010-08-11  1:23   ` Shirley Ma
2010-08-11  1:43   ` Shirley Ma
2010-08-11  1:43     ` Shirley Ma
2010-08-11  6:01     ` Shirley Ma
2010-08-11  6:01       ` Shirley Ma
2010-08-11  6:55       ` Shirley Ma
2010-08-11  6:55         ` Shirley Ma
2010-09-03 10:52         ` Michael S. Tsirkin
2010-09-13 18:48           ` Shirley Ma
2010-09-13 21:35           ` Shirley Ma
2010-09-03 10:14   ` Michael S. Tsirkin
2010-09-03 20:29     ` Sridhar Samudrala

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100906111126.GA15608@redhat.com \
    --to=mst@redhat.com \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.hengli.com.au \
    --cc=jdike@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=netdev@vger.kernel.org \
    --cc=xiaohui.xin@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.