All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Xin, Xiaohui" <xiaohui.xin@intel.com>
To: "Xin, Xiaohui" <xiaohui.xin@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@elte.hu" <mingo@elte.hu>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"herbert@gondor.hengli.com.au" <herbert@gondor.hengli.com.au>,
	"jdike@linux.intel.com" <jdike@linux.intel.com>
Subject: RE: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
Date: Sat, 11 Sep 2010 15:41:14 +0800	[thread overview]
Message-ID: <F2E9EB7348B8264F86B6AB8151CE2D79280489AA05@shsmsx502.ccr.corp.intel.com> (raw)
In-Reply-To: <F2E9EB7348B8264F86B6AB8151CE2D79280489A9C9@shsmsx502.ccr.corp.intel.com>

>>Playing with rlimit on data path, transparently to the application in this way
>>looks strange to me, I suspect this has unexpected security implications.
>>Further, applications may have other uses for locked memory
>>besides mpassthru - you should not just take it because it's there.
>>
>>Can we have an ioctl that lets userspace configure how much
>>memory to lock? This ioctl will decrement the rlimit and store
>>the data in the device structure so we can do accounting
>>internally. Put it back on close or on another ioctl.
>Yes, we can decrement the rlimit in ioctl in one time to avoid
>data path.
>
>>Need to be careful for when this operation gets called
>>again with 0 or another small value while we have locked memory -
>>maybe just fail with EBUSY?  or wait until it gets unlocked?
>>Maybe 0 can be special-cased and deactivate zero-copy?.
>>

How about we don't use a new ioctl, but just check the rlimit 
in one MPASSTHRU_BINDDEV ioctl? If we find mp device
break the rlimit, then we fail the bind ioctl, and thus can't do 
zero copy any more.

>In fact, if we choose RLIMIT_MEMLOCK to limit the lock memory,
>the default value is only 16 pages. It's too small to make the device to
>work. So we always to configure it with a large value.
>I think, if rlimit value after decrement is < 0, then deactivate zero-copy
>is better. 0 maybe ok.
>

>>> +
>>> +	if (ctor->lock_pages + count > lock_limit && npages) {
>>> +		printk(KERN_INFO "exceed the locked memory rlimit.");
>>> +		return NULL;
>>> +	}
>>> +
>>> +	info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL);
>>
>>You seem to fill in all memory, why zalloc? this is data path ...
>
>Ok, Let me check this.
>
>>
>>> +
>>> +	if (!info)
>>> +		return NULL;
>>> +
>>> +	for (i = j = 0; i < count; i++) {
>>> +		base = (unsigned long)iov[i].iov_base;
>>> +		len = iov[i].iov_len;
>>> +
>>> +		if (!len)
>>> +			continue;
>>> +		n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
>>> +
>>> +		rc = get_user_pages_fast(base, n, npages ? 1 : 0,
>>
>>npages controls whether this is a write? Why?
>
>We use npages as a flag here. In mp_sendmsg(), we called alloc_page_info() with npages =
>0.
>
>>
>>> +				&info->pages[j]);
>>> +		if (rc != n)
>>> +			goto failed;
>>> +
>>> +		while (n--) {
>>> +			frags[j].offset = base & ~PAGE_MASK;
>>> +			frags[j].size = min_t(int, len,
>>> +					PAGE_SIZE - frags[j].offset);
>>> +			len -= frags[j].size;
>>> +			base += frags[j].size;
>>> +			j++;
>>> +		}
>>> +	}
>>> +
>>> +#ifdef CONFIG_HIGHMEM
>>> +	if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
>>> +		for (i = 0; i < j; i++) {
>>> +			if (PageHighMem(info->pages[i]))
>>> +				goto failed;
>>> +		}
>>> +	}
>>> +#endif
>>
>>Are non-highdma devices worth bothering with? If yes -
>>are there other limitations devices might have that we need to handle?
>>E.g. what about non-s/g devices or no checksum offloading?.
>
>Basically I think there is no limitations for both, but let me have a check.
>
>>
>>> +		skb_push(skb, ETH_HLEN);
>>> +
>>> +		if (skb_is_gso(skb)) {
>>> +			hdr.hdr.hdr_len = skb_headlen(skb);
>>> +			hdr.hdr.gso_size = shinfo->gso_size;
>>> +			if (shinfo->gso_type & SKB_GSO_TCPV4)
>>> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>>> +			else if (shinfo->gso_type & SKB_GSO_TCPV6)
>>> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
>>> +			else if (shinfo->gso_type & SKB_GSO_UDP)
>>> +				hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
>>> +			else
>>> +				BUG();
>>> +			if (shinfo->gso_type & SKB_GSO_TCP_ECN)
>>> +				hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>>> +
>>> +		} else
>>> +			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
>>> +
>>> +		if (skb->ip_summed == CHECKSUM_PARTIAL) {
>>> +			hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>>> +			hdr.hdr.csum_start =
>>> +				skb->csum_start - skb_headroom(skb);
>>> +			hdr.hdr.csum_offset = skb->csum_offset;
>>> +		}
>>
>>We have this code in tun, macvtap and packet socket already.
>>Could this be a good time to move these into networking core?
>>I'm not asking you to do this right now, but could this generic
>>virtio-net to skb stuff be encapsulated in functions?
>
>It seems reasonable.
>
>>
>>--
>>MST
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2010-09-11  7:41 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-08-06  9:23 [RFC PATCH v9 00/16] Provide a zero-copy method on KVM virtio-net xiaohui.xin
2010-08-06  9:23 ` xiaohui.xin
2010-08-06  9:23 ` [RFC PATCH v9 01/16] Add a new structure for skb buffer from external xiaohui.xin
2010-08-06  9:23   ` xiaohui.xin
2010-08-06  9:23   ` [RFC PATCH v9 02/16] Add a new struct for device to manipulate external buffer xiaohui.xin
2010-08-06  9:23     ` xiaohui.xin
2010-08-06  9:23     ` [RFC PATCH v9 03/16] Add a ndo_mp_port_prep func to net_device_ops xiaohui.xin
2010-08-06  9:23       ` xiaohui.xin
2010-08-06  9:23       ` [RFC PATCH v9 04/16] Add a function make external buffer owner to query capability xiaohui.xin
2010-08-06  9:23         ` xiaohui.xin
2010-08-06  9:23         ` [RFC PATCH v9 05/16] Add a function to indicate if device use external buffer xiaohui.xin
2010-08-06  9:23           ` xiaohui.xin
2010-08-06  9:23           ` [RFC PATCH v9 06/16] Use callback to deal with skb_release_data() specially xiaohui.xin
2010-08-06  9:23             ` xiaohui.xin
2010-08-06  9:23             ` [RFC PATCH v9 07/16] Modify netdev_alloc_page() to get external buffer xiaohui.xin
2010-08-06  9:23               ` xiaohui.xin
2010-08-06  9:23               ` [RFC PATCH v9 08/16] Modify netdev_free_page() to release " xiaohui.xin
2010-08-06  9:23                 ` xiaohui.xin
2010-08-06  9:23                 ` [RFC PATCH v9 09/16] Don't do skb recycle, if device use " xiaohui.xin
2010-08-06  9:23                   ` xiaohui.xin
2010-08-06  9:23                   ` [RFC PATCH v9 10/16] Add a hook to intercept external buffers from NIC driver xiaohui.xin
2010-08-06  9:23                     ` xiaohui.xin
2010-08-06  9:23                     ` [RFC PATCH v9 11/16] Add header file for mp device xiaohui.xin
2010-08-06  9:23                       ` xiaohui.xin
2010-08-06  9:23                       ` [RFC PATCH v9 13/16] Add a kconfig entry and make entry " xiaohui.xin
2010-08-06  9:23                         ` xiaohui.xin
2010-08-06  9:23                         ` [RFC PATCH v9 12/16] Add mp(mediate passthru) device xiaohui.xin
2010-08-06  9:23                           ` xiaohui.xin
2010-08-06  9:23                           ` [RFC PATCH v9 14/16] Provides multiple submits and asynchronous notifications xiaohui.xin
2010-08-06  9:23                             ` xiaohui.xin
2010-08-06  9:23                             ` [RFC PATCH v9 15/16] An example how to modifiy NIC driver to use napi_gro_frags() interface xiaohui.xin
2010-08-06  9:23                               ` xiaohui.xin
2010-08-06  9:23                               ` [RFC PATCH v9 16/16] An example how to alloc user buffer based on " xiaohui.xin
2010-08-06  9:23                                 ` xiaohui.xin
2010-09-06 11:11                           ` [RFC PATCH v9 12/16] Add mp(mediate passthru) device Michael S. Tsirkin
2010-09-10 13:40                             ` Xin, Xiaohui
2010-09-11  7:41                               ` Xin, Xiaohui [this message]
2010-09-12 13:37                                 ` Michael S. Tsirkin
2010-09-15  3:13                                   ` Xin, Xiaohui
2010-09-15 11:28                                     ` Michael S. Tsirkin
2010-09-17  3:16                                       ` Xin, Xiaohui
2010-09-20  8:08                                       ` xiaohui.xin
2010-09-20 11:36                                         ` Michael S. Tsirkin
2010-09-21  1:39                                           ` Xin, Xiaohui
2010-09-21 13:14                                             ` Michael S. Tsirkin
2010-09-22 11:41                                               ` Xin, Xiaohui
2010-09-22 11:55                                                 ` Michael S. Tsirkin
2010-09-23 12:56                                                   ` Xin, Xiaohui
2010-09-26 11:50                                                     ` Michael S. Tsirkin
2010-09-27  0:42                                                       ` Xin, Xiaohui
2010-09-11  9:42                               ` Xin, Xiaohui
2010-08-11  1:23 ` [RFC PATCH v9 00/16] Provide a zero-copy method on KVM virtio-net Shirley Ma
2010-08-11  1:23   ` Shirley Ma
2010-08-11  1:43   ` Shirley Ma
2010-08-11  1:43     ` Shirley Ma
2010-08-11  6:01     ` Shirley Ma
2010-08-11  6:01       ` Shirley Ma
2010-08-11  6:55       ` Shirley Ma
2010-08-11  6:55         ` Shirley Ma
2010-09-03 10:52         ` Michael S. Tsirkin
2010-09-13 18:48           ` Shirley Ma
2010-09-13 21:35           ` Shirley Ma
2010-09-03 10:14   ` Michael S. Tsirkin
2010-09-03 20:29     ` Sridhar Samudrala

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F2E9EB7348B8264F86B6AB8151CE2D79280489AA05@shsmsx502.ccr.corp.intel.com \
    --to=xiaohui.xin@intel.com \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.hengli.com.au \
    --cc=jdike@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.