* [uml-devel] Contribution - Bug fixes and contributions to UML
@ 2014-02-28  8:27 Anton Ivanov (antivano)
  2014-02-28  8:33 ` Richard Weinberger
  0 siblings, 1 reply; 5+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28  8:27 UTC
  To: Richard Weinberger, user-mode-linux-devel, jdike

Hi Richard, Hi Jeff, hi list,

On behalf of Cisco Systems, I am authorized to offer a set of bug
fixes as well as contribute several additional features and performance
improvements to UML. All of these have been used internally for a couple
of years and will ship as parts of product(s) in the near future. Some
of these improve performance by up to 8 times on use cases which are of
interest to us and are likely to be of interest to the community.

As the full patchset is now in the 100k+ zone, I am only making the
announcement now and will submit the patches one by one over the next
1-2 weeks.

We will submit separately bug fixes for:

1. Critical memory corruption on startup observed on heavily loaded
machines (especially when multiple UMLs run simultaneously).
2. Fix(es) for incorrect handling of error conditions when UML is run
under expect and conX=fd: is used to communicate with another process.
The same error may be observed on internal UML IPCs too leading to
immediate crash.

I will also file bugs for both against the Debian UML package so that
patches for both can go in ASAP.

In addition to the bug fixes, the new features include:

1. Several transports. All can do up to multi-gigabit throughput on some
scenarios. We are contributing their counterparts to qemu/kvm as well.

1.1. Direct connection of UML to overlay networks/L2 VPNs using L2TPv3.

This has a number of advantages compared to the existing UML "multicast"
and qemu "socket" transports.

    * Standards compliant - RFC 3931, recently updated by RFC 5641
    * Supported on most network equipment
    * Allows moving virtual switching off-host to an NPU or high
performance physical switch
    * Allows mixing virtual and physical switching (well supported on
modern Linuxes and other OSes)
    * Well researched security profile as well as established
interactions with IPSEC, which allow extending virtual networks outside
the datacenter to remote physical devices and/or VMs.

1.2. Raw transport which allows both bi-directional communication with
any network device which looks like Ethernet as well as in-span
listening at speeds in the multi-gigabit range.

1.3. We intend to contribute other key overlay transports like GRE, etc.
as well. The ones we are contributing at this point are the ones which
we have used most extensively and have had the most testing (~ 1.5-2 years).

2. New high res timer subsystem

Adding these new network transports to UML revealed a key issue - it
cannot meter or shape any traffic correctly as its internal timer system
is way off. Personally, I consider it a bug; however, there is no "easy"
fix here. The only way to fix it is a new timer driver. Unfortunately,
it does not fix UML userspace - timers there remain off. It does fix all
kernel timer functionality, including traffic shaping (both qdisc and
iptables traffic limits).

As a side effect, this provides performance improvements for tcp and
other protocols which rely on kernel high res timers for their state
machines.

We have further scalability contributions lined up which improve network
and IO performance between 1.5 and 8 times (depending on use case),
allow hundreds of virtual interfaces per UML without performance
penalties, allow running several hundred (if not thousands) of UMLs per
machine, etc. All in all, it can now go where no virtualization and no
virtual networking has gone before.

However, I would prefer to take it one step at a time and get through
these first (even these are quite a lot for one "sitting").

Best Regards,

Anton.
------------------------------------------------------------------------------
Flow-based real-time traffic analytics software. Cisco certified tool.
Monitor traffic, SLAs, QoS, Medianet, WAAS etc. with NetFlow Analyzer
Customize your own dashboards, set traffic alerts and generate reports.
Network behavioral analysis & security monitoring. All-in-one tool.
http://pubads.g.doubleclick.net/gampad/clk?id=126839071&iu=/4140/ostg.clktrk
_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel



* Re: [uml-devel] Contribution - Bug fixes and contributions to UML
  2014-02-28  8:27 [uml-devel] Contribution - Bug fixes and contributions to UML Anton Ivanov (antivano)
@ 2014-02-28  8:33 ` Richard Weinberger
  2014-02-28  8:54   ` Anton Ivanov (antivano)
  2014-02-28 10:53   ` Anton Ivanov (antivano)
  0 siblings, 2 replies; 5+ messages in thread
From: Richard Weinberger @ 2014-02-28  8:33 UTC
  To: Anton Ivanov (antivano), user-mode-linux-devel, jdike

On 28.02.2014 09:27, Anton Ivanov (antivano) wrote:
> [...]

Sounds awesome!

Please send the patches as soon as possible.
I'm eager to test and merge them.

Thanks,
//richard




* Re: [uml-devel] Contribution - Bug fixes and contributions to UML
  2014-02-28  8:33 ` Richard Weinberger
@ 2014-02-28  8:54   ` Anton Ivanov (antivano)
  2014-03-06  6:52     ` Anton Ivanov (antivano)
  2014-02-28 10:53   ` Anton Ivanov (antivano)
  1 sibling, 1 reply; 5+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28  8:54 UTC
  To: Richard Weinberger; +Cc: jdike, user-mode-linux-devel

[-- Attachment #1: Type: text/plain, Size: 5490 bytes --]

Bugfixes.

I need to pull actual changesets for the drivers, etc properly and
verify that they build so those will be coming next week. You will be
getting them one by one.

1. Memory corruption.

The reverse case of this race (you need to msync, before you do non-mmap
fileops) is well known and textbook. This is the first and only time I
have seen this one (fsync before mmap). I have not heard it mentioned
either. It is, however, fairly easy to reproduce. If you run 200+ UMLs on a
system, ~0.2-0.5% of them will always die at startup with a memory corruption
warning. While this does not happen every time (0.2-0.5% and only on
startup), it is very reproducible on systems running lots of UMLs.

Once this fix went in we stopped seeing that one. Observed on 3.2, 3.3
and 3.8, fix tested on 3.2, 3.3, 3.4 and 3.8.
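
For readers who want to see the ordering in isolation, here is a minimal
userspace sketch of "write, fsync, then mmap" on a scratch file. It does not
reproduce the race and it is not the UML code; the helper name and the temp
file template are made up for illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write one page to a scratch file, fsync() it, and only then mmap() the
 * file. Returns 0 if the mapping shows the written bytes, -1 on any error.
 * The helper name and file template are illustrative only.
 */
int write_sync_then_map(void)
{
	char path[] = "/tmp/physmem-sketch-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	void *map;
	char *buf;
	int ret = -1;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);			/* file stays alive until fd is closed */

	buf = malloc(page);
	if (!buf) {
		close(fd);
		return -1;
	}
	memset(buf, 0x5a, page);

	/* 1. ordinary (non-mmap) file write */
	if (pwrite(fd, buf, page, 0) != page)
		goto out;
	/* 2. flush the write before anything maps the file */
	if (fsync(fd) < 0)
		goto out;
	/* 3. only now create the shared mapping */
	map = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	ret = memcmp(map, buf, page) == 0 ? 0 : -1;
	munmap(map, page);
out:
	free(buf);
	close(fd);
	return ret;
}
```

On an idle machine this succeeds with or without step 2; the point is only
to show where the flush sits relative to the mapping.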

2. SIGPIPE.

Linux actually produces SIGPIPE and EPIPE not only on a missing reader. It
will also produce them under some circumstances on a stalled reader. Discovered
when running UML under expect and/or trying to use fds and other virtual
serials to do management transactions.

While I have not seen it on UML internal pipes, I would not be surprised
if you can reproduce it there too (e.g. if the ubd thread is too slow). So
SIGPIPE needs to be disabled. From there on, most drivers have
correct error handling for this.

Observed on 3.2, 3.3 and 3.8, fix tested on 3.2, 3.3, 3.4 and 3.8.
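
A small standalone sketch of the effect, assuming plain signal(SIGPIPE,
SIG_IGN) as a stand-in for the change_sig() call in the attached patch: once
the signal no longer kills the process, the failed write simply returns
EPIPE and the caller's normal error handling can take over. Only the
missing-reader case is shown; the stalled-reader case is not reproduced
here. The function name is made up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Ignore SIGPIPE, then write to a pipe whose read end is already closed.
 * Returns the errno of the failed write (EPIPE is expected), 0 if the
 * write unexpectedly succeeded, or -1 if the setup itself failed.
 */
int write_to_closed_pipe(void)
{
	int fds[2];
	int err = 0;

	/* Without this the write below would terminate the process. */
	if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
		return -1;
	if (pipe(fds) < 0)
		return -1;

	close(fds[0]);			/* no reader left */
	if (write(fds[1], "x", 1) < 0)
		err = errno;		/* reported as an ordinary error */

	close(fds[1]);
	return err;
}
```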

A.




[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: sigpipe.diff --]
[-- Type: text/x-patch; name="sigpipe.diff", Size: 334 bytes --]

diff --git a/arch/um/os-Linux/main.c b/arch/um/os-Linux/main.c
index 7a86dd5..048166d 100644
--- a/arch/um/os-Linux/main.c
+++ b/arch/um/os-Linux/main.c
@@ -149,6 +149,7 @@ int __init main(int argc, char **argv, char **envp)
 #endif
 
 	do_uml_initcalls();
+	change_sig(SIGPIPE, 0);
 	ret = linux_main(argc, argv);
 
 	/*

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: memory-corruption.diff --]
[-- Type: text/x-patch; name="memory-corruption.diff", Size: 2258 bytes --]

diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index b314cf7..e984b2c 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -144,6 +144,7 @@ extern int os_write_file(int fd, const void *buf, int count);
 extern int os_file_size(const char *file, unsigned long long *size_out);
 extern int os_file_modtime(const char *file, unsigned long *modtime);
 extern int os_pipe(int *fd, int stream, int close_on_exec);
+extern int os_fsync(int fd);
 extern int os_set_fd_async(int fd);
 extern int os_clear_fd_async(int fd);
 extern int os_set_fd_block(int fd, int blocking);
diff --git a/arch/um/kernel/physmem.c b/arch/um/kernel/physmem.c
index f116db1..ea606ee 100644
--- a/arch/um/kernel/physmem.c
+++ b/arch/um/kernel/physmem.c
@@ -87,6 +87,14 @@ void __init setup_physmem(unsigned long start, unsigned long reserve_end,
 
 	physmem_fd = create_mem_file(len + highmem);
 
+	/*
+	 * Special kludge - This page will be mapped in to userspace processes
+	 * from physmem_fd, so it needs to be written out there.
+	 */
+	os_seek_file(physmem_fd, __pa(&__syscall_stub_start));
+	os_write_file(physmem_fd, &__syscall_stub_start, PAGE_SIZE);
+	os_fsync(physmem_fd);
+
 	offset = uml_reserved - uml_physmem;
 	err = os_map_memory((void *) uml_reserved, physmem_fd, offset,
 			    len - offset, 1, 1, 1);
@@ -97,12 +105,6 @@ void __init setup_physmem(unsigned long start, unsigned long reserve_end,
 		exit(1);
 	}
 
-	/*
-	 * Special kludge - This page will be mapped in to userspace processes
-	 * from physmem_fd, so it needs to be written out there.
-	 */
-	os_seek_file(physmem_fd, __pa(&__syscall_stub_start));
-	os_write_file(physmem_fd, &__syscall_stub_start, PAGE_SIZE);
 
 	bootmap_size = init_bootmem(pfn, pfn + delta);
 	free_bootmem(__pa(reserve_end) + bootmap_size,
diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c
index b049a63..d4985be99 100644
--- a/arch/um/os-Linux/file.c
+++ b/arch/um/os-Linux/file.c
@@ -33,6 +33,10 @@ static void copy_stat(struct uml_stat *dst, const struct stat64 *src)
 	});
 }
 
+int os_fsync(int fd) {
+	return fsync(fd);
+}
+
 int os_stat_fd(const int fd, struct uml_stat *ubuf)
 {
 	struct stat64 sbuf;



* Re: [uml-devel] Contribution - Bug fixes and contributions to UML
  2014-02-28  8:33 ` Richard Weinberger
  2014-02-28  8:54   ` Anton Ivanov (antivano)
@ 2014-02-28 10:53   ` Anton Ivanov (antivano)
  1 sibling, 0 replies; 5+ messages in thread
From: Anton Ivanov (antivano) @ 2014-02-28 10:53 UTC
  To: user-mode-linux-devel

[-- Attachment #1: Type: text/plain, Size: 8324 bytes --]

Hi all,

First of all, if this does not apply cleanly somewhere, accept my
apologies. The patchsets have grown organically for the last 1.5 years
and I am having trouble disentangling some of them into distinct
features. Feel free to bounce, I will be happy to fix it.

This is taken against 3.3.8. The choice of kernel is because this is the
production OpenWrt AA kernel, which is what we use for development. It
applies with minimal fuss from 3.2 to 3.8 (I have not tested it further).

Attached are the l2tpv3 and the raw transports.

Principles of operation:

UML's original network infrastructure allocates one skb at a time, reads
into it using a driver read function and, if the read is successful,
passes the skb to netif_rx.

This will not work for multi-packet receive unless you copy. Without
multi-packet receive it is difficult to go beyond 1.5Gbit for L2TPv3 and
1Gbit for raw. With multi-packet receive you are looking at double that (at
least). As an alternative to multi-packet rx, raw can also use packet
mmap. While this incurs an extra copy, the savings from fewer syscalls
allow it to get fairly reasonable performance. Multi-packet tx is not
used for now.

To avoid the copy we pre-allocate a vector of skbs, a matching vector of
supporting messages for recvmsg/sendmsg and matching iovs.
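
To make the "one syscall, many packets" idea concrete, here is a
self-contained userspace sketch that uses a socketpair instead of a real
transport and plain static buffers instead of skbs. VEC_LEN, PKT_MAX and the
function name are made up for illustration; this is not the code from the
attached patch.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define VEC_LEN 4	/* illustrative vector depth */
#define PKT_MAX 64	/* illustrative per-packet buffer size */

/*
 * Queue `count` datagrams on a socketpair, then drain them with a single
 * recvmmsg() call into pre-built buffers. Returns the number of datagrams
 * received, or -1 on error.
 */
int batch_receive(int count)
{
	static char bufs[VEC_LEN][PKT_MAX];
	struct iovec iov[VEC_LEN];
	struct mmsghdr msgs[VEC_LEN];
	int sv[2], i, got;

	if (count > VEC_LEN)
		return -1;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	for (i = 0; i < count; i++) {
		char pkt[4] = { 'p', 'k', 't', (char)('0' + i) };

		if (send(sv[0], pkt, sizeof(pkt), 0) != (ssize_t)sizeof(pkt)) {
			got = -1;
			goto out;
		}
	}

	/* Build the vector once: one message header and one iov per slot. */
	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < VEC_LEN; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = PKT_MAX;
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	/* One syscall picks up everything that is already queued. */
	got = recvmmsg(sv[1], msgs, VEC_LEN, MSG_DONTWAIT, NULL);
out:
	close(sv[0]);
	close(sv[1]);
	return got;
}
```

After the call, msgs[i].msg_len holds the length of each received datagram.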

If the encapsulation has a distinct header (l2tpv3, gre, etc.) we use
recvmsg (or recvmmsg) to split that header from the packet payload on
rx. If there is no header (raw), we read directly. Same on xmit. So the
structures for the former and the latter differ somewhat - the former
uses a pointer to a two buffer iov (one for header, one for payload), the
latter a pointer to a single buffer iov in the msg_hdr structures.
Otherwise, they are similar.
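
The two buffer iov can be illustrated with a small userspace sketch. The
8-byte "header" is an arbitrary stand-in (not the real L2TPv3 layout), the
socketpair replaces the tunnel socket, and the function name is made up.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define HDR_LEN 8	/* illustrative header size, not the L2TPv3 layout */

/*
 * Send one datagram made of an 8-byte header followed by a payload, then
 * receive it with a two-buffer iovec so that header and payload land in
 * separate buffers. hdr_out must hold HDR_LEN bytes. Returns the payload
 * length, or -1 on error.
 */
int split_header(char *hdr_out, char *payload_out, size_t payload_max)
{
	static const char packet[] = "HDRHDRHDpayload";	/* 8 + 7 bytes */
	struct iovec iov[2];
	struct msghdr msg;
	int sv[2];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (send(sv[0], packet, sizeof(packet) - 1, 0) < 0)
		goto out;

	/* First buffer takes the header, second takes whatever follows. */
	iov[0].iov_base = hdr_out;
	iov[0].iov_len = HDR_LEN;
	iov[1].iov_base = payload_out;
	iov[1].iov_len = payload_max;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;
	msg.msg_iovlen = 2;

	n = recvmsg(sv[1], &msg, 0);
	if (n >= HDR_LEN)
		n -= HDR_LEN;		/* report payload bytes only */
	else
		n = -1;			/* short packet: no full header */
out:
	close(sv[0]);
	close(sv[1]);
	return (int)n;
}
```

With a single buffer iov (the raw case) the same call works with
msg_iovlen set to 1 and no header buffer.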

In order to avoid invoking kernel side functions directly the 3 kernel
side bits we need to build these vectors are isolated in net_extra_kern.c

The userspace bits including wrappers around recvmsg, sendmsg and
recvmmsg are in net_extra_user.c The same file contains all the infra to
build the vectors.

When using vector io we no longer need drop_skb either. If there is no skb
allocated (buffer length set to zero or buffer is NULL), recvmsg and
recvmmsg will drop the packet for us.
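
A hedged userspace sketch of that drop behaviour, using a datagram
socketpair on Linux: receiving into an empty buffer consumes the queued
datagram without copying anything, so the next receive sees the following
packet. The function name is made up and this is not the driver code.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Queue two datagrams, receive the first one into an empty buffer (NULL
 * base, zero length) so that it is discarded, then receive normally.
 * Copies the datagram seen by the normal receive into `out` and returns
 * its length, or -1 on error.
 */
int drop_then_receive(char *out, size_t out_len)
{
	struct iovec iov = { .iov_base = NULL, .iov_len = 0 };
	struct msghdr msg;
	int sv[2];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	if (send(sv[0], "first", 5, 0) != 5 || send(sv[0], "second", 6, 0) != 6)
		goto out;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	/*
	 * Nothing can be copied, so the datagram is consumed and thrown
	 * away; the truncation is normally reported in msg.msg_flags.
	 */
	if (recvmsg(sv[1], &msg, 0) < 0)
		goto out;

	/* The next receive sees the datagram that follows the dropped one. */
	n = recv(sv[1], out, out_len, 0);
out:
	close(sv[0]);
	close(sv[1]);
	return (int)n;
}
```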

From there on, the transports use this common infra.

The arguments for l2tpv3 are:

eth1=l2tpv3,,src,srcport,dst,dstport,rxcookie,txcookie,rxsession,txsession,unused,mode

mode is a bitmask (described in the uml_l2tpv3.h include file).


1           /* on for v6, off for v4 */
2           /* on for udp, off for raw ip */
4           /* cookie present */
8           /* on for 64 bit cookie */
16          /* draft keyed ip - no counter */

src, srcport - source, format according to the above

dst, dstport - destination (optional, if null will listen for first packet)

txcookie, rxcookie, txsession, rxsession - same as for ip l2tpv3 arguments

unused - older mode spec, not used now

mode - current (bitmask) mode spec

Examples:

eth1=l2tpv3,,192.168.64.1,,192.168.128.1,,0xdeadbeefdeadbeef,0xbeefdeadbeefdead,0xffffffff,0xffffffff,,c

This will configure a raw IPv4 EoGRE tunnel from 64.1 to 128.1. The
config is not very user friendly, but it is a bit difficult to make it
so using the current option parsing routines.
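
As a reading aid for the mode field, here is a small sketch that decodes the
bitmask. It assumes the mode is given in hex (so the trailing "c" in the
example is 12, i.e. 4 + 8); the MODE_* names and both function names are
invented for illustration and are not taken from uml_l2tpv3.h.

```c
#include <stdio.h>
#include <stdlib.h>

/* Bit values as listed above; the names are illustrative only. */
#define MODE_V6		1	/* on for v6, off for v4 */
#define MODE_UDP	2	/* on for udp, off for raw ip */
#define MODE_COOKIE	4	/* cookie present */
#define MODE_COOKIE64	8	/* on for 64 bit cookie */
#define MODE_NOCOUNTER	16	/* draft keyed ip - no counter */

/* Parse the mode argument, assuming it is written in hex. */
unsigned long parse_mode(const char *mode_hex)
{
	return strtoul(mode_hex, NULL, 16);
}

/* Write a human-readable summary of the mode bits into `out`. */
int describe_mode(unsigned long mode, char *out, size_t out_len)
{
	const char *cookie = "no cookie";

	if (mode & MODE_COOKIE)
		cookie = (mode & MODE_COOKIE64) ? "64 bit cookie"
						: "32 bit cookie";

	return snprintf(out, out_len, "%s, %s, %s%s",
			(mode & MODE_V6) ? "ipv6" : "ipv4",
			(mode & MODE_UDP) ? "udp" : "raw ip",
			cookie,
			(mode & MODE_NOCOUNTER) ? ", no counter" : "");
}
```

Under that assumption, the "c" in the example means raw IP over v4 with a
64 bit cookie, which matches the 64 bit cookie values given there.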

Raw is much simpler:

eth1=raw,,ifname

ifname is the name of the interface to bind to.

Note - the driver does not bring up the interface or set its mode. For
most use cases you have to do an ifconfig ethX up promisc before binding
to it for this to work.

A.




[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: linux.diff --]
[-- Type: text/x-patch; name="linux.diff", Size: 60462 bytes --]

diff --git a/arch/um/Kconfig.net b/arch/um/Kconfig.net
index 3160b1a..386dc12 100644
--- a/arch/um/Kconfig.net
+++ b/arch/um/Kconfig.net
@@ -108,6 +108,22 @@ config UML_NET_DAEMON
         more than one without conflict.  If you don't need UML networking,
         say N.
 
+config UML_NET_RAW
+	bool "Raw transport"
+	depends on UML_NET
+	help
+	This User-Mode Linux network transport binds a VM ethX interface
+	to a host interface using raw sockets. The host interface is 
+	expected to have Ethernet framing
+
+config UML_NET_L2TPV3
+	bool "L2TPv3 transport"
+	depends on UML_NET
+	help
+	This User-Mode Linux network transport binds an IP address (v4 or v6)
+	and listens for incoming L2TPV3 frames. Frames are decapsulated
+	and presented to the VM. 
+
 config UML_NET_VDE
 	bool "VDE transport"
 	depends on UML_NET
diff --git a/arch/um/drivers/Makefile b/arch/um/drivers/Makefile
index e7582e1..ae30b83 100644
--- a/arch/um/drivers/Makefile
+++ b/arch/um/drivers/Makefile
@@ -1,6 +1,7 @@
 # 
 # Copyright (C) 2000, 2002, 2003 Jeff Dike (jdike@karaya.com)
 # Licensed under the GPL
+# Copyright (C) 2012 - 2014 Cisco Systems
 #
 
 # pcap is broken in 2.5 because kbuild doesn't allow pcap.a to be linked
@@ -9,8 +10,10 @@
 slip-objs := slip_kern.o slip_user.o
 slirp-objs := slirp_kern.o slirp_user.o
 daemon-objs := daemon_kern.o daemon_user.o
+uml_raw-objs := uml_raw_kern.o uml_raw_user.o
+uml_l2tpv3-objs := uml_l2tpv3_kern.o uml_l2tpv3_user.o
 umcast-objs := umcast_kern.o umcast_user.o
-net-objs := net_kern.o net_user.o
+net-objs := net_kern.o net_user.o net_extra_user.o net_extra_kern.o
 mconsole-objs := mconsole_kern.o mconsole_user.o
 hostaudio-objs := hostaudio_kern.o
 ubd-objs := ubd_kern.o ubd_user.o
@@ -43,6 +46,8 @@ obj-$(CONFIG_STDERR_CONSOLE) += stderr_console.o
 obj-$(CONFIG_UML_NET_SLIP) += slip.o slip_common.o
 obj-$(CONFIG_UML_NET_SLIRP) += slirp.o slip_common.o
 obj-$(CONFIG_UML_NET_DAEMON) += daemon.o 
+obj-$(CONFIG_UML_NET_RAW) += uml_raw.o 
+obj-$(CONFIG_UML_NET_L2TPV3) += uml_l2tpv3.o 
 obj-$(CONFIG_UML_NET_VDE) += vde.o
 obj-$(CONFIG_UML_NET_MCAST) += umcast.o
 obj-$(CONFIG_UML_NET_PCAP) += pcap.o
@@ -56,12 +61,13 @@ obj-$(CONFIG_PORT_CHAN) += port.o
 obj-$(CONFIG_PTY_CHAN) += pty.o
 obj-$(CONFIG_TTY_CHAN) += tty.o 
 obj-$(CONFIG_XTERM_CHAN) += xterm.o xterm_kern.o
+obj-$(CONFIG_UNIX_CHAN) += chan_unix.o
 obj-$(CONFIG_UML_WATCHDOG) += harddog.o
 obj-$(CONFIG_BLK_DEV_COW_COMMON) += cow_user.o
 obj-$(CONFIG_UML_RANDOM) += random.o
 
 # pcap_user.o must be added explicitly.
-USER_OBJS := fd.o null.o pty.o tty.o xterm.o slip_common.o pcap_user.o vde_user.o
+USER_OBJS := fd.o null.o pty.o tty.o xterm.o slip_common.o pcap_user.o vde_user.o chan_unix.o
 CFLAGS_null.o = -DDEV_NULL=$(DEV_NULL_PATH)
 
 include arch/um/scripts/Makefile.rules
diff --git a/arch/um/drivers/net_extra_kern.c b/arch/um/drivers/net_extra_kern.c
new file mode 100644
index 0000000..40b294c
--- /dev/null
+++ b/arch/um/drivers/net_extra_kern.c
@@ -0,0 +1,62 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Copyright (C) 2001 Lennert Buytenhek (buytenh@gnu.org) and
+ * James Leu (jleu@mindspring.net).
+ * Copyright (C) 2001 by various other people who didn't put their name here.
+ * Licensed under the GPL.
+ */
+
+#include <linux/bootmem.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/inetdevice.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include "init.h"
+#include "irq_kern.h"
+#include "irq_user.h"
+#include "mconsole_kern.h"
+#include "net_kern.h"
+#include "net_user.h"
+
+#define DRIVER_NAME "uml-netdev"
+
+/* 
+    These are wrappers around key kernel side functions so we can
+    invoke them from the user side of our schizophrenic self
+
+*/
+
+void uml_net_destroy_skb(void * skb) {
+    if (skb) {
+	kfree_skb((struct sk_buff *) skb);
+    }
+}
+
+void * uml_net_build_skb (void * dev) {
+    struct uml_net_private *lp = netdev_priv((struct net_device *) dev);
+    struct sk_buff * skb;
+
+    skb =  dev_alloc_skb(lp->max_packet + 32);
+    if (skb) {
+	/* add some tunneling space just in case, we usually do not need it as we use vector IO */
+	skb_reserve(skb,32);	
+	skb->dev = dev;
+	skb_put(skb, lp->max_packet);
+	skb_reset_mac_header(skb);
+    }
+    return skb;
+}
+
+void * uml_net_skb_data (void * skb) {
+    return ((struct sk_buff *) skb)->data;
+}
+
+
diff --git a/arch/um/drivers/net_extra_user.c b/arch/um/drivers/net_extra_user.c
new file mode 100644
index 0000000..ebde8a4
--- /dev/null
+++ b/arch/um/drivers/net_extra_user.c
@@ -0,0 +1,291 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Licensed under the GPL
+ */
+
+#include <stdio.h>
+#include <unistd.h>
+#include <stdarg.h>
+#include <errno.h>
+#include <stddef.h>
+#include <string.h>
+#include <sys/socket.h>
+#include <sys/wait.h>
+#include "net_user.h"
+#include "os.h"
+#include "um_malloc.h"
+
+/* 
+    Principles of operation:
+
+    EVERYTHING here is built to tolerate a failed memory allocation. If either a header buffer
+    or a data buffer (taken from skb->data) is NULL the read will fail and the packet will be
+    dropped. This is the normal behaviour of recvmsg and recvmmsg functions - if a particular
+    iov_base == NULL and its corresponding iov_len is 0 we truncate and/or drop the packet
+    altogether.
+    
+    On the negative side this means that we have to do a few more checks for NULL here and there.
+    On the positive side this means that the whole thing is more robust including under low
+    memory conditions.
+
+    There is one special case which we need to handle as a result of this - any header verification
+    functions should return "broken header" on hitting a NULL. This will in turn invoke the applicable
+    packet drop logic.
+
+    Any changes should follow this overall design.
+
+    Side effect - none of these need to use the shared (and mutexed) drop skb. It is surplus to
+    requirements: the normal recv(m)msg drop mechanics will drop the packet.
+    
+*/
+
+int net_readv(int fd, void *iov, int iovcnt)
+{
+	int n;
+
+	n = readv(fd,  iov,  iovcnt);
+
+	if ((n < 0) && (errno == EAGAIN))
+		return 0;
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+int net_recvfrom2(int fd, void *buf, int len, void *src_addr, int *addrlen)
+{
+	int n;
+
+	CATCH_EINTR(n = recvfrom(fd,  buf,  len, 0, src_addr, addrlen));
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+
+
+
+
+int net_writev(int fd, void *iov, int iovcnt)
+{
+	int n;
+
+	n = writev(fd, iov, iovcnt);
+
+	if ((n < 0) && (errno == EAGAIN))
+		return 0;
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+int net_sendmessage(int fd, void *msg, int flags)
+{
+	int n;
+
+	CATCH_EINTR(n = sendmsg(fd, msg, flags));
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+int net_recvmessage(int fd, void *msg, int flags)
+{
+	int n;
+
+	CATCH_EINTR(n = recvmsg(fd, msg, flags));
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+
+int net_recvmmsg(int fd, void *msgvec, unsigned int vlen,
+                    unsigned int flags, struct timespec *timeout)
+{
+	int n;
+
+	CATCH_EINTR(n = recvmmsg(fd, msgvec, vlen, flags, timeout));
+	if (n < 0) {
+		if (errno == EAGAIN)
+			return 0;
+		return -errno;
+	}
+	else if (n == 0)
+		return -ENOTCONN;
+	return n;
+}
+//int net_sendmmsg(int fd, void *msgvec, unsigned int vlen,
+//                    unsigned int flags, struct timespec *timeout) 
+//{
+//	int n;
+//
+//	CATCH_EINTR(n = sendmmsg(fd, msgvec, vlen, flags));
+//	if (n < 0) {
+//		if (errno == EAGAIN)
+//			return 0;
+//		return -errno;
+//	}
+//	else if (n == 0)
+//		return -ENOTCONN;
+//	return n;
+//}
+
+void destroy_skb_vector(void ** vector, int size) {
+    int i;
+    for (i=0;i<size;i++) {
+	if (vector[i]) {
+	    uml_net_destroy_skb(vector[i]);
+	}
+	/* index instead of advancing, so kfree() below gets the base pointer */
+    }
+    kfree(vector);
+}
+
+void destroy_mmsg_vector(void * mmsgvector, int size, int free_iov_base) {
+    struct mmsghdr * vector = (struct mmsghdr *) mmsgvector;
+    struct iovec * iov;
+    int i;
+    for (i=0;i<size;i++) {
+	iov = vector->msg_hdr.msg_iov;
+	if (iov) {
+	    if (free_iov_base) {
+		kfree(iov->iov_base);
+	    }
+	    kfree(iov);
+	}
+	vector ++;
+    }
+    kfree(mmsgvector); /* free the base pointer, not the advanced cursor */
+}
+
+void * build_skbuf_vector(int size, void * dev){
+    int i;
+    void **result, **vector;
+    result = uml_kmalloc(size * sizeof(void *), UM_GFP_KERNEL);
+    vector = result;
+    if (vector) {
+	for (i=0;i<size;i++) {
+	   * vector = uml_net_build_skb(dev);
+	   vector++;
+	}
+    }
+    return result;
+}  
+
+void rebuild_skbuf_vector(void ** skbvec, int size, void * dev){
+    int i;
+    if (skbvec) {
+	for (i=0;i<size;i++) {
+	   * skbvec = uml_net_build_skb(dev);
+	   skbvec++;
+	}
+    }
+}  
+
+void repair_mmsg (void *vec, int iovsize, int header_size) {
+    struct mmsghdr * msgvec = (struct mmsghdr *) vec;
+    struct iovec * iov;
+    if (! msgvec->msg_hdr.msg_iov) {
+	msgvec->msg_hdr.msg_iov = uml_kmalloc(sizeof(struct iovec) * iovsize, UM_GFP_KERNEL);
+    }
+    iov = msgvec->msg_hdr.msg_iov;
+    if (iov) {
+	if (! iov->iov_base) {
+	    iov->iov_base=uml_kmalloc(header_size, UM_GFP_KERNEL);
+	}
+	if (iov->iov_base) {
+	    /* put correct header size just in case - we may have had a short frame */
+	    iov->iov_len = header_size; 
+	} else {
+	    printk("failed to allocate a header buffer, will cause a packet drop later\n");
+	    iov->iov_len = 0;
+	}
+    }
+}
+
+void * build_mmsg_vector(int size, int iovsize) {
+    int i;
+    struct mmsghdr *msgvec, *result;
+    struct iovec * iov;
+    result = uml_kmalloc(sizeof(struct mmsghdr) * size, UM_GFP_KERNEL);
+    msgvec = result;
+    if (msgvec) {
+	memset(msgvec, '\0', sizeof(struct mmsghdr) * size); 
+	for (i=0;i<size;i++) {
+	    iov = uml_kmalloc(sizeof(struct iovec) * iovsize, UM_GFP_KERNEL);
+	    msgvec->msg_hdr.msg_iov=iov;
+	    if (iov) {
+		memset(iov, '\0', sizeof(struct iovec) * iovsize); 
+		msgvec->msg_hdr.msg_iovlen=iovsize;
+	    } else {
+		msgvec->msg_hdr.msg_iovlen=0; /* silent drop on receive, no xmit */
+	    }
+	    msgvec++;
+	}
+    }
+    return result;
+}
+
+void add_header_buffers(void * msgvec, int size, int header_size) {
+    int i;
+    struct iovec * iov;
+    struct mmsghdr * mmsgvec = (struct mmsghdr *) msgvec;
+    for (i=0;i<size;i++) {
+	iov = mmsgvec->msg_hdr.msg_iov;
+	if (iov) {
+	    iov->iov_base=uml_kmalloc(header_size, UM_GFP_KERNEL);
+	    if (iov->iov_base) {
+		iov->iov_len = header_size;
+	    } else {
+		printk("failed to allocate a header buffer, will cause a packet drop later\n");
+		iov->iov_len = 0;
+	    }
+	} 
+	mmsgvec++;
+    }
+}
+
+/* NOTE - this is only for offset = 0 or 1, other cases are unhandled!!! */
+
+void add_skbuffs(void * msgvec, void ** skbvec, int size, int skb_size, int offset) {
+    int i;
+    struct iovec * iov;
+    struct mmsghdr * mmsgvec = (struct mmsghdr *) msgvec;
+    for (i=0;i<size;i++) {
+	/* 
+	    This heavily relies on all IOVs being present, if the initial allocation 
+	    fails it must clean up and switch to "normal" per-packet receive instead
+	    Later allocations of skbufs can fail - this will result in short reads
+	    and skips
+
+	 */
+	iov = mmsgvec->msg_hdr.msg_iov;
+	if (iov) {
+	    iov += offset; 
+	    iov->iov_base=uml_net_skb_data(* skbvec);
+	    if (iov->iov_base) {
+		iov->iov_len = skb_size;
+	    } else {
+		iov->iov_len = 0;
+	    }
+	} 
+	mmsgvec++;
+	skbvec++;
+    }
+}
+
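For readers following the scatter layout that add_header_buffers() and add_skbuffs() set up (iov[0] for the tunnel header, iov[1] for the skb data), here is a minimal userspace sketch of the same idea for a single datagram. It is an illustration only and not part of the patch: recv_split() is a hypothetical helper, it uses plain recvmsg() rather than recvmmsg(), and it treats a NULL buffer as a drop in the spirit of the design notes in net_extra_user.c.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: receive one datagram and let the kernel split it
 * into a fixed-size header (iov[0]) and a payload (iov[1]).
 * Returns the payload length, 0 if the packet must be dropped
 * (missing buffer or no payload), or -1 on a receive error.
 */
static ssize_t recv_split(int fd, void *hdr, size_t hdr_len,
                          void *payload, size_t payload_len)
{
    struct iovec iov[2];
    struct msghdr msg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    /* A missing buffer gets length 0, so the datagram is truncated. */
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr ? hdr_len : 0;
    iov[1].iov_base = payload;
    iov[1].iov_len = payload ? payload_len : 0;
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;
    /* The datagram has been consumed either way; decide whether to keep it. */
    if (!hdr || !payload || (size_t)n <= hdr_len)
        return 0;
    return n - (ssize_t)hdr_len;
}
```

The drop path still calls recvmsg(), so a packet that cannot be stored is removed from the socket queue instead of blocking the ones behind it.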
diff --git a/arch/um/drivers/net_kern.c b/arch/um/drivers/net_kern.c
index a492e59..afe6086 100644
--- a/arch/um/drivers/net_kern.c
+++ b/arch/um/drivers/net_kern.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012 - 2014 Cisco Systems
  * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  * Copyright (C) 2001 Lennert Buytenhek (buytenh@gnu.org) and
  * James Leu (jleu@mindspring.net).
@@ -42,6 +43,7 @@ static DEFINE_SPINLOCK(drop_lock);
 static struct sk_buff *drop_skb;
 static int drop_max;
 
+
 static int update_drop_skb(int max)
 {
 	struct sk_buff *new;
@@ -77,24 +79,39 @@ static int uml_net_rx(struct net_device *dev)
 	struct sk_buff *skb;
 
 	/* If we can't allocate memory, try again next round. */
-	skb = dev_alloc_skb(lp->max_packet);
-	if (skb == NULL) {
-		drop_skb->dev = dev;
-		/* Read a packet into drop_skb and don't do anything with it. */
-		(*lp->read)(lp->fd, drop_skb, lp);
-		dev->stats.rx_dropped++;
+	if (lp->options & UML_NET_USE_SKB_READ) {
+	    /* we expect a fully formed, well-behaved skb from zero copy drivers here */
+	    skb = (*lp->skb_read)(lp);
+	    if (skb == NULL) {
 		return 0;
-	}
-
-	skb->dev = dev;
-	skb_put(skb, lp->max_packet);
-	skb_reset_mac_header(skb);
-	pkt_len = (*lp->read)(lp->fd, skb, lp);
-
-	if (pkt_len > 0) {
+	    }
+	    pkt_len = skb->len;
+	    skb->ip_summed = CHECKSUM_NONE;
+	} else {
+	    skb = dev_alloc_skb(lp->max_packet + 32);
+	    if (skb == NULL) {
+		    drop_skb->dev = dev;
+		    /* Read a packet into drop_skb and don't do anything with it. */
+		    (*lp->read)(lp->fd, drop_skb, lp);
+		    dev->stats.rx_dropped++;
+		    return 0;
+	    }
+
+	    skb_reserve(skb,32);
+	    skb->dev = dev;
+	    skb_put(skb, lp->max_packet);
+	    skb_reset_mac_header(skb);
+
+	    /* Virtual devices cannot provide the required checksum; mark it as absent. */
+	    skb->ip_summed = CHECKSUM_NONE;
+	    pkt_len = (*lp->read)(lp->fd, skb, lp);
+	    if (pkt_len > 0) {
 		skb_trim(skb, pkt_len);
 		skb->protocol = (*lp->protocol)(skb);
+	    }
+	}
 
+	if (pkt_len > 0) {
 		dev->stats.rx_bytes += skb->len;
 		dev->stats.rx_packets++;
 		netif_rx(skb);
@@ -137,7 +154,7 @@ static irqreturn_t uml_net_interrupt(int irq, void *dev_id)
 		schedule_work(&lp->work);
 		goto out;
 	}
-	reactivate_fd(lp->fd, UM_ETH_IRQ);
+	reactivate_fd(lp->fd, dev->irq);
 
 out:
 	spin_unlock(&lp->lock);
@@ -161,7 +178,7 @@ static int uml_net_open(struct net_device *dev)
 	}
 
 	err = um_request_irq(dev->irq, lp->fd, IRQ_READ, uml_net_interrupt,
-			     IRQF_DISABLED | IRQF_SHARED, dev->name, dev);
+			     IRQF_SHARED, dev->name, dev);
 	if (err != 0) {
 		printk(KERN_ERR "uml_net_open: failed to get irq(%d)\n", err);
 		err = -ENETUNREACH;
@@ -446,6 +463,7 @@ static void eth_configure(int n, void *init, char *mac,
 	 * These just fill in a data structure, so there's no failure
 	 * to be worried about.
 	 */
+	dev->ethtool_ops = &uml_net_ethtool_ops;
 	(*transport->kern->init)(dev, init);
 
 	*lp = ((struct uml_net_private)
@@ -458,7 +476,9 @@ static void eth_configure(int n, void *init, char *mac,
 		  .open 		= transport->user->open,
 		  .close 		= transport->user->close,
 		  .remove 		= transport->user->remove,
+		  .options 		= transport->kern->options,
 		  .read 		= transport->kern->read,
+		  .skb_read 		= transport->kern->skb_read,
 		  .write 		= transport->kern->write,
 		  .add_address 		= transport->user->add_address,
 		  .delete_address  	= transport->user->delete_address });
@@ -476,7 +496,6 @@ static void eth_configure(int n, void *init, char *mac,
 	memcpy(dev->dev_addr, device->mac, ETH_ALEN);
 	dev->mtu = transport->user->mtu;
 	dev->netdev_ops = &uml_netdev_ops;
-	dev->ethtool_ops = &uml_net_ethtool_ops;
 	dev->watchdog_timeo = (HZ >> 1);
 	dev->irq = UM_ETH_IRQ;
 
diff --git a/arch/um/drivers/net_user.c b/arch/um/drivers/net_user.c
index 05090c3..bbfaf47 100644
--- a/arch/um/drivers/net_user.c
+++ b/arch/um/drivers/net_user.c
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012 - 2014 Cisco Systems
  * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
@@ -120,6 +121,7 @@ int net_recvfrom(int fd, void *buf, int len)
 	return n;
 }
 
+
 int net_write(int fd, void *buf, int len)
 {
 	int n;
@@ -164,6 +166,7 @@ int net_sendto(int fd, void *buf, int len, void *to, int sock_len)
 	return n;
 }
 
+
 struct change_pre_exec_data {
 	int close_me;
 	int stdout;
diff --git a/arch/um/drivers/uml_l2tpv3.h b/arch/um/drivers/uml_l2tpv3.h
new file mode 100644
index 0000000..bd8369c
--- /dev/null
+++ b/arch/um/drivers/uml_l2tpv3.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Licensed under the GPL
+ */
+
+#ifndef __UML_L2TPV3_H__
+#define __UML_L2TPV3_H__
+
+#include "net_user.h"
+
+
+#define NEW_MODE_IP_VERSION   1		  /* on for v6, off for v4 */
+#define NEW_MODE_UDP	      2		  /* on for udp, off for raw ip */
+#define NEW_MODE_COOKIE	      4		  /* cookie present */
+#define NEW_MODE_COOKIE_SIZE  8		  /* on for 64 bit */
+#define NEW_MODE_NO_COUNTER   16	  /* DT - no counter */
+
+/* legacy modes */
+
+/* mode 0 */
+
+#define LEGACY_UDP6_64_NO_COUNTER (NEW_MODE_IP_VERSION + NEW_MODE_UDP + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
+
+#define LEGACY_MODE0 LEGACY_UDP6_64_NO_COUNTER
+
+/* mode 1 */
+
+#define LEGACY_IP6_64_NO_COUNTER (NEW_MODE_IP_VERSION + NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE + NEW_MODE_NO_COUNTER)
+
+#define LEGACY_MODE1 LEGACY_IP6_64_NO_COUNTER
+
+/* mode 2 */
+
+#define LEGACY_UDP4_64_COUNTER (NEW_MODE_COOKIE + NEW_MODE_UDP + NEW_MODE_COOKIE_SIZE )
+
+#define LEGACY_MODE2 LEGACY_UDP4_64_COUNTER
+
+/* mode 3 */
+
+#define LEGACY_IP4_64_COUNTER (NEW_MODE_COOKIE + NEW_MODE_COOKIE_SIZE)
+
+#define LEGACY_MODE3 LEGACY_IP4_64_COUNTER
+
+
+#define L2TPV3_HEADER 16
+
+
+struct temphtonl {
+   uint32_t low; 
+   uint32_t high;
+};
+
+
+struct uml_l2tpv3_data {
+        void *remote_addr;
+        int  remote_addr_size;
+        char *remote_addr_string;
+        char *local_addr_string;
+        char *local_service;
+        char *remote_service;
+	char *local_session_string;
+	char *remote_session_string;
+	uint32_t local_session;
+	uint32_t remote_session;
+        char *rx_cookie_string;
+        char *tx_cookie_string;
+        uint64_t rx_cookie;
+        uint64_t tx_cookie;
+        uint8_t *network_buffer;
+	int fd;
+	void *dev;
+        uint32_t uml_l2tpv3_flags;
+        uint32_t mode;
+        uint32_t new_mode; /* listening, sending, etc */
+        uint32_t counter;
+   
+	/*  Precomputed offsets */
+	 
+        uint32_t offset;   /* main offset == header offset */
+        uint32_t cookie_offset;
+        uint32_t counter_offset;
+        uint32_t session_offset;
+
+	/* high speed vector io data */
+    
+	void ** skb_vector;
+	void * mmsg_vector;
+	uint32_t vector_len;
+	uint32_t recv_index;
+	uint32_t recv_enqueued;
+	/* normally same as offset, add size of struct ipv4 header in ipv4 raw - API stupidities */
+	uint32_t header_size; 
+
+};  
+
+
+extern const struct net_user_info uml_l2tpv3_user_info;
+
+extern int uml_l2tpv3_user_sendmsg(int fd, void *header, int headerlen, void *data, int datalen, struct uml_l2tpv3_data *pri);
+
+extern int uml_l2tpv3_user_recvmsg(int fd, void *header, int headerlen, void *data, int datalen, struct uml_l2tpv3_data *pri);
+
+
+
+#define UML_L2TPV3_FLAG_TX_CHECKSUMS                0x00000001
+#define UML_L2TPV3_FLAG_RX_CHECKSUMS                0x00000002
+
+#endif
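The NEW_MODE_* bits in uml_l2tpv3.h determine where the session ID, cookie and counter sit in the header. Below is a minimal sketch of that arithmetic, restating what uml_l2tpv3_user_init() does further down in this patch. struct l2tpv3_layout and l2tpv3_layout_for_mode() are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define NEW_MODE_IP_VERSION   1   /* on for v6, off for v4 */
#define NEW_MODE_UDP          2   /* on for udp, off for raw ip */
#define NEW_MODE_COOKIE       4   /* cookie present */
#define NEW_MODE_COOKIE_SIZE  8   /* on for 64 bit */
#define NEW_MODE_NO_COUNTER   16  /* no counter */

/* Hypothetical container for the precomputed offsets. */
struct l2tpv3_layout {
    uint32_t offset;          /* total header length */
    uint32_t session_offset;
    uint32_t cookie_offset;
    uint32_t counter_offset;
};

static struct l2tpv3_layout l2tpv3_layout_for_mode(uint32_t new_mode)
{
    /* Base case: raw IP, session ID only. */
    struct l2tpv3_layout l = {
        .offset = 4, .session_offset = 0,
        .cookie_offset = 4, .counter_offset = 4,
    };

    if (new_mode & NEW_MODE_COOKIE) {
        uint32_t cookie_len = (new_mode & NEW_MODE_COOKIE_SIZE) ? 8 : 4;

        l.offset += cookie_len;
        l.counter_offset += cookie_len;
    }
    if (new_mode & NEW_MODE_UDP) {
        /* UDP mode prepends the 4-byte Ver/Reserved word. */
        l.offset += 4;
        l.session_offset += 4;
        l.cookie_offset += 4;
        l.counter_offset += 4;
    }
    if (!(new_mode & NEW_MODE_NO_COUNTER))
        l.offset += 4;
    return l;
}
```

For legacy mode 0 (new mode 0x1f) this gives a 16-byte header with the session ID at offset 4 and the cookie at offset 8.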
diff --git a/arch/um/drivers/uml_l2tpv3_kern.c b/arch/um/drivers/uml_l2tpv3_kern.c
new file mode 100644
index 0000000..aae9554
--- /dev/null
+++ b/arch/um/drivers/uml_l2tpv3_kern.c
@@ -0,0 +1,354 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2001 Lennert Buytenhek (buytenh@gnu.org) and
+ * James Leu (jleu@mindspring.net).
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Copyright (C) 2001 by various other people who didn't put their name here.
+ * Licensed under the GPL.
+ */
+
+#include "linux/init.h"
+#include <linux/netdevice.h>
+#include <linux/ethtool.h>
+#include <linux/ip.h>
+#include "net_kern.h"
+#include "uml_l2tpv3.h"
+#include "um_malloc.h"
+
+#define DRIVER_NAME "uml-l2tpv3"
+
+/* 
+   we will still use this if skbuff cannot be adjusted to include 
+   network header 
+*/
+
+struct uml_l2tpv3_init {
+	char *local_addr_string;
+	char *remote_addr_string;
+	char *local_service;
+	char *remote_service;
+	char *rx_cookie_string;
+	char *tx_cookie_string;
+	char *local_session_string;
+	char *remote_session_string;
+	char *mode_string;
+	char *new_mode_string;
+};
+
+static void uml_l2tpv3_get_drvinfo(struct net_device *dev,
+				struct ethtool_drvinfo *info)
+{
+	strlcpy(info->driver, DRIVER_NAME, sizeof(info->driver));
+	strlcpy(info->version, "42", sizeof(info->version));
+}
+
+
+
+static const struct ethtool_ops uml_l2tpv3_ethtool_ops = {
+	.get_drvinfo	        = uml_l2tpv3_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+};
+
+
+
+static void uml_l2tpv3_init(struct net_device *dev, void *data)
+{
+	struct uml_net_private *pri;
+	struct uml_l2tpv3_data *dpri;
+	struct uml_l2tpv3_init *init = data;
+
+	pri = netdev_priv(dev);
+	dpri = (struct uml_l2tpv3_data *) pri->user;
+
+        /* 
+	    these are as is, we keep them for future reference
+	    and parse them in userspace
+
+	*/
+
+	dpri->local_addr_string = init->local_addr_string;
+	dpri->remote_addr_string = init->remote_addr_string;
+	dpri->local_service = init->local_service;
+	dpri->remote_service = init->remote_service;
+	dpri->rx_cookie_string = init->rx_cookie_string;
+	dpri->tx_cookie_string = init->tx_cookie_string;
+	dpri->local_session_string = init->local_session_string;
+        dpri->remote_session_string = init->remote_session_string;
+
+        /* the only ones we pre-parse */
+
+        if (init->new_mode_string != NULL) {
+	   sscanf(init->new_mode_string, "%x", &dpri->new_mode);
+	   printk("new mode %x\n", dpri->new_mode);
+        } else {
+	   if (init->mode_string != NULL) {
+	      sscanf(init->mode_string, "%i", &dpri->mode);
+	   } else {
+	      dpri->mode=1;
+	   }
+	   /* legacy modes */
+	   switch (dpri->mode) {
+	     case 0 :  dpri->new_mode = LEGACY_MODE0; break ;
+	     case 1 :  dpri->new_mode = LEGACY_MODE1; break ;
+	     case 2 :  dpri->new_mode = LEGACY_MODE2; break ;
+	     case 3 :  dpri->new_mode = LEGACY_MODE3; break ;
+	   }
+	   printk("backwards compatible mode %i maps to new mode %x\n", dpri->mode, dpri->new_mode);
+        }
+
+	dpri->fd = -1;
+	dpri->dev = dev;
+	printk("l2tpv3 backend - %s:%s<->%s:%s, rxcookie: %s, txcookie:%s, local_session: %s, peer_session: %s\n",  dpri->local_addr_string, dpri->local_service, dpri->remote_addr_string, dpri->remote_service, dpri->rx_cookie_string, dpri->tx_cookie_string, dpri->local_session_string, dpri->remote_session_string);
+        dpri->uml_l2tpv3_flags = 0; /* we have everything turned off initially */
+        SET_ETHTOOL_OPS(dev, &uml_l2tpv3_ethtool_ops);
+}
+
+static int uml_l2tpv3_verify_header(uint8_t * buffer, struct uml_l2tpv3_data *dpri ) {
+    uint64_t *cookie64;
+    uint32_t *cookie32;
+    uint32_t *session_id;
+    
+
+    if ((!(dpri->new_mode & NEW_MODE_IP_VERSION)) && (!(dpri->new_mode & NEW_MODE_UDP))){
+	buffer += sizeof(struct iphdr) /* fix for ipv4 raw */;
+    } 
+    
+    session_id = (uint32_t *)(buffer + dpri->session_offset);
+    if (*session_id != dpri->remote_session) {
+	printk("Unknown session id\n");
+	return 0; 
+    }
+
+    if (dpri->new_mode & NEW_MODE_COOKIE) {
+       if (dpri->new_mode & NEW_MODE_COOKIE_SIZE) {
+	  /* 64 bit cookie */
+	  cookie64 = (uint64_t *)(buffer + dpri->cookie_offset);
+	  if (*cookie64 != dpri->rx_cookie) {
+	     printk("unknown cookie id\n");
+	     return 0; /* we need to return 0, otherwise barfus */
+	  }
+       } else {
+	  cookie32 = (uint32_t *)(buffer + dpri->cookie_offset);
+	  if (*cookie32 != * (uint32_t *) &dpri->rx_cookie) {
+	     printk("unknown cookie id\n");
+	     return 0; /* we need to return 0, otherwise barfus */
+	  }
+       }
+    }
+    return 1;
+}
+
+static struct sk_buff * uml_l2tpv3_multiread (struct uml_net_private * lp) {
+    struct uml_l2tpv3_data *dpri = (struct uml_l2tpv3_data *) &lp->user;
+    void ** skb_vector = dpri->skb_vector;
+    struct mmsghdr * mmsg_vector = (struct mmsghdr *) dpri->mmsg_vector;
+    struct sk_buff * result;
+    struct iovec * iov;
+    int ret;
+    
+    
+    /* Are we done processing the enqueued buffers */
+
+
+    if (dpri->recv_index >= dpri->recv_enqueued) {
+	/* Do we need to refresh the buffer list */
+	if (dpri->recv_enqueued) {
+	    /* Replace dpri->recv_enqueued skbuffs */
+	    rebuild_skbuf_vector(skb_vector, dpri->recv_enqueued, lp->dev);
+	    /* Rebuild message vector */
+	    add_skbuffs(dpri->mmsg_vector, skb_vector, dpri->recv_enqueued, lp->max_packet, 1);
+	}
+	ret = net_recvmmsg(
+	    dpri->fd, dpri->mmsg_vector, dpri->vector_len, 0,NULL);
+	if (ret >= 0) {
+	    dpri->recv_enqueued = ret;
+	} else {
+	    printk("Error in multi-packet receive %d\n", ret);
+	    return NULL;
+	}
+	dpri->recv_index = 0;
+    }
+    /* check if we are done processing the enqueued buffers */
+    if (dpri->recv_index < dpri->recv_enqueued) {
+	skb_vector += dpri->recv_index;
+	mmsg_vector += dpri->recv_index;
+	dpri->recv_index ++;
+	iov = mmsg_vector->msg_hdr.msg_iov;
+	if (
+	    (iov) && (iov->iov_base) &&
+	    (mmsg_vector->msg_len > dpri->header_size) && 
+	    (uml_l2tpv3_verify_header(iov->iov_base, dpri))
+	) {
+	    if ((!dpri->remote_addr) && (mmsg_vector->msg_hdr.msg_name)) {
+		dpri->remote_addr = mmsg_vector->msg_hdr.msg_name;
+		dpri->remote_addr_size = mmsg_vector->msg_hdr.msg_namelen;
+		mmsg_vector->msg_hdr.msg_name = NULL;
+		mmsg_vector->msg_hdr.msg_namelen = 0;
+	    }
+    
+	    result = (struct sk_buff *)(* skb_vector);
+	    if (result) {
+		skb_trim(result, mmsg_vector->msg_len - dpri->header_size);
+		result->protocol = (*lp->protocol)(result);
+	    }
+	} else {
+	    uml_net_destroy_skb(* skb_vector ) ; /* otherwise we leak it */
+	    result = NULL;
+	}
+    } else {
+	result = NULL;
+    }
+    return result;
+}
+
+static int uml_l2tpv3_read(int fd, struct sk_buff *skb, struct uml_net_private *lp)
+{
+    int result;
+    struct uml_l2tpv3_data *dpri = (struct uml_l2tpv3_data *) &lp->user;
+    uint8_t  *buffer ;
+
+
+    int offset = dpri->offset;
+    
+    buffer = dpri->network_buffer;
+
+    if (!(dpri->new_mode & NEW_MODE_UDP) && !(dpri->new_mode & NEW_MODE_IP_VERSION))
+    {
+	/* IPv4 RAW mode: Account for the IP header that will be received */
+	offset += sizeof(struct iphdr);
+    }
+     
+
+        result = uml_l2tpv3_user_recvmsg(
+	    fd, 
+	    buffer, offset,
+	    skb->data, skb->dev->mtu + ETH_HEADER_OTHER,
+	    dpri
+    );
+    if (result <= 0) {
+	return result;
+    } 
+    if (
+	!(dpri->new_mode & NEW_MODE_UDP) && 
+	!(dpri->new_mode & NEW_MODE_IP_VERSION)
+    ) {
+	/* IPv4 RAW mode: Ignore the IP header */
+	buffer += sizeof(struct iphdr);
+    }
+
+    if ((result > offset) && (uml_l2tpv3_verify_header(buffer, dpri))) {
+	if ((dpri->uml_l2tpv3_flags & UML_L2TPV3_FLAG_RX_CHECKSUMS) != 0) {
+	   skb->ip_summed = CHECKSUM_UNNECESSARY;
+	}
+	return result - offset;
+    } else {
+	return 0;
+    }
+}
+
+static void uml_l2tpv3_form_header(uint8_t * buffer, struct uml_l2tpv3_data *pri) {
+        uint32_t *header;
+        uint32_t *session;
+        uint64_t *cookie64;
+        uint32_t *cookie32;
+        uint32_t *counter;
+	if (pri->new_mode & NEW_MODE_UDP) {
+	   header = (uint32_t *) buffer;
+	   * header = htonl(0x30000);
+        }
+	session = (uint32_t *) (buffer + pri->session_offset);
+	*session = pri->local_session;
+
+        if (pri->new_mode & NEW_MODE_COOKIE) {
+	    if (pri->new_mode & NEW_MODE_COOKIE_SIZE) {
+	       cookie64 = (uint64_t *)(buffer + pri->cookie_offset);
+	       * cookie64 = pri->tx_cookie;
+	    } else {
+	       cookie32 = (uint32_t *) (buffer + pri->cookie_offset);
+	       * cookie32 = * ((uint32_t *) &pri->tx_cookie);
+	    }
+        }
+
+        if (!(pri->new_mode & NEW_MODE_NO_COUNTER)) {
+	    counter = (uint32_t *)(buffer + pri->counter_offset);
+	    * counter = htonl(++pri->counter);
+        }
+}
+
+static int uml_l2tpv3_write(int fd, struct sk_buff *skb, struct uml_net_private *lp)
+{
+        struct uml_l2tpv3_data *pri = (struct uml_l2tpv3_data *) &lp->user;
+        uint8_t *buffer = pri->network_buffer;
+        int result;
+
+ 
+	buffer = (uint8_t *)  pri->network_buffer;
+
+	uml_l2tpv3_form_header(buffer, pri);
+
+	result = uml_l2tpv3_user_sendmsg(
+	    fd, 
+	    buffer, pri->offset,
+	    skb->data, skb->len,
+	    pri
+	);   
+    
+        if (result > pri->offset) {
+	    return result - pri->offset;
+        } else {
+	    return result; /* not particularly correct */
+        }
+}
+
+static const struct net_kern_info uml_l2tpv3_kern_info = {
+	.options		= UML_NET_USE_SKB_READ,
+	.init			= uml_l2tpv3_init,
+	.protocol		= eth_protocol,
+	.read			= uml_l2tpv3_read,
+	.skb_read		= uml_l2tpv3_multiread,
+	.write			= uml_l2tpv3_write,
+};
+
+static int uml_l2tpv3_setup(char *str, char **mac_out, void *data)
+{
+	struct uml_l2tpv3_init *init = data;
+	char *remain;
+
+	*init = (
+		(struct uml_l2tpv3_init)
+		   { 
+		     .local_addr_string = "::1",
+		     .local_service = "1701",
+		     .remote_service = "1702",
+		     .rx_cookie_string = "0xdeadbeefdeadbeef",
+		     .tx_cookie_string = "0xdeadbeefdeadbeef",
+		     .local_session_string = "0xFFFFFFFF",
+		     .remote_session_string = "0xFFFFFFFF",
+		     .mode_string = "0",
+		   }
+	        );
+
+	remain = split_if_spec(str, mac_out, &init->local_addr_string, &init->local_service, &init->remote_addr_string, &init->remote_service, &init->rx_cookie_string, &init->tx_cookie_string, &init->local_session_string, &init->remote_session_string, &init->mode_string, &init->new_mode_string, NULL);
+	if (remain != NULL)
+		printk(KERN_WARNING " Strange interface spec \n");
+
+	return 1;
+}
+
+static struct transport uml_l2tpv3_transport = {
+	.list 		= LIST_HEAD_INIT(uml_l2tpv3_transport.list),
+	.name 		= "l2tpv3",
+	.setup  	= uml_l2tpv3_setup,
+	.user 		= &uml_l2tpv3_user_info,
+	.kern 		= &uml_l2tpv3_kern_info,
+	.private_size 	= sizeof(struct uml_l2tpv3_data),
+	.setup_size 	= sizeof(struct uml_l2tpv3_init),
+};
+
+static int register_uml_l2tpv3(void)
+{
+	register_transport(&uml_l2tpv3_transport);
+	return 0;
+}
+
+late_initcall(register_uml_l2tpv3);
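As a worked example of the layout uml_l2tpv3_form_header() produces, the sketch below builds the 20-byte header for the UDP, 64-bit cookie, counter-enabled case (new mode 0x0e). It is illustrative only: l2tpv3_udp_header() is a hypothetical helper, the session ID and cookie are assumed to be in network byte order already (as the parse functions in this patch store them), and memcpy() is used instead of pointer casts to sidestep alignment concerns.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write a UDP-mode L2TPv3 data header into buf,
 * which must hold at least 20 bytes. Returns the header length.
 *
 *   bytes  0..3   Ver/Reserved word (0x00030000)
 *   bytes  4..7   session ID, already in network byte order
 *   bytes  8..15  64-bit cookie, already in network byte order
 *   bytes 16..19  counter, converted here
 */
static size_t l2tpv3_udp_header(uint8_t *buf, uint32_t session_be,
                                const uint8_t cookie_be[8], uint32_t counter)
{
    uint32_t word;

    word = htonl(0x30000);
    memcpy(buf, &word, 4);
    memcpy(buf + 4, &session_be, 4);
    memcpy(buf + 8, cookie_be, 8);
    word = htonl(counter);
    memcpy(buf + 16, &word, 4);
    return 20;
}
```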
diff --git a/arch/um/drivers/uml_l2tpv3_user.c b/arch/um/drivers/uml_l2tpv3_user.c
new file mode 100644
index 0000000..9c9858e
--- /dev/null
+++ b/arch/um/drivers/uml_l2tpv3_user.c
@@ -0,0 +1,391 @@
+/*
+ * Copyright (C) 2012-2014 Cisco Systems
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Copyright (C) 2001 Lennert Buytenhek (buytenh@gnu.org) and
+ * James Leu (jleu@mindspring.net).
+ * Copyright (C) 2001 by various other people who didn't put their name here.
+ * Licensed under the GPL.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <sys/ioctl.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <netdb.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/ether.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <arpa/inet.h>
+
+#include "uml_l2tpv3.h"
+#include "net_user.h"
+#include "os.h"
+#include "um_malloc.h"
+#include "user.h"
+
+#define VECTOR_SIZE 32
+
+int l2tpv3_parse_cookie32(char *src , void * dst) {
+
+  if (
+	 (src == NULL) || 
+	 (sscanf(src, "%x", (unsigned int *) dst) != 1)
+      ) { 
+	 printk(UM_KERN_ERR "cannot parse cookie!!!: %s\n", src);
+	 return -1;
+   } 
+   
+   * (( uint32_t *) dst) = htonl(* ((uint32_t* )dst));
+
+   return 0;
+}
+
+int l2tpv3_parse_cookie64(char *src , void * dst) {
+
+   struct temphtonl temph;
+   uint32_t temp;
+   const int num = 42;
+   if (
+	 (src == NULL) || 
+	 (sscanf(src, "%llx", (unsigned long long *) &temph) != 1)
+      ) { 
+	 printk(UM_KERN_ERR "cannot parse cookie!!!: %s\n", src);
+	 return -1;
+   } 
+   if(*(char *)&num == 42) {
+      // why oh why there is no htonll
+      temp = htonl(temph.high);
+      temph.high = htonl(temph.low);
+      temph.low = temp;
+   } else {
+      temph.low = htonl(temph.low); 
+      temph.high = htonl(temph.high);
+   }	
+
+   memcpy(dst, &temph, sizeof (uint64_t));
+   return 0;
+}
+
+static int uml_l2tpv3_user_init(void *data, void *dev)
+{
+	struct uml_l2tpv3_data *pri = data;
+	int fd;
+        int local_port, remote_port;
+
+        int sock_family, sock_type, sock_proto;
+        
+        
+	/* 
+	 
+	This may look ugly, but there is no way around it:
+        UML DIY threading is incompatible with getaddrinfo,
+        so all resolution has to be done using legacy functions.
+   
+        */
+      
+        struct sockaddr_storage LocalSock;
+   
+        struct sockaddr_in6 *LocalSockv6;
+        struct sockaddr_in6 *RemoteSockv6;
+
+        struct sockaddr_in *LocalSockv4; 
+        struct sockaddr_in *RemoteSockv4; 
+
+	struct mmsghdr * mmsghdr;
+
+	pri->offset = 4;
+        pri->session_offset = 0;
+        pri->cookie_offset = 4;
+        pri->counter_offset = 4;
+ 
+        LocalSockv4 = (struct sockaddr_in *) &LocalSock;
+        LocalSockv6 = (struct sockaddr_in6 *) &LocalSock;
+        
+        printk(UM_KERN_INFO "l2tpv3 user init mode %i\n", pri->mode);
+
+
+        /* basic variable parsing */
+	 
+	pri->local_session = 0;
+        if (l2tpv3_parse_cookie32(pri->local_session_string,&pri->local_session) !=0) {
+           return -1;
+        }
+        pri->remote_session = 0;
+        if (l2tpv3_parse_cookie32(pri->remote_session_string,&pri->remote_session) !=0) {
+           return -1;
+        }
+
+        if (pri->new_mode & NEW_MODE_COOKIE) {
+	   if (pri->new_mode & NEW_MODE_COOKIE_SIZE) {
+	      /* 64 bit cookie */
+	      pri->offset += 8;
+	      pri->counter_offset += 8;
+	      if (l2tpv3_parse_cookie64(pri->tx_cookie_string,&pri->tx_cookie) !=0) {
+		  return -1;
+	      }
+	      if (l2tpv3_parse_cookie64(pri->rx_cookie_string,&pri->rx_cookie) !=0) {
+		  return -1;
+	      }
+	   } else {
+	      /* 32 bit cookie */
+	      pri->offset += 4;
+	      pri->counter_offset +=4;
+	      pri->tx_cookie = 0;
+	      if (l2tpv3_parse_cookie32(pri->tx_cookie_string,&pri->tx_cookie) !=0) {
+		  return -1;
+	      }
+	      pri->rx_cookie = 0;
+	      if (l2tpv3_parse_cookie32(pri->rx_cookie_string,&pri->rx_cookie) !=0) {
+		  return -1;
+	      }
+	   }
+        }
+	 
+        if (pri->local_service) {
+	   sscanf(pri->local_service, "%i", &local_port);
+        }
+      
+        if (pri->remote_service) {
+	   sscanf(pri->remote_service, "%i", &remote_port);
+        }
+  
+        if (pri->remote_addr_string) {         
+
+	   /* we now allocate it only if we are not "listening" */
+
+	   pri->remote_addr = uml_kmalloc(sizeof(struct sockaddr_storage), UM_GFP_KERNEL);
+        } else {
+	   pri->remote_addr = NULL;
+        }
+
+        if (pri->new_mode & NEW_MODE_IP_VERSION) {
+	    /* IPv6 */
+	    sock_family = AF_INET6;
+        } else {
+	    /* IPv4 */
+	    sock_family = AF_INET;
+        }
+	if (pri->new_mode & NEW_MODE_UDP) {
+	    printk(UM_KERN_ERR "uml_l2tpv3_open : preparing udp socket for mode %x\n", pri->new_mode);
+	    sock_type = SOCK_DGRAM;
+	    sock_proto = 0;
+   
+	    /* space for header. In UDP mode, the 
+         * egress packet also includes the 
+         * 'Ver' and 'Reserved' fields.
+         */
+
+	    pri->offset += 4;
+	    pri->counter_offset += 4;
+	    pri->session_offset += 4;
+	    pri->cookie_offset += 4;
+        } else {
+	    printk(UM_KERN_ERR "uml_l2tpv3_open : preparing raw socket for mode %x\n", pri->new_mode);
+	    sock_type = SOCK_RAW;
+	    sock_proto = 0x73;
+	    local_port = 0x73;
+	    remote_port = 0x73;
+        }
+
+        if (!(pri->new_mode & NEW_MODE_NO_COUNTER)) {
+	    pri->offset += 4;
+        }
+
+
+	if ((fd = socket(sock_family, sock_type, sock_proto)) == -1) {
+	    fd = -errno;
+	    printk(UM_KERN_ERR "uml_l2tpv3_open : socket creation failed, "
+	       "errno = %d\n", -fd);
+	      return fd;
+	}
+
+        if (pri->new_mode & NEW_MODE_IP_VERSION) {
+	    LocalSockv6->sin6_family = AF_INET6;
+	    LocalSockv6->sin6_port = htons(local_port);
+	    if (inet_pton(AF_INET6,pri->local_addr_string, &LocalSockv6->sin6_addr) <  1) {
+	       printk(UM_KERN_ERR "uml_l2tpv3_open : local address conversion failed\n");
+	       return -1;    
+	    }
+        } else {
+	    LocalSockv4->sin_family = AF_INET;
+	    LocalSockv4->sin_port = htons(local_port);
+	    if (inet_pton(AF_INET,pri->local_addr_string, &LocalSockv4->sin_addr) <  1) {
+	       printk(UM_KERN_ERR "uml_l2tpv3_open : local address conversion failed\n");
+	       return -1;    
+	    }
+        }
+        if (pri->remote_addr) {
+	   if (pri->new_mode & NEW_MODE_IP_VERSION) {
+	       RemoteSockv6 = (struct sockaddr_in6 *) pri->remote_addr;
+	       RemoteSockv6->sin6_family = AF_INET6;
+	       RemoteSockv6->sin6_port = htons(remote_port);
+	       if (inet_pton(AF_INET6,pri->remote_addr_string, &RemoteSockv6->sin6_addr) <  1) {
+		  printk(UM_KERN_ERR "uml_l2tpv3_open : remote address conversion failed\n");
+		  return -1;    
+	       }
+	       pri->remote_addr_size = sizeof(struct sockaddr_in6);
+	   } else {
+	       RemoteSockv4 = (struct sockaddr_in *) pri->remote_addr;
+	       RemoteSockv4->sin_family = AF_INET;
+	       RemoteSockv4->sin_port = htons(remote_port);
+	       if (inet_pton(AF_INET,pri->remote_addr_string, &RemoteSockv4->sin_addr) <  1) {
+		  printk(UM_KERN_ERR "uml_l2tpv3_open : remote address conversion failed\n");
+		  return -1;    
+	       }
+	       pri->remote_addr_size = sizeof(struct sockaddr_in);
+	   }
+        }
+ 
+	if (bind(fd, (struct sockaddr *) &LocalSock, sizeof(LocalSock))) {
+	    printk("uml_l2tpv3_open :  could not bind socket\n");
+	    close(fd);
+	    return -1;
+	} else {
+	    printk("uml_l2tpv3_open : socket bound\n");
+	}
+
+
+/* vector IO init */
+
+
+        pri->vector_len = VECTOR_SIZE;
+        pri->recv_index = 0;
+        pri->recv_enqueued = 0;
+        pri->header_size = pri->offset /* fix for ipv4 raw */;
+	if ((!(pri->new_mode & NEW_MODE_IP_VERSION)) && (!(pri->new_mode & NEW_MODE_UDP))){
+	    pri->header_size += sizeof(struct iphdr) /* fix for ipv4 raw */;
+	}
+	pri->skb_vector = build_skbuf_vector(VECTOR_SIZE, dev);
+	pri->mmsg_vector = build_mmsg_vector(VECTOR_SIZE, 2);
+	add_header_buffers(pri->mmsg_vector, VECTOR_SIZE, pri->header_size); 
+	add_skbuffs(
+	    pri->mmsg_vector, 
+	    pri->skb_vector, 
+	    VECTOR_SIZE, ETH_MAX_PACKET + ETH_HEADER_OTHER, 
+	    1
+	);
+
+	pri->network_buffer = uml_kmalloc(pri->header_size, UM_GFP_KERNEL); /* enough for any header, regardless how stupid */
+
+	if (!pri->network_buffer) {
+	    printk("uml_l2tpv3_open : could not allocate buffer\n");
+	    return -1;
+	}
+
+		if (!pri->remote_addr) {
+	    mmsghdr = (struct mmsghdr *) pri->mmsg_vector;
+	    mmsghdr->msg_hdr.msg_name = uml_kmalloc(sizeof(struct sockaddr_storage), UM_GFP_KERNEL);
+	    if (mmsghdr->msg_hdr.msg_name) {
+		mmsghdr->msg_hdr.msg_namelen = sizeof(struct sockaddr_storage);
+	    } else {
+		printk("Failed to allocate remote address name\n");
+	    }
+	}
+       
+	pri->dev = dev;
+	pri->fd = fd;
+	if (pri->fd < 0) {
+		return pri->fd;
+	}
+
+	printk("uml_l2tpv3_open : init complete, fd %i\n", fd);
+ 
+	return 0;
+}
+
+static int uml_l2tpv3_open(void *data)
+{
+	struct uml_l2tpv3_data *pri = data;
+	return pri->fd;
+}
+
+static void uml_l2tpv3_remove(void *data)
+{
+	struct uml_l2tpv3_data *pri = data;
+
+	close(pri->fd);
+	if (pri->skb_vector) {
+	    destroy_skb_vector(pri->skb_vector, VECTOR_SIZE);
+	}
+	if (pri->mmsg_vector) {
+	    destroy_mmsg_vector(pri->mmsg_vector, VECTOR_SIZE, 1);
+	}
+	pri->fd = -1;
+}
+
+
+int uml_l2tpv3_user_sendmsg(int fd, void *header, int headerlen, void *data, int datalen, struct uml_l2tpv3_data *pri)
+{
+        struct msghdr message;
+        struct iovec vec[2];
+        vec[0].iov_base = header;
+        vec[0].iov_len = headerlen;
+        vec[1].iov_base = data;
+	vec[1].iov_len = datalen;
+
+
+        message.msg_name = pri->remote_addr;
+        message.msg_namelen = pri->remote_addr_size;
+        message.msg_iov = (struct iovec *) &vec;
+        message.msg_iovlen = 2;
+        message.msg_control = NULL;
+        message.msg_controllen = 0;
+        message.msg_flags = MSG_DONTWAIT;
+
+
+        if (pri->remote_addr != NULL) {
+	   return net_sendmessage(fd, &message, MSG_DONTWAIT);
+        } else {
+	   return -1;
+	}
+}
+int uml_l2tpv3_user_recvmsg(int fd, void *header, int headerlen, void *data, int datalen, struct uml_l2tpv3_data *pri)
+{
+        struct msghdr message;
+        struct iovec vec[2];
+        vec[0].iov_base = header;
+        vec[0].iov_len = headerlen;
+        vec[1].iov_base = data;
+	vec[1].iov_len = datalen;
+
+	if (!pri->remote_addr) {
+	    pri->remote_addr = uml_kmalloc(sizeof(struct sockaddr_storage), UM_GFP_KERNEL);
+	    if (pri->remote_addr) {
+		message.msg_name = pri->remote_addr;
+		message.msg_namelen = pri->remote_addr_size;
+	    } else {
+		message.msg_name = NULL;
+		message.msg_namelen = 0;
+	    }
+	} else {
+	    message.msg_name = NULL;
+	    message.msg_namelen = 0;
+	}
+    
+        message.msg_iov = (struct iovec *) &vec;
+        message.msg_iovlen = 2;
+        message.msg_control = NULL;
+        message.msg_controllen = 0;
+        message.msg_flags = MSG_DONTWAIT;
+
+
+	return net_recvmessage(fd, &message, MSG_DONTWAIT);
+}
+const struct net_user_info uml_l2tpv3_user_info = {
+	.init		= uml_l2tpv3_user_init,
+	.open		= uml_l2tpv3_open,
+	.close	 	= NULL,
+	.remove	 	= uml_l2tpv3_remove,
+	.add_address	= NULL,
+	.delete_address = NULL,
+	.mtu		= ETH_MAX_PACKET,
+	.max_packet	= ETH_MAX_PACKET + ETH_HEADER_OTHER + L2TPV3_HEADER,
+};
diff --git a/arch/um/drivers/uml_raw.h b/arch/um/drivers/uml_raw.h
new file mode 100644
index 0000000..5a08604
--- /dev/null
+++ b/arch/um/drivers/uml_raw.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Licensed under the GPL
+ */
+
+#ifndef __UML_RAW_H__
+#define __UML_RAW_H__
+
+#include "net_user.h"
+
+struct uml_raw_data {
+	char *host_iface;
+	int fd;
+	void *dev;
+        uint32_t uml_raw_flags;
+
+	/* packet mmap read */
+
+	uint8_t *scratch_buffer;  /* for dummy reads*/
+	uint8_t *multiread_buffer;
+	int ring_index;
+
+	/* multi-rx read */
+
+	void ** skb_vector;
+	void * mmsg_vector;
+	uint32_t vector_len;
+	uint32_t recv_index;
+	uint32_t recv_enqueued;
+
+};
+
+extern const struct net_user_info uml_raw_user_info;
+
+extern int uml_raw_user_write(int fd, void *buf, int len,
+			     struct uml_raw_data *pri);
+
+#define UML_RAW_FLAG_TX_CHECKSUMS                0x00000001
+#define UML_RAW_FLAG_RX_CHECKSUMS                0x00000002
+
+
+#define UML_RAW_TP_BLOCK_SIZE 4096
+#define UML_RAW_TP_FRAME_SIZE 2048
+#define UML_RAW_TP_BLOCK_NR 32
+#define UML_RAW_TP_FRAME_NR 64
+
+
+#endif
diff --git a/arch/um/drivers/uml_raw_kern.c b/arch/um/drivers/uml_raw_kern.c
new file mode 100644
index 0000000..8e71051
--- /dev/null
+++ b/arch/um/drivers/uml_raw_kern.c
@@ -0,0 +1,213 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2001 Lennert Buytenhek (buytenh@gnu.org) and
+ * James Leu (jleu@mindspring.net).
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Copyright (C) 2001 by various other people who didn't put their name here.
+ * Licensed under the GPL.
+ */
+
+#include "linux/init.h"
+#include <linux/netdevice.h>
+#include <linux/ethtool.h>
+#include <linux/if_packet.h>
+#include "net_kern.h"
+#include "uml_raw.h"
+#include "linux/mutex.h"
+#include "um_malloc.h"
+
+#define DRIVER_NAME "uml-raw"
+
+
+struct uml_raw_init {
+	char *host_iface;
+};
+
+static void uml_raw_get_drvinfo(struct net_device *dev,
+				struct ethtool_drvinfo *info)
+{
+	strcpy(info->driver, DRIVER_NAME);
+	strcpy(info->version, "42");
+}
+
+
+static const struct ethtool_ops uml_raw_ethtool_ops = {
+	.get_drvinfo	        = uml_raw_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+};
+
+
+static void uml_raw_init(struct net_device *dev, void *data)
+{
+	struct uml_net_private *pri;
+	struct uml_raw_data *dpri;
+	struct uml_raw_init *init = data;
+
+	pri = netdev_priv(dev);
+	dpri = (struct uml_raw_data *) pri->user;
+	dpri->host_iface = init->host_iface;
+	dpri->fd = -1;
+	dpri->dev = dev;
+    
+	/* We will free this pointer. If it contains crap we're burned. */
+
+	printk("raw backend - host iface: %s",  dpri->host_iface);
+	printk("\n");
+        printk("enabling ethtool support\n");
+        dpri->uml_raw_flags = 0; /* we have everything turned off initially */
+        SET_ETHTOOL_OPS(dev, &uml_raw_ethtool_ops);
+
+}
+
+#ifdef PACKETMMAP_RX
+
+static inline struct tpacket_hdr * current_header (struct uml_raw_data *dpri) {
+    uint8_t * buffer;
+    buffer = dpri->multiread_buffer + (dpri->ring_index * UML_RAW_TP_FRAME_SIZE);
+    return (struct tpacket_hdr *) buffer;
+}
+
+static struct tpacket_hdr * uml_raw_advance_ring(struct uml_raw_data *dpri ) {
+
+    struct tpacket_hdr * header = current_header(dpri);
+    header->tp_status = TP_STATUS_KERNEL; /* mark as free */
+    dpri->ring_index = (dpri->ring_index + 1) % UML_RAW_TP_FRAME_NR;
+    return current_header(dpri);
+}
+
+#else 
+
+static struct sk_buff * uml_raw_multiread (struct uml_net_private * lp) {
+    struct uml_raw_data *dpri = (struct uml_raw_data *) &lp->user;
+    void ** skb_vector = dpri->skb_vector;
+    struct mmsghdr * mmsg_vector = (struct mmsghdr *) dpri->mmsg_vector;
+    struct sk_buff * result = NULL;
+    struct iovec * iov;
+    int ret;
+    
+    if (dpri->recv_index >= dpri->recv_enqueued) {
+	dpri->recv_index = 0;
+	if (dpri->recv_enqueued) {
+	    rebuild_skbuf_vector(skb_vector, dpri->recv_enqueued, lp->dev);
+	    add_skbuffs(dpri->mmsg_vector, skb_vector, dpri->recv_enqueued, lp->max_packet, 0);
+	}
+	ret = net_recvmmsg(
+	    dpri->fd, dpri->mmsg_vector, dpri->vector_len, 0, NULL);
+	if (ret >= 0) {
+	    dpri->recv_enqueued = ret;
+	} else {
+	    dpri->recv_enqueued = 0;
+	    return NULL;
+	}
+    }
+    if (dpri->recv_index < dpri->recv_enqueued) {
+	skb_vector += dpri->recv_index;
+	mmsg_vector += dpri->recv_index;
+	dpri->recv_index ++;
+	iov = mmsg_vector->msg_hdr.msg_iov;
+	if ((mmsg_vector->msg_len) && (iov)) {
+	    result = (struct sk_buff *)(* skb_vector);
+	    if (result) {
+		skb_trim(result, mmsg_vector->msg_len);
+		result->protocol = (*lp->protocol)(result);
+	    }
+	} else {
+	    uml_net_destroy_skb(* skb_vector ) ; /* otherwise we leak it */
+	    result = NULL;
+	}
+	//repair_mmsg(mmsg_vector, 2, dpri->header_size);
+    } else {
+	result = NULL;
+    }
+    return result;
+}
+
+#endif
+
+
+static int uml_raw_read(int fd, struct sk_buff *skb, struct uml_net_private *lp)
+{
+        int result;
+        struct uml_raw_data *dpri;
+	dpri = (struct uml_raw_data *) lp->user;
+#ifdef PACKETMMAP_RX
+	struct tpacket_hdr * header;
+
+	header = current_header(dpri);	 
+
+
+	if ((header->tp_status & TP_STATUS_USER) > 0) { 
+	    result = header->tp_snaplen;
+	    memcpy(skb_mac_header(skb), ((uint8_t *) header) + header->tp_mac, result);
+	    if ((dpri->uml_raw_flags & UML_RAW_FLAG_RX_CHECKSUMS) != 0) {
+	       skb->ip_summed = CHECKSUM_UNNECESSARY;
+	    }
+	    uml_raw_advance_ring(dpri);
+	} else {
+	    result = 0;
+	} 
+
+#else
+	result = net_read(fd, skb_mac_header(skb),
+                            skb->dev->mtu + ETH_HEADER_OTHER);
+	
+#endif
+
+        return result;
+}
+
+static int uml_raw_write(int fd, struct sk_buff *skb, struct uml_net_private *lp)
+{
+	return uml_raw_user_write(fd, skb->data, skb->len,
+				 (struct uml_raw_data *) &lp->user);
+}
+
+static const struct net_kern_info uml_raw_kern_info = {
+#ifdef PACKETMMAP_RX 
+	.options		= 0,
+#else
+	.options		= UML_NET_USE_SKB_READ,
+#endif
+	.init			= uml_raw_init,
+	.protocol		= eth_protocol,
+	.read			= uml_raw_read,
+#ifndef PACKETMMAP_RX 
+	.skb_read		= uml_raw_multiread,
+#endif
+	.write			= uml_raw_write
+};
+
+static int uml_raw_setup(char *str, char **mac_out, void *data)
+{
+	struct uml_raw_init *init = data;
+	char *remain;
+
+	*init = (
+		(struct uml_raw_init)
+		   { .host_iface = "eth0"}
+	        );
+
+	remain = split_if_spec(str, mac_out, &init->host_iface, NULL);
+	if (remain != NULL)
+		printk(KERN_WARNING " Strange interface spec \n");
+
+	return 1;
+}
+
+static struct transport uml_raw_transport = {
+	.list 		= LIST_HEAD_INIT(uml_raw_transport.list),
+	.name 		= "raw",
+	.setup  	= uml_raw_setup,
+	.user 		= &uml_raw_user_info,
+	.kern 		= &uml_raw_kern_info,
+	.private_size 	= sizeof(struct uml_raw_data),
+	.setup_size 	= sizeof(struct uml_raw_init),
+};
+
+static int register_uml_raw(void)
+{
+	register_transport(&uml_raw_transport);
+	return 0;
+}
+
+late_initcall(register_uml_raw);
diff --git a/arch/um/drivers/uml_raw_user.c b/arch/um/drivers/uml_raw_user.c
new file mode 100644
index 0000000..24194dc
--- /dev/null
+++ b/arch/um/drivers/uml_raw_user.c
@@ -0,0 +1,151 @@
+/*
+ * Copyright (C) 2012 - 2014 Cisco Systems
+ * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
+ * Copyright (C) 2001 Lennert Buytenhek (buytenh@gnu.org) and
+ * James Leu (jleu@mindspring.net).
+ * Copyright (C) 2001 by various other people who didn't put their name here.
+ * Licensed under the GPL.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/ioctl.h>
+#include <net/if.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <net/ethernet.h>
+#include <netinet/ip.h>
+#include <netinet/ether.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <sys/mman.h>
+
+
+#include "uml_raw.h"
+#include "net_user.h"
+#include "os.h"
+#include "um_malloc.h"
+#include "user.h"
+
+#define VECTOR_SIZE 32
+
+static int uml_raw_user_init(void *data, void *dev)
+{
+	struct uml_raw_data *pri = data;
+        struct ifreq ifr;
+	int fd;
+        struct sockaddr_ll sock;
+        int err;
+	struct tpacket_req tpacket;
+    
+	pri->ring_index = 0;
+	 
+        if ((fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) == -1) {
+	    err = -errno;
+	    printk(UM_KERN_ERR "uml_raw_open : raw socket creation failed, "
+		       "errno = %d\n", -err);
+	    return err;
+        }
+
+#ifdef PACKETMMAP_RX	
+
+       	tpacket.tp_block_size = UML_RAW_TP_BLOCK_SIZE; 
+	tpacket.tp_frame_size = UML_RAW_TP_FRAME_SIZE; 
+	tpacket.tp_block_nr = UML_RAW_TP_BLOCK_NR ; 
+	tpacket.tp_frame_nr = UML_RAW_TP_FRAME_NR;
+
+	if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, (void *) &tpacket, sizeof(struct tpacket_req))) {
+	    printk(UM_KERN_ERR "uml_raw: failed to request packet mmap\n");
+	    return -errno;
+	} else {
+	    printk(UM_KERN_ERR "uml_raw: requested packet mmap\n");
+	}
+
+	pri->multiread_buffer = (uint8_t *) mmap(NULL, UML_RAW_TP_FRAME_SIZE * UML_RAW_TP_FRAME_NR, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+
+	if (pri->multiread_buffer == MAP_FAILED) { /* mmap() reports failure as MAP_FAILED, never NULL */
+	    printk(UM_KERN_ERR "uml_raw: failed to map buffer\n");
+	    return -1;
+	} else {
+	    printk(UM_KERN_ERR "uml_raw: mmap-ed buffer at %p\n", pri->multiread_buffer);
+	}
+
+#else
+
+	pri->vector_len = VECTOR_SIZE;
+        pri->recv_index = 0;
+        pri->recv_enqueued = 0;
+	pri->skb_vector = build_skbuf_vector(VECTOR_SIZE, dev);
+	pri->mmsg_vector = build_mmsg_vector(VECTOR_SIZE, 1);
+	add_skbuffs(
+	    pri->mmsg_vector, 
+	    pri->skb_vector, 
+	    VECTOR_SIZE, ETH_MAX_PACKET + ETH_HEADER_OTHER, 
+	    0
+	);
+
+#endif
+
+	memset(&ifr, 0, sizeof(ifr));
+        strncpy(ifr.ifr_name, pri->host_iface, sizeof(ifr.ifr_name) - 1);
+        if(ioctl(fd, SIOCGIFINDEX, (void *) &ifr) < 0) {
+	    err = -errno;
+	    printk(UM_KERN_ERR "SIOCGIFINDEX, failed to get raw interface index for %s\n", pri->host_iface);
+	    close(fd);
+	    return(-1);
+	}
+
+        sock.sll_family = AF_PACKET;
+        sock.sll_protocol = htons(ETH_P_ALL);
+	sock.sll_ifindex = ifr.ifr_ifindex;
+
+	printk(UM_KERN_INFO "uml_raw: binding raw on interface index: %i\n", ifr.ifr_ifindex);
+        if (bind(fd, (struct sockaddr *) &sock, sizeof(struct sockaddr_ll)) < 0) {
+	    printk(UM_KERN_ERR "uml_raw: failed to bind raw socket\n");
+	    close(fd);
+    	    return(-1);
+	}
+
+	pri->dev = dev;
+	pri->fd = fd;
+	if (pri->fd < 0) {
+		return pri->fd;
+	}
+
+	return 0;
+}
+
+static int uml_raw_open(void *data)
+{
+	struct uml_raw_data *pri = data;
+	return pri->fd;
+}
+
+static void uml_raw_remove(void *data)
+{
+	struct uml_raw_data *pri = data;
+
+	close(pri->fd);
+	pri->fd = -1;
+//	kfree(pri->host_iface);
+//	pri->host_iface = NULL;
+}
+
+int uml_raw_user_write(int fd, void *buf, int len, struct uml_raw_data *pri)
+{
+	return net_write(fd, buf, len);
+}
+
+const struct net_user_info uml_raw_user_info = {
+	.init		= uml_raw_user_init,
+	.open		= uml_raw_open,
+	.close	 	= NULL,
+	.remove	 	= uml_raw_remove,
+	.add_address	= NULL,
+	.delete_address = NULL,
+	.mtu		= ETH_MAX_PACKET,
+	.max_packet	= ETH_MAX_PACKET + ETH_HEADER_OTHER,
+};
diff --git a/arch/um/include/shared/net_kern.h b/arch/um/include/shared/net_kern.h
index 5c367f2..6f4dd8e 100644
--- a/arch/um/include/shared/net_kern.h
+++ b/arch/um/include/shared/net_kern.h
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012 - 2014 Cisco Systems
  * Copyright (C) 2002 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
@@ -21,6 +22,8 @@ struct uml_net {
 	unsigned char mac[ETH_ALEN];
 };
 
+#define UML_NET_USE_SKB_READ 1
+
 struct uml_net_private {
 	struct list_head list;
 	spinlock_t lock;
@@ -29,6 +32,7 @@ struct uml_net_private {
 
 	struct work_struct work;
 	int fd;
+	unsigned int options;
 	unsigned char mac[ETH_ALEN];
 	int max_packet;
 	unsigned short (*protocol)(struct sk_buff *);
@@ -36,6 +40,7 @@ struct uml_net_private {
 	void (*close)(int, void *);
 	void (*remove)(void *);
 	int (*read)(int, struct sk_buff *skb, struct uml_net_private *);
+	struct sk_buff * (*skb_read)(struct uml_net_private *);
 	int (*write)(int, struct sk_buff *skb, struct uml_net_private *);
 
 	void (*add_address)(unsigned char *, unsigned char *, void *);
@@ -46,7 +51,9 @@ struct uml_net_private {
 struct net_kern_info {
 	void (*init)(struct net_device *, void *);
 	unsigned short (*protocol)(struct sk_buff *);
+	unsigned int options;
 	int (*read)(int, struct sk_buff *skb, struct uml_net_private *);
+	struct sk_buff * (*skb_read)(struct uml_net_private *);
 	int (*write)(int, struct sk_buff *skb, struct uml_net_private *);
 };
 
@@ -66,5 +73,6 @@ extern int tap_setup_common(char *str, char *type, char **dev_name,
 			    char **mac_out, char **gate_addr);
 extern void register_transport(struct transport *new);
 extern unsigned short eth_protocol(struct sk_buff *skb);
+extern struct sk_buff *my_build_skb(void * head, void *data, unsigned int frag_size);
 
 #endif
diff --git a/arch/um/include/shared/net_user.h b/arch/um/include/shared/net_user.h
index 3dabbe1..52f087a 100644
--- a/arch/um/include/shared/net_user.h
+++ b/arch/um/include/shared/net_user.h
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012 - 2014 Cisco Systems
  * Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
@@ -38,10 +39,15 @@ extern void tap_check_ips(char *gate_addr, unsigned char *eth_addr);
 extern void read_output(int fd, char *output_out, int len);
 
 extern int net_read(int fd, void *buf, int len);
+extern int net_readv(int fd, void *iov, int iovcnt);
 extern int net_recvfrom(int fd, void *buf, int len);
+extern int net_recvfrom2(int fd, void *buf, int len, void *src_addr, int *addrlen);
 extern int net_write(int fd, void *buf, int len);
+extern int net_writev(int fd, void *iov, int iovcnt);
 extern int net_send(int fd, void *buf, int len);
 extern int net_sendto(int fd, void *buf, int len, void *to, int sock_len);
+extern int net_sendmessage(int fd, void *msg, int flags);
+extern int net_recvmessage(int fd, void *msg, int flags);
 
 extern void open_addr(unsigned char *addr, unsigned char *netmask, void *arg);
 extern void close_addr(unsigned char *addr, unsigned char *netmask, void *arg);
@@ -50,4 +56,23 @@ extern char *split_if_spec(char *str, ...);
 
 extern int dev_netmask(void *d, void *m);
 
+
+/* net kern extra */
+extern void uml_net_destroy_skb(void * skb);
+extern void * uml_net_build_skb (void * dev);
+extern void * uml_net_skb_data (void * skb);
+
+extern void add_skbuffs(void * msgvec, void ** skbvec, int size, int skb_size, int offset);
+extern void add_header_buffers(void * msgvec, int size, int header_size);
+extern void * build_mmsg_vector(int size, int iovsize);
+extern void rebuild_skbuf_vector(void ** skbvec, int size, void * dev);
+extern void * build_skbuf_vector(int size, void * dev);
+extern int net_recvmmsg(int fd, void *msgvec, unsigned int vlen,
+                    unsigned int flags, struct timespec *timeout);
+extern int net_sendmmsg(int fd, void *msgvec, unsigned int vlen,
+                    unsigned int flags);
+extern void repair_mmsg (void *msgvec, int iovsize, int header_size);
+extern void destroy_skb_vector(void ** vector, int size);
+extern void destroy_mmsg_vector(void * mmsgvector, int size, int free_iov_base);
+
 #endif
diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 89b686c1..d2a4f4b 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -1,4 +1,5 @@
 /*
+ * Copyright (C) 2012 - 2014 Cisco Systems
  * Copyright (C) 2002 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com)
  * Licensed under the GPL
  */
@@ -274,6 +275,7 @@ extern void halt_skas(void);
 extern void reboot_skas(void);
 
 /* irq.c */
+
 extern int os_waiting_for_events(struct irq_fd *active_fds);
 extern int os_create_pollfd(int fd, int events, void *tmp_pfd, int size_tmpfds);
 extern void os_free_irq_by_cb(int (*test)(struct irq_fd *, void *), void *arg,
@@ -299,4 +301,5 @@ extern int get_pty(void);
 /* sys-$ARCH/task_size.c */
 extern unsigned long os_get_top_address(void);
 
+
 #endif
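
The header-size arithmetic in uml_l2tpv3_user_init in the patch above can be sketched on its own. This is only an illustration, not code from the patch: the DEMO_MODE_* flags and demo_l2tpv3_header_len() are hypothetical names, and the starting value of 4 bytes (the 32-bit session ID) is an assumption, since the initialisation of pri->offset is not shown in this part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag names only; these are NOT the NEW_MODE_* values
 * used by the patch. */
enum {
	DEMO_MODE_UDP        = 1 << 0,	/* L2TPv3 over UDP rather than raw IP */
	DEMO_MODE_COOKIE     = 1 << 1,	/* a cookie is present */
	DEMO_MODE_COOKIE_64  = 1 << 2,	/* the cookie is 64 bits wide */
	DEMO_MODE_NO_COUNTER = 1 << 3,	/* no counter word in the header */
};

/* Hypothetical helper: size in bytes of the L2TPv3 data header. */
static size_t demo_l2tpv3_header_len(unsigned int mode)
{
	size_t len = 4;	/* assumed base: 32-bit session ID */

	if (mode & DEMO_MODE_UDP)
		len += 4;	/* flags, 'Ver' and 'Reserved' fields */
	if (mode & DEMO_MODE_COOKIE)
		len += (mode & DEMO_MODE_COOKIE_64) ? 8 : 4;
	if (!(mode & DEMO_MODE_NO_COUNTER))
		len += 4;	/* counter */
	return len;
}
```

For example, demo_l2tpv3_header_len(DEMO_MODE_UDP | DEMO_MODE_COOKIE | DEMO_MODE_COOKIE_64) gives 20 bytes. On IPv4 raw sockets the patch additionally adds sizeof(struct iphdr) to the receive header size, which this sketch leaves out.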

[-- Attachment #4: Type: text/plain, Size: 194 bytes --]

_______________________________________________
User-mode-linux-devel mailing list
User-mode-linux-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-devel


* Re: [uml-devel] Contribution - Bug fixes and contributions to UML
  2014-02-28  8:54   ` Anton Ivanov (antivano)
@ 2014-03-06  6:52     ` Anton Ivanov (antivano)
  0 siblings, 0 replies; 5+ messages in thread
From: Anton Ivanov (antivano) @ 2014-03-06  6:52 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: jdike, user-mode-linux-devel


[-- Attachment #1.1: Type: text/plain, Size: 6426 bytes --]

Hi Richard, hi list.

I will reformat these (as well as the new drivers), clean them up to the current coding standard and resubmit, so that they are easier to review and consider for merging.

Apologies for not coming back to this earlier - I was busy with the qemu counterparts of the transport patches (I will also apply the "lessons learned" from there to these).

A.

On 28/02/14 08:54, Anton Ivanov (antivano) wrote:

Bugfixes.

I need to pull actual changesets for the drivers, etc properly and
verify that they build so those will be coming next week. You will be
getting them one by one.

1. Memory corruption.

The reverse case of this race (you need to msync before you do non-mmap
fileops) is well known and textbook. This is the first and only time I
have seen this one (fsync before mmap). I have not heard it mentioned
either. It is, however, fairly easy to reproduce. If you run 200+ UMLs on
a system, ~0.2-0.5% of them will consistently die at startup with a memory
corruption warning. While any single instance rarely hits it (0.2-0.5%, and
only on startup), it is very reproducible on systems running lots of UMLs.

Once this fix went in we stopped seeing that one. Observed on 3.2, 3.3
and 3.8, fix tested on 3.2, 3.3, 3.4 and 3.8.
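
The ordering described above (fsync before mmap) can be sketched in isolation. This is a minimal illustration and not the actual fix: demo_create_synced_mapping() and the temporary file are hypothetical, and the sketch assumes the backing file is filled with write() before it is mapped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to a fresh temporary file, flush
 * them with fsync(), and only then map the file.
 * Returns MAP_FAILED on any error. */
static void *demo_create_synced_mapping(const void *buf, size_t len)
{
	char path[] = "/tmp/uml_demoXXXXXX";
	void *map = MAP_FAILED;
	int fd;

	assert(buf != NULL && len > 0);

	fd = mkstemp(path);
	if (fd < 0)
		return MAP_FAILED;
	unlink(path);	/* the file lives on until fd and mapping are gone */

	/* fsync() before mmap(): pending write() data is flushed first */
	if (write(fd, buf, len) == (ssize_t)len && fsync(fd) == 0)
		map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	close(fd);	/* a successful mapping stays valid after close() */
	return map;
}
```

The caller compares the result against MAP_FAILED rather than NULL and releases the mapping with munmap() when done.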

2. SIGPIPE.

Linux actually produces SIGPIPE and EPIPE not only on a missing reader. It
will also produce them under some circumstances on a stalled reader. Discovered
when running UML under expect and/or trying to use fds and other virtual
serials to do management transactions.

While I have not seen it on UML internal pipes, I would not be surprised
if you can reproduce it there too (e.g. if the ubd thread is too slow). So
SIGPIPE needs to be disabled. From there on, most drivers have
correct error handling for this.

Observed on 3.2, 3.3 and 3.8, fix tested on 3.2, 3.3, 3.4 and 3.8.
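
To illustrate the approach (again a sketch, not the actual patch): once SIGPIPE is ignored, a write to a pipe without a reader fails with EPIPE instead of killing the process, so the normal error path can handle it. The helper names demo_ignore_sigpipe() and demo_write_to_closed_pipe() are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Ignore SIGPIPE process-wide so that a broken pipe is reported as an
 * EPIPE error from write() instead of terminating the process. */
static int demo_ignore_sigpipe(void)
{
	struct sigaction sa;

	sa.sa_handler = SIG_IGN;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical demo: write one byte to a pipe whose read end is closed.
 * Returns 0 on success or a negative errno value on failure. */
static int demo_write_to_closed_pipe(void)
{
	int fds[2];
	int ret;

	if (pipe(fds) < 0)
		return -errno;
	close(fds[0]);	/* no reader left */
	ret = (write(fds[1], "x", 1) < 0) ? -errno : 0;
	close(fds[1]);
	return ret;
}
```

With SIGPIPE ignored, demo_write_to_closed_pipe() returns -EPIPE; without the call to demo_ignore_sigpipe() the default action would terminate the process.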

A.


On 28/02/14 08:33, Richard Weinberger wrote:


Am 28.02.2014 09:27, schrieb Anton Ivanov (antivano):


Hi Richard, Hi Jeff, hi list,

On behalf of Cisco Systems, I am authorized to offer a set of bug
fixes as well as contribute several additional features and performance
improvements to UML. All of these have been used internally for a couple
of years and will ship as parts of product(s) in the near future. Some
of these improve performance by up to 8 times on use cases which are of
interest to us and are likely to be of interest to the community.

As the full patchset is now in the 100k+ zone, I am going to do only
the announcement now and submit the patches one by one after that over
the next 1-2 weeks.

We will submit separately bug fixes for:

1. Critical memory corruption on startup observed on heavily loaded
machines (especially when multiple UMLs run simultaneously).
2. Fix(es) for incorrect handling of error conditions when UML is run
under expect and conX=fd: is used to communicate with another process.
The same error may be observed on internal UML IPCs too leading to
immediate crash.

I will also file bugs for both vs Debian UML package so that patches for
both can go in ASAP.

In addition to the bug fixes, the new features include:

1. Several transports. All can do up to multi-gigabit throughput on some
scenarios. We are contributing their counterparts to qemu/kvm as well.

1.1. Direct connection of UML to overlay networks/L2 VPNs using L2TPv3.

This has a number of advantages compared to the existing UML "multicast"
and qemu "socket" transports.

    * Standard compliant - RFC 3931 updated recently by RFC 5641
    * Supported on most network equipment
    * Allowing to move virtual switching off-host to an NPU or high
performance physical switch
    * Allowing to mix virtual and physical switching (well supported on
modern Linuxes and other OSes)
    * Well researched security profile as well as established
interactions with IPSEC allowing to extend virtual networks outside the
datacenter to remote physical devices and/or VMs.

1.2. Raw transport which allows both bi-directional communication with
any network device which looks like Ethernet as well as in-span
listening at speeds in the multi-gigabit range.

1.3. We intend to contribute other key overlay transports like GRE, etc
as well. The ones we are contributing at this point are the ones which
we have used most extensively and have had the most testing (~ 1.5-2 years).

2. New high res timer subsystem

Adding these new network transports to UML revealed a key issue - it
cannot meter or shape any traffic correctly as its internal timer system
is way off. Personally, I consider it a bug, however there is no "easy"
fix here. The only way to fix it is a new timer driver. Unfortunately,
it does not fix uml userspace - timers there remain off. It does fix all
kernel timer functionality - traffic shaping (both qdisc and iptables
traffic limits).

As a side effect, this provides performance improvements for tcp and
other protocols which rely on kernel high res timers for their state
machines.

We have further scalability contributions lined up which improve network
and IO performance between 1.5 and 8 times (depending on use case),
allow hundreds of virtual interfaces per UML without performance
penalties, allow to run several hundreds (if not thousands) of UMLs per
machine, etc. All in all, it can now go where no virtualization and no
virtual networking has gone before.

However, I would prefer to take it one step at a time and get through
these first (even these are quite a lot for one "sitting").


Sounds awesome!

Please send the patches as soon as possible.
I'm eager to test and merge them.

Thanks,
//richard








Thread overview: 5+ messages
2014-02-28  8:27 [uml-devel] Contribution - Bug fixes and contributions to UML Anton Ivanov (antivano)
2014-02-28  8:33 ` Richard Weinberger
2014-02-28  8:54   ` Anton Ivanov (antivano)
2014-03-06  6:52     ` Anton Ivanov (antivano)
2014-02-28 10:53   ` Anton Ivanov (antivano)
