* [PATCH 00/18] xfrm: Add compat layer
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev, Andy Lutomirski, Ard Biesheuvel,
	H. Peter Anvin, Ingo Molnar, John Stultz, Kirill A. Shutemov,
	Oleg Nesterov, Stephen Boyd, Steven Rostedt, Thomas Gleixner,
	x86, linux-efi, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Shuah Khan, linux-kselftest, Eric Paris,
	Florian Westphal, Jozsef Kadlecsik, Pablo Neira Ayuso,
	Paul Moore, coreteam, linux-audit, netfilter-devel, Fan Du

Due to a historical mistake, the xfrm user ABI differs between native and
compat applications. The difference is in structure padding and, as a
result, in the size of netlink messages.
As the ABI is already visible to userspace, it cannot be fixed by packing
the structures.
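
To illustrate the padding difference, here is a minimal userspace sketch.
The structures and the demo_* names are hypothetical, not the real xfrm
UAPI; the i386 alignment of 64-bit integers is emulated with an
aligned(4) typedef, assuming a GCC-compatible compiler on a 64-bit host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical structures, not the real xfrm UAPI.
 * A 64-bit host aligns a 64-bit integer to 8 bytes, while the i386 ABI
 * aligns it to 4 bytes; the typedef below emulates the i386 rule.
 */
typedef uint64_t __attribute__((aligned(4))) demo_compat_u64;

struct demo_msg_native {
	uint64_t bytes;		/* 8-byte aligned on a 64-bit host */
	uint32_t flags;		/* followed by tail padding */
};

struct demo_msg_compat {
	demo_compat_u64 bytes;	/* 4-byte aligned, as on i386 */
	uint32_t flags;		/* no tail padding needed */
};

/* Extra bytes the native layout carries compared to the compat one. */
static size_t demo_padding_delta(void)
{
	assert(sizeof(struct demo_msg_native) >= sizeof(struct demo_msg_compat));
	return sizeof(struct demo_msg_native) - sizeof(struct demo_msg_compat);
}
```

The same members end up in messages of different length, which is why a
size check on the 64-bit side rejects what a 32-bit application sends.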

The ability of compat applications to manage xfrm tunnels was
disabled by commit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
userspace socket policies on 64 bit systems") and commit 74005991b78a
("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").

For some wonderful reasons and brilliant architecture decisions in
creating userspace, on Arista switches we still use 32-bit userspace
with a 64-bit kernel. There is a slow movement towards a full 64-bit
build, but it's not there yet. As the switches need support for ipsec
tunnels, the local kernel reverts the mentioned patches that disable
xfrm for compat apps. On top of that there is a bunch of disgraceful
hacks in userspace to work around the size check for netlink messages
and all that jazz.

It looks like we're not the only ones who want compat xfrm;
there were a couple of attempts to make it work:
https://lkml.org/lkml/2017/1/20/733
https://patchwork.ozlabs.org/patch/44600/
http://netdev.vger.kernel.narkive.com/2Gesykj6/patch-net-next-xfrm-correctly-parse-netlink-msg-from-32bits-ip-command-on-64bits-host

All the discussions end with the conclusion that xfrm should have a full
compat layer to work correctly with 32-bit applications on 64-bit
kernels:
https://lkml.org/lkml/2017/1/23/413
https://patchwork.ozlabs.org/patch/433279/

In a recent lkml discussion, Linus said that it's worth fixing this
problem and not giving people an excuse to stay on a 32-bit kernel:
https://lkml.org/lkml/2018/2/13/752

So, here I add a compat layer to xfrm.
As xfrm uses netlink notifications, the kernel should send them in the
ABI format that the application will parse. The proposed solution is
to save the ABI of the bind() syscall. The implementation detail is
to create kernel-hidden netlink groups, not visible to userspace,
for compat applications.
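
The idea can be pictured with a toy userspace model. It is only a sketch:
the demo_* names, the number of visible groups and the bitmask layout are
assumptions made for illustration and are not taken from the patches:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: number of groups userspace is allowed to see. */
#define DEMO_VISIBLE_GROUPS	8u
#define DEMO_VISIBLE_MASK	((1u << DEMO_VISIBLE_GROUPS) - 1u)

/*
 * Model of the bind()-time decision: a native listener keeps the groups
 * it asked for, a compat listener is moved to hidden twin groups that
 * sit above the visible range.
 */
static uint32_t demo_bind_groups(uint32_t requested, bool compat_bind)
{
	/* Userspace can only ask for visible groups. */
	uint32_t visible = requested & DEMO_VISIBLE_MASK;

	/* Visible and hidden groups must both fit into the 32-bit mask. */
	assert(2 * DEMO_VISIBLE_GROUPS <= 32);
	return compat_bind ? visible << DEMO_VISIBLE_GROUPS : visible;
}

/* A listener bound to a hidden group expects the compat layout. */
static bool demo_wants_compat_format(uint32_t bound_groups)
{
	return (bound_groups & ~DEMO_VISIBLE_MASK) != 0;
}
```

In this model the ABI is recorded once, at bind() time, so the sender of
a notification only has to look at the group to pick the message format.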

The first two patches simplify ifdeffery; although I already submitted
them a while ago, I'm resending them for completeness:
https://lore.kernel.org/lkml/20180717005004.25984-1-dima@arista.com/T/#u

There is also an exhaustive selftest for ipsec tunnels, which also checks
that the kernel correctly parses the structures that differ in size.
It doesn't depend on any library, and the compat version can easily be
built with: make CFLAGS=-m32 net/ipsec

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: netdev@vger.kernel.org

Dmitry Safonov (18):
  x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
  compat: Cleanup in_compat_syscall() callers
  selftest/net/xfrm: Add test for ipsec tunnel
  net/xfrm: Add _packed types for compat users
  net/xfrm: Parse userspi_info{,_packed} depending on syscall
  netlink: Do not subscribe to non-existent groups
  netlink: Pass groups pointer to .bind()
  xfrm: Add in-kernel groups for compat notifications
  xfrm: Dump usersa_info in compat/native formats
  xfrm: Send state notifications in compat format too
  xfrm: Add compat support for xfrm_user_expire messages
  xfrm: Add compat support for xfrm_userpolicy_info messages
  xfrm: Add compat support for xfrm_user_acquire messages
  xfrm: Add compat support for xfrm_user_polexpire messages
  xfrm: Check compat acquire listeners in xfrm_is_alive()
  xfrm: Notify compat listeners about policy flush
  xfrm: Notify compat listeners about state flush
  xfrm: Enable compat syscalls

 MAINTAINERS                            |    1 +
 arch/x86/include/asm/compat.h          |    9 +-
 arch/x86/include/asm/ftrace.h          |    4 +-
 arch/x86/kernel/process_64.c           |    4 +-
 arch/x86/kernel/sys_x86_64.c           |   11 +-
 arch/x86/mm/hugetlbpage.c              |    4 +-
 arch/x86/mm/mmap.c                     |    2 +-
 drivers/firmware/efi/efivars.c         |   16 +-
 include/linux/compat.h                 |    4 +-
 include/linux/netlink.h                |    2 +-
 include/net/xfrm.h                     |   14 -
 kernel/audit.c                         |    2 +-
 kernel/time/time.c                     |    2 +-
 net/core/rtnetlink.c                   |   14 +-
 net/core/sock_diag.c                   |   25 +-
 net/netfilter/nfnetlink.c              |   24 +-
 net/netlink/af_netlink.c               |   28 +-
 net/netlink/af_netlink.h               |    4 +-
 net/netlink/genetlink.c                |   26 +-
 net/xfrm/xfrm_state.c                  |    5 -
 net/xfrm/xfrm_user.c                   |  690 ++++++++---
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    1 +
 tools/testing/selftests/net/ipsec.c    | 1987 ++++++++++++++++++++++++++++++++
 24 files changed, 2612 insertions(+), 268 deletions(-)
 create mode 100644 tools/testing/selftests/net/ipsec.c

-- 
2.13.6


* [PATCH 01/18] x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev, Andy Lutomirski, Ard Biesheuvel,
	H. Peter Anvin, Ingo Molnar, John Stultz, Kirill A. Shutemov,
	Oleg Nesterov, Stephen Boyd, Steven Rostedt, Thomas Gleixner,
	x86, linux-efi

The result of in_compat_syscall() can be pictured as:

x86 platform:
    ---------------------------------------------------
    |  Arch\syscall  |  64-bit  |   ia32   |   x32    |
    |-------------------------------------------------|
    |     x86_64     |  false   |   true   |   true   |
    |-------------------------------------------------|
    |      i686      |  false   |  <true>  |   false  |
    ---------------------------------------------------

Other platforms:
    ------------------------------------------------
    |  Arch\syscall  |  64-bit  |   compat (32?)   |
    |----------------------------------------------|
    |     64-bit     |  false   |       true       |
    |----------------------------------------------|
    |    32-bit(?)   |  false   |     <false>      |
    ------------------------------------------------

As can be seen, the result of in_compat_syscall() on a generic 32-bit
platform differs from i686.

There is no reason for in_compat_syscall() == true on native i686.
It is also easy to misread code if the result on a native 32-bit platform
differs between arches.
Because of that, non-arch-specific code has many places with:
    if (IS_ENABLED(CONFIG_COMPAT) && in_compat_syscall())
in different variations.

It looks like the only non-x86 code that uses in_compat_syscall() without
a CONFIG_COMPAT guard is in amd/amdkfd. But according to
commit a18069c132cb ("amdkfd: Disable support for 32-bit user
processes"), it should actually be disabled on native i686.

Rename in_compat_syscall() to in_32bit_syscall() for x86-specific code
and make in_compat_syscall() false under !CONFIG_COMPAT.

In a following patch I'll clean up generic users that were forced
to check IS_ENABLED(CONFIG_COMPAT) together with in_compat_syscall().
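
The intended change in behaviour can be modelled in userspace roughly as
follows. This is a sketch only: the demo_* names are hypothetical, and
the x86_64 kernel is assumed to be built with CONFIG_COMPAT enabled:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; the real helpers live in the kernel. */
enum demo_kernel { DEMO_KERNEL_X86_64, DEMO_KERNEL_I686 };
enum demo_syscall { DEMO_SYSCALL_64, DEMO_SYSCALL_IA32, DEMO_SYSCALL_X32 };

/* A native i686 kernel only ever sees ia32 syscalls. */
static void demo_check_combo(enum demo_kernel k, enum demo_syscall sc)
{
	assert(k != DEMO_KERNEL_I686 || sc == DEMO_SYSCALL_IA32);
}

/* x86-specific question: was the syscall entered through a 32-bit ABI? */
static bool demo_in_32bit_syscall(enum demo_syscall sc)
{
	return sc == DEMO_SYSCALL_IA32 || sc == DEMO_SYSCALL_X32;
}

/* Before the patch: in_compat_syscall() answered the 32-bit question. */
static bool demo_in_compat_syscall_old(enum demo_kernel k, enum demo_syscall sc)
{
	demo_check_combo(k, sc);
	return demo_in_32bit_syscall(sc);
}

/* After the patch: always false when CONFIG_COMPAT is off (native i686). */
static bool demo_in_compat_syscall_new(enum demo_kernel k, enum demo_syscall sc)
{
	/* Assumption of this model: only the x86_64 kernel has CONFIG_COMPAT. */
	bool config_compat = (k == DEMO_KERNEL_X86_64);

	demo_check_combo(k, sc);
	return config_compat && demo_in_32bit_syscall(sc);
}
```

With the new behaviour, generic code can call in_compat_syscall() alone
and get false on any native 32-bit kernel, i686 included.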

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-efi@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/include/asm/compat.h |  9 ++++++++-
 arch/x86/include/asm/ftrace.h |  4 +---
 arch/x86/kernel/process_64.c  |  4 ++--
 arch/x86/kernel/sys_x86_64.c  | 11 ++++++-----
 arch/x86/mm/hugetlbpage.c     |  4 ++--
 arch/x86/mm/mmap.c            |  2 +-
 include/linux/compat.h        |  4 ++--
 7 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h
index fb97cf7c4137..626bcf1d037d 100644
--- a/arch/x86/include/asm/compat.h
+++ b/arch/x86/include/asm/compat.h
@@ -232,11 +232,18 @@ static inline bool in_x32_syscall(void)
 	return false;
 }
 
-static inline bool in_compat_syscall(void)
+static inline bool in_32bit_syscall(void)
 {
 	return in_ia32_syscall() || in_x32_syscall();
 }
+
+#ifdef CONFIG_COMPAT
+static inline bool in_compat_syscall(void)
+{
+	return in_32bit_syscall();
+}
 #define in_compat_syscall in_compat_syscall	/* override the generic impl */
+#endif
 
 struct compat_siginfo;
 int __copy_siginfo_to_user32(struct compat_siginfo __user *to,
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index c18ed65287d5..cf350639e76d 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -76,9 +76,7 @@ static inline bool arch_syscall_match_sym_name(const char *sym, const char *name
 #define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
 static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
 {
-	if (in_compat_syscall())
-		return true;
-	return false;
+	return in_32bit_syscall();
 }
 #endif /* CONFIG_FTRACE_SYSCALLS && CONFIG_IA32_EMULATION */
 #endif /* !COMPILE_OFFSETS */
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445fb98d..3a6f3cf27808 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -564,10 +564,10 @@ static void __set_personality_x32(void)
 		current->mm->context.ia32_compat = TIF_X32;
 	current->personality &= ~READ_IMPLIES_EXEC;
 	/*
-	 * in_compat_syscall() uses the presence of the x32 syscall bit
+	 * in_32bit_syscall() uses the presence of the x32 syscall bit
 	 * flag to determine compat status.  The x86 mmap() code relies on
 	 * the syscall bitness so set x32 syscall bit right here to make
-	 * in_compat_syscall() work during exec().
+	 * in_32bit_syscall() work during exec().
 	 *
 	 * Pretend to come from a x32 execve.
 	 */
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 6a78d4b36a79..f7476ce23b6e 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -105,7 +105,7 @@ SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
 static void find_start_end(unsigned long addr, unsigned long flags,
 		unsigned long *begin, unsigned long *end)
 {
-	if (!in_compat_syscall() && (flags & MAP_32BIT)) {
+	if (!in_32bit_syscall() && (flags & MAP_32BIT)) {
 		/* This is usually used needed to map code in small
 		   model, so it needs to be in the first 31bit. Limit
 		   it to that.  This means we need to move the
@@ -122,7 +122,7 @@ static void find_start_end(unsigned long addr, unsigned long flags,
 	}
 
 	*begin	= get_mmap_base(1);
-	if (in_compat_syscall())
+	if (in_32bit_syscall())
 		*end = task_size_32bit();
 	else
 		*end = task_size_64bit(addr > DEFAULT_MAP_WINDOW);
@@ -193,7 +193,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 		return addr;
 
 	/* for MAP_32BIT mappings we force the legacy mmap base */
-	if (!in_compat_syscall() && (flags & MAP_32BIT))
+	if (!in_32bit_syscall() && (flags & MAP_32BIT))
 		goto bottomup;
 
 	/* requesting a specific address */
@@ -217,9 +217,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
 	 * in the full address space.
 	 *
-	 * !in_compat_syscall() check to avoid high addresses for x32.
+	 * !in_32bit_syscall() check to avoid high addresses for x32
+	 * (and make it no op on native i386).
 	 */
-	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
+	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
 		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
 
 	info.align_mask = 0;
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 00b296617ca4..92e4c4b85bba 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -92,7 +92,7 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
 	 * in the full address space.
 	 */
-	info.high_limit = in_compat_syscall() ?
+	info.high_limit = in_32bit_syscall() ?
 		task_size_32bit() : task_size_64bit(addr > DEFAULT_MAP_WINDOW);
 
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
@@ -116,7 +116,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
 	 * If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
 	 * in the full address space.
 	 */
-	if (addr > DEFAULT_MAP_WINDOW && !in_compat_syscall())
+	if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
 		info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
 
 	info.align_mask = PAGE_MASK & ~huge_page_mask(h);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 48c591251600..36334ce78be8 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -166,7 +166,7 @@ unsigned long get_mmap_base(int is_legacy)
 	struct mm_struct *mm = current->mm;
 
 #ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
-	if (in_compat_syscall()) {
+	if (in_32bit_syscall()) {
 		return is_legacy ? mm->mmap_compat_legacy_base
 				 : mm->mmap_compat_base;
 	}
diff --git a/include/linux/compat.h b/include/linux/compat.h
index c68acc47da57..4dd4b00407ab 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -1031,9 +1031,9 @@ static inline struct compat_timeval ns_to_compat_timeval(s64 nsec)
 #else /* !CONFIG_COMPAT */
 
 #define is_compat_task() (0)
-#ifndef in_compat_syscall
+/* Ensure no one redefines in_compat_syscall() under !CONFIG_COMPAT */
+#define in_compat_syscall in_compat_syscall
 static inline bool in_compat_syscall(void) { return false; }
-#endif
 
 #endif /* CONFIG_COMPAT */
 
-- 
2.13.6



* [PATCH 02/18] compat: Cleanup in_compat_syscall() callers
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (4 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev, Andy Lutomirski, Ard Biesheuvel,
	H. Peter Anvin, Ingo Molnar, John Stultz, Kirill A. Shutemov,
	Oleg Nesterov, Stephen Boyd, Steven Rostedt, Thomas Gleixner,
	x86, linux-efi

Now that in_compat_syscall() == false on native i686, it's possible to
remove some ifdeffery and helpers that are no longer needed.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: linux-efi@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 drivers/firmware/efi/efivars.c | 16 ++++------------
 kernel/time/time.c             |  2 +-
 net/xfrm/xfrm_state.c          |  2 --
 net/xfrm/xfrm_user.c           |  2 --
 4 files changed, 5 insertions(+), 17 deletions(-)

diff --git a/drivers/firmware/efi/efivars.c b/drivers/firmware/efi/efivars.c
index 3e626fd9bd4e..8061667a6765 100644
--- a/drivers/firmware/efi/efivars.c
+++ b/drivers/firmware/efi/efivars.c
@@ -229,14 +229,6 @@ sanity_check(struct efi_variable *var, efi_char16_t *name, efi_guid_t vendor,
 	return 0;
 }
 
-static inline bool is_compat(void)
-{
-	if (IS_ENABLED(CONFIG_COMPAT) && in_compat_syscall())
-		return true;
-
-	return false;
-}
-
 static void
 copy_out_compat(struct efi_variable *dst, struct compat_efi_variable *src)
 {
@@ -263,7 +255,7 @@ efivar_store_raw(struct efivar_entry *entry, const char *buf, size_t count)
 	u8 *data;
 	int err;
 
-	if (is_compat()) {
+	if (in_compat_syscall()) {
 		struct compat_efi_variable *compat;
 
 		if (count != sizeof(*compat))
@@ -324,7 +316,7 @@ efivar_show_raw(struct efivar_entry *entry, char *buf)
 			     &entry->var.DataSize, entry->var.Data))
 		return -EIO;
 
-	if (is_compat()) {
+	if (in_compat_syscall()) {
 		compat = (struct compat_efi_variable *)buf;
 
 		size = sizeof(*compat);
@@ -418,7 +410,7 @@ static ssize_t efivar_create(struct file *filp, struct kobject *kobj,
 	struct compat_efi_variable *compat = (struct compat_efi_variable *)buf;
 	struct efi_variable *new_var = (struct efi_variable *)buf;
 	struct efivar_entry *new_entry;
-	bool need_compat = is_compat();
+	bool need_compat = in_compat_syscall();
 	efi_char16_t *name;
 	unsigned long size;
 	u32 attributes;
@@ -495,7 +487,7 @@ static ssize_t efivar_delete(struct file *filp, struct kobject *kobj,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EACCES;
 
-	if (is_compat()) {
+	if (in_compat_syscall()) {
 		if (count != sizeof(*compat))
 			return -EINVAL;
 
diff --git a/kernel/time/time.c b/kernel/time/time.c
index 2b41e8e2d31d..d59caa6d03e6 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -865,7 +865,7 @@ int get_timespec64(struct timespec64 *ts,
 	ts->tv_sec = kts.tv_sec;
 
 	/* Zero out the padding for 32 bit systems or in compat mode */
-	if (IS_ENABLED(CONFIG_64BIT_TIME) && (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall()))
+	if (IS_ENABLED(CONFIG_64BIT_TIME) && in_compat_syscall())
 		kts.tv_nsec &= 0xFFFFFFFFUL;
 
 	ts->tv_nsec = kts.tv_nsec;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 8308281f3253..3f48a6925606 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2057,10 +2057,8 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
 	struct xfrm_mgr *km;
 	struct xfrm_policy *pol = NULL;
 
-#ifdef CONFIG_COMPAT
 	if (in_compat_syscall())
 		return -EOPNOTSUPP;
-#endif
 
 	if (!optval && !optlen) {
 		xfrm_sk_policy_insert(sk, XFRM_POLICY_IN, NULL);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 080035f056d9..2677cb55b7a8 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2546,10 +2546,8 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	const struct xfrm_link *link;
 	int type, err;
 
-#ifdef CONFIG_COMPAT
 	if (in_compat_syscall())
 		return -EOPNOTSUPP;
-#endif
 
 	type = nlh->nlmsg_type;
 	if (type > XFRM_MSG_MAX)
-- 
2.13.6



* [PATCH 03/18] selftest/net/xfrm: Add test for ipsec tunnel
  2018-07-26  2:31 ` Dmitry Safonov
  (?)
@ 2018-07-26  2:31   ` dima
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Shuah Khan, linux-kselftest

This is an exhaustive test for ipsec, covering all encryption/
authentication/compression algorithms. The tests run in two
network namespaces connected by veth interfaces. To make exhaustive
testing less time-consuming, the tests run in parallel tasks,
as specified by a parameter to the selftest.

As the patch set adds support for xfrm in compatible tasks, there are
tests to check structures that differ in size between 64-bit and 32-bit
applications.
The selftest doesn't use libnl, so it can easily be compiled as a
compatible application and doesn't require a compatible .so.
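
To illustrate how such a size difference can arise, here is a minimal
sketch, assuming a 64-bit host and GCC or Clang. The demo_* structures
are hypothetical and are not real xfrm structures. The reduced-alignment
typedef imitates the i386 ABI, where a 64-bit integer is only 4-byte
aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical message layout (NOT a real xfrm structure): a 64-bit
 * field followed by a 32-bit one.
 */
struct demo_native {
	uint64_t bytes;
	uint32_t flags;
};	/* 8-byte alignment of uint64_t adds 4 bytes of tail padding */

/* Imitates the i386 ABI: a 64-bit integer with 4-byte alignment. */
typedef uint64_t __attribute__((aligned(4))) demo_compat_u64;

struct demo_compat {
	demo_compat_u64 bytes;
	uint32_t flags;
};	/* no tail padding is needed here */

/* Number of bytes by which the native layout exceeds the compat one. */
static inline size_t demo_size_delta(void)
{
	return sizeof(struct demo_native) - sizeof(struct demo_compat);
}
```

Both structures carry the same fields, but a message built by a 32-bit
application is shorter, so a size check written against the native
layout would reject it.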

Here is a diagram of the selftest:

                           ---------------
                           |  selftest   |
                           |  (parent)   |
                           ---------------
                              |        |
                              | (pipe) |
                              ----------
                             /   |  |   \
               /-------------   /    \   -------------\
               |          /-----      -----\          |
      ---------|----------|----------------|----------|---------
      |   ---------   ---------        ---------   ---------   |
      |   | child |   | child |  NS A  | child |   | child |   |
      |   ---------   ---------        ---------   ---------   |
      -------|------------|----------------|-------------|------
           veth0        veth1            veth2         vethN
    ---------|------------|----------------|-------------|----------
    | ------------  ------------       ------------   ------------ |
    | | gr.child |  | gr.child | NS B  | gr.child |   | gr.child | |
    | ------------  ------------       ------------   ------------ |
    ----------------------------------------------------------------

The parent sends the description of a test (xfrm parameters) to the
child; the child and grandchild set up a tunnel over a veth interface
and test it by sending UDP packets.
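
The parent-to-child hand-off can be sketched as below. This is a
simplified illustration rather than the selftest's code: struct
demo_desc and demo_pipe_roundtrip() are hypothetical names. It relies on
the fact that a write of at most PIPE_BUF bytes to a pipe is atomic, so
a fixed-size description arrives whole.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical, simplified test description. */
struct demo_desc {
	int  type;
	char algo[64];
};

/* Fixed-size messages no larger than PIPE_BUF are written atomically. */
_Static_assert(sizeof(struct demo_desc) <= PIPE_BUF, "message too large");

/* Returns 0 if the child received exactly what the parent sent. */
static int demo_pipe_roundtrip(void)
{
	struct demo_desc sent = { .type = 1, .algo = "cbc(des)" };
	struct demo_desc got;
	int fds[2], status, ok;
	pid_t pid;

	if (pipe(fds))
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: read one whole message, report via the exit code. */
		close(fds[1]);
		if (read(fds[0], &got, sizeof(got)) != (ssize_t)sizeof(got))
			_exit(2);
		_exit(got.type == 1 && !strcmp(got.algo, "cbc(des)") ? 0 : 1);
	}

	/* Parent: send the description, then wait for the child's verdict. */
	close(fds[0]);
	ok = write(fds[1], &sent, sizeof(sent)) == (ssize_t)sizeof(sent);
	close(fds[1]);

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return ok && WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

Calling demo_pipe_roundtrip() returns 0 when the description reached the
child intact.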

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: linux-kselftest@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 MAINTAINERS                            |    1 +
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    1 +
 tools/testing/selftests/net/ipsec.c    | 1987 ++++++++++++++++++++++++++++++++
 4 files changed, 1990 insertions(+)
 create mode 100644 tools/testing/selftests/net/ipsec.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0fe4228f78cb..7e20db5d0210 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9938,6 +9938,7 @@ F:	net/ipv6/ipcomp6.c
 F:	net/ipv6/ip6_vti.c
 F:	include/uapi/linux/xfrm.h
 F:	include/net/xfrm.h
+F:	tools/testing/selftests/net/ipsec.c
 
 NETWORKING [IPv4/IPv6]
 M:	"David S. Miller" <davem@davemloft.net>
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 1a0ac3a29ec5..6896547292cb 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -1,3 +1,4 @@
+ipsec
 msg_zerocopy
 socket
 psock_fanout
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 663e11e85727..9f35c01fbc0a 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -12,6 +12,7 @@ TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
 TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd
 TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx
+TEST_GEN_FILES += ipsec
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
 
diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c
new file mode 100644
index 000000000000..e0752e127ba7
--- /dev/null
+++ b/tools/testing/selftests/net/ipsec.c
@@ -0,0 +1,1987 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ipsec.c - Check xfrm on veth inside a net-ns.
+ * Copyright (c) 2018 Dmitry Safonov (Arista Networks)
+ */
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <asm/types.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <linux/limits.h>
+#include <linux/netlink.h>
+#include <linux/random.h>
+#include <linux/rtnetlink.h>
+#include <linux/veth.h>
+#include <linux/xfrm.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#define printk(fmt, lvl, ...)						\
+	fprintf(stderr, "[%s] (%s:%d)\t" fmt "\n",			\
+	lvl, __FILE__, __LINE__, ##__VA_ARGS__)
+
+#define pr_p(func, fmt, ...)	func(fmt ": %m", ##__VA_ARGS__)
+
+#define pr_err(fmt, ...)						\
+	printk(fmt, "ERR", ##__VA_ARGS__)
+#define pr_warn(fmt, ...)						\
+	printk(fmt, "WARN", ##__VA_ARGS__)
+#define pr_note(fmt, ...)						\
+	printk(fmt, "NOTE", ##__VA_ARGS__)
+#define pr_ok(fmt, ...)							\
+	printk(fmt, "OK", ##__VA_ARGS__)
+#define pr_debug(fmt, ...)						\
+	while (0) {							\
+		printk(fmt, "NOTE", ##__VA_ARGS__);			\
+	}
+
+#define pr_perror(fmt, ...)	pr_p(pr_err, fmt, ##__VA_ARGS__)
+#define pr_pwarn(fmt, ...)	pr_p(pr_warn, fmt, ##__VA_ARGS__)
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
+
+#define IPV4_STR_SZ	16	/* xxx.xxx.xxx.xxx is longest + \0 */
+#define MAX_PAYLOAD	2048
+#define XFRM_ALGO_KEY_BUF_SIZE	512
+#define MAX_PROCESSES	(1 << 14) /* /16 mask divided by /30 subnets */
+#define INADDR_A	((in_addr_t) 0x0a000000) /* 10.0.0.0 */
+#define INADDR_B	((in_addr_t) 0xc0a80000) /* 192.168.0.0 */
+
+/* /30 mask for one veth connection */
+#define PREFIX_LEN	30
+#define child_ip(nr)	(4*nr + 1)
+#define grchild_ip(nr)	(4*nr + 2)
+
+#define VETH_FMT	"ktst-%d"
+#define VETH_LEN	10
+#define BEGIN_SEQ	(time(NULL))
+
+static int nsfd_parent	= -1;
+static int nsfd_childa	= -1;
+static int nsfd_childb	= -1;
+static long page_size;
+
+const unsigned int ping_delay_nsec	= 50 * 1000 * 1000;
+const unsigned int ping_timeout		= 300;
+const unsigned int ping_count		= 100;
+const unsigned int ping_success		= 80;
+
+static int unshare_open(void)
+{
+	const char *netns_path = "/proc/self/ns/net";
+	int fd;
+
+	if (unshare(CLONE_NEWNET) != 0) {
+		pr_pwarn("unshare()");
+		return -1;
+	}
+
+	fd = open(netns_path, O_RDONLY);
+	if (fd <= 0) {
+		pr_pwarn("open(%s)", netns_path);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int switch_ns(int fd)
+{
+	if (setns(fd, CLONE_NEWNET)) {
+		pr_pwarn("setns()");
+		return -1;
+	}
+	return 0;
+}
+
+/*
+ * Running the test inside a new parent net namespace to bother less
+ * about cleanup on error-path.
+ */
+static int init_namespaces(void)
+{
+	nsfd_parent = unshare_open();
+	if (nsfd_parent <= 0)
+		return -1;
+
+	nsfd_childa = unshare_open();
+	if (nsfd_childa <= 0)
+		return -1;
+
+	if (switch_ns(nsfd_parent))
+		return -1;
+
+	nsfd_childb = unshare_open();
+	if (nsfd_childb <= 0)
+		return -1;
+
+	if (switch_ns(nsfd_parent))
+		return -1;
+	return 0;
+}
+
+static int netlink_sock(int *sock, uint32_t *seq_nr, int proto)
+{
+	/*
+	 * Reuse an open socket (bumping its sequence number) or open a new one.
+	 */
+	if (*sock > 0) {
+		(*seq_nr)++;
+		return 0;
+	}
+
+	*sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, proto);
+	if (*sock <= 0) {
+		pr_pwarn("socket(AF_NETLINK)");
+		return errno;
+	}
+	*seq_nr = BEGIN_SEQ;
+
+	return 0;
+}
+
+static inline struct rtattr *rtattr_hdr(struct nlmsghdr *nh)
+{
+	return (struct rtattr *)((char *)(nh) + RTA_ALIGN((nh)->nlmsg_len));
+}
+
+static int rtattr_pack(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type, const void *payload, size_t size)
+{
+	/* NLMSG_ALIGNTO == RTA_ALIGNTO, nlmsg_len already aligned */
+	struct rtattr *attr = rtattr_hdr(nh);
+	size_t nl_size = RTA_ALIGN(nh->nlmsg_len) + RTA_LENGTH(size);
+
+	if (req_sz < nl_size) {
+		pr_err("req buf is too small: %zu < %zu", req_sz, nl_size);
+		return -1;
+	}
+	nh->nlmsg_len = nl_size;
+
+	attr->rta_len = RTA_LENGTH(size); /* XXX: rta_len = size? */
+	attr->rta_type = rta_type;
+	memcpy(RTA_DATA(attr), payload, size);
+
+	return 0;
+}
+
+static struct rtattr *_rtattr_begin(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type, const void *payload, size_t size)
+{
+	struct rtattr *ret = rtattr_hdr(nh);
+
+	if (rtattr_pack(nh, req_sz, rta_type, payload, size))
+		return 0;
+
+	return ret;
+}
+
+static inline struct rtattr *rtattr_begin(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type)
+{
+	return _rtattr_begin(nh, req_sz, rta_type, 0, 0);
+}
+
+static inline void rtattr_end(struct nlmsghdr *nh, struct rtattr *attr)
+{
+	char *nlmsg_end = (char *)nh + nh->nlmsg_len;
+
+	attr->rta_len = nlmsg_end - (char *)attr;
+}
+
+static int veth_pack_peerb(struct nlmsghdr *nh, size_t req_sz,
+		const char *peer, int ns)
+{
+	struct ifinfomsg pi;
+	struct rtattr *peer_attr;
+
+	memset(&pi, 0, sizeof(pi));
+	pi.ifi_family	= AF_UNSPEC;
+	pi.ifi_change	= 0xFFFFFFFF;
+
+	peer_attr = _rtattr_begin(nh, req_sz, VETH_INFO_PEER, &pi, sizeof(pi));
+	if (!peer_attr)
+		return -1;
+
+	if (rtattr_pack(nh, req_sz, IFLA_IFNAME, peer, strlen(peer)))
+		return -1;
+
+	if (rtattr_pack(nh, req_sz, IFLA_NET_NS_FD, &ns, sizeof(ns)))
+		return -1;
+
+	rtattr_end(nh, peer_attr);
+
+	return 0;
+}
+
+static int netlink_check_answer(int sock)
+{
+	struct nlmsgerror {
+		struct nlmsghdr hdr;
+		int error;
+		struct nlmsghdr orig_msg;
+	} answer;
+
+	if (recv(sock, &answer, sizeof(answer), 0) < 0) {
+		pr_perror("recv()");
+		return -1;
+	} else if (answer.hdr.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)answer.hdr.nlmsg_type);
+		return -1;
+	} else if (answer.error) {
+		pr_err("NLMSG_ERROR: %d: %s",
+			answer.error, strerror(-answer.error));
+		return answer.error;
+	}
+
+	return 0;
+}
+
+static int veth_add(int sock, uint32_t seq, const char *peera, int ns_a,
+		const char *peerb, int ns_b)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
+	struct {
+		struct nlmsghdr		nh;
+		struct ifinfomsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+	const char veth_type[] = "veth";
+	struct rtattr *link_info, *info_data;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWLINK;
+	req.nh.nlmsg_flags	= flags;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifi_family	= AF_UNSPEC;
+	req.info.ifi_change	= 0xFFFFFFFF;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_IFNAME, peera, strlen(peera)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_NET_NS_FD, &ns_a, sizeof(ns_a)))
+		return -1;
+
+	link_info = rtattr_begin(&req.nh, sizeof(req), IFLA_LINKINFO);
+	if (!link_info)
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_INFO_KIND, veth_type, sizeof(veth_type)))
+		return -1;
+
+	info_data = rtattr_begin(&req.nh, sizeof(req), IFLA_INFO_DATA);
+	if (!info_data)
+		return -1;
+
+	if (veth_pack_peerb(&req.nh, sizeof(req), peerb, ns_b))
+		return -1;
+
+	rtattr_end(&req.nh, info_data);
+	rtattr_end(&req.nh, link_info);
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("write()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int ip4_addr_set(int sock, uint32_t seq, const char *intf,
+		struct in_addr addr, uint8_t prefix)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
+	struct {
+		struct nlmsghdr		nh;
+		struct ifaddrmsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWADDR;
+	req.nh.nlmsg_flags	= flags;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifa_family	= AF_INET;
+	req.info.ifa_prefixlen	= prefix;
+	req.info.ifa_index	= if_nametoindex(intf);
+
+#if 0
+	{
+		char addr_str[IPV4_STR_SZ] = {};
+
+		strncpy(addr_str, inet_ntoa(addr), IPV4_STR_SZ - 1);
+
+		pr_warn("ip addr set %s", addr_str);
+	}
+#endif
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFA_LOCAL, &addr, sizeof(addr)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFA_ADDRESS, &addr, sizeof(addr)))
+		return -1;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("write()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int link_set_up(int sock, uint32_t seq, const char *intf)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct ifinfomsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWLINK;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifi_family	= AF_UNSPEC;
+	req.info.ifi_change	= 0xFFFFFFFF;
+	req.info.ifi_index	= if_nametoindex(intf);
+	req.info.ifi_flags	= IFF_UP;
+	req.info.ifi_change	= IFF_UP;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("write()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int ip4_route_set(int sock, uint32_t seq, const char *intf,
+		struct in_addr src, struct in_addr dst)
+{
+	struct {
+		struct nlmsghdr	nh;
+		struct rtmsg	rt;
+		char		attrbuf[MAX_PAYLOAD];
+	} req;
+	unsigned int index = if_nametoindex(intf);
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.rt));
+	req.nh.nlmsg_type	= RTM_NEWROUTE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
+	req.nh.nlmsg_seq	= seq;
+	req.rt.rtm_family	= AF_INET;
+	req.rt.rtm_dst_len	= 32;
+	req.rt.rtm_table	= RT_TABLE_MAIN;
+	req.rt.rtm_protocol	= RTPROT_BOOT;
+	req.rt.rtm_scope	= RT_SCOPE_LINK;
+	req.rt.rtm_type		= RTN_UNICAST;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_DST, &dst, sizeof(dst)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_PREFSRC, &src, sizeof(src)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_OIF, &index, sizeof(index)))
+		return -1;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("write()");
+		return -1;
+	}
+
+	return netlink_check_answer(sock);
+}
+
+static int tunnel_set_route(int route_sock, uint32_t *route_seq, char *veth,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	if (ip4_addr_set(route_sock, (*route_seq)++, "lo", tunsrc, PREFIX_LEN)) {
+		pr_err("Failed to set ipv4 addr");
+		return -1;
+	}
+
+	if (ip4_route_set(route_sock, (*route_seq)++, veth, tunsrc, tundst)) {
+		pr_err("Failed to set ipv4 route");
+		return -1;
+	}
+	return 0;
+}
+
+static int init_child(int nsfd, char *veth, unsigned int src, unsigned int dst)
+{
+	struct in_addr intsrc = inet_makeaddr(INADDR_B, src);
+	struct in_addr tunsrc = inet_makeaddr(INADDR_A, src);
+	struct in_addr tundst = inet_makeaddr(INADDR_A, dst);
+	int route_sock = -1, ret = -1;
+	uint32_t route_seq;
+
+	if (switch_ns(nsfd))
+		return -1;
+
+	if (netlink_sock(&route_sock, &route_seq, NETLINK_ROUTE)) {
+		pr_err("Failed to open netlink route socket in child");
+		return -1;
+	}
+
+	if (ip4_addr_set(route_sock, route_seq++, veth, intsrc, PREFIX_LEN)) {
+		pr_err("Failed to set ipv4 addr");
+		goto err;
+	}
+
+	if (link_set_up(route_sock, route_seq++, veth)) {
+		pr_err("Failed to bring up %s", veth);
+		goto err;
+	}
+
+	if (tunnel_set_route(route_sock, &route_seq, veth, tunsrc, tundst)) {
+		pr_err("Failed to add tunnel route on %s", veth);
+		goto err;
+	}
+	ret = 0;
+
+err:
+	close(route_sock);
+	return ret;
+}
+
+#define ALGO_LEN	64
+enum desc_type {
+	CREATE_TUNNEL	= 0,
+	ALLOCATE_SPI,
+	MONITOR_ACQUIRE,
+	EXPIRE_STATE,
+	EXPIRE_POLICY,
+};
+struct xfrm_desc {
+	enum desc_type	type;
+	uint8_t		proto;
+	char		a_algo[ALGO_LEN];
+	char		e_algo[ALGO_LEN];
+	char		c_algo[ALGO_LEN];
+	char		ae_algo[ALGO_LEN];
+	unsigned int	icv_len;
+	/* unsigned key_len; */
+};
+
+enum msg_type {
+	MSG_ACK		= 0,
+	MSG_EXIT,
+	MSG_PING,
+	MSG_XFRM_PREPARE,
+	MSG_XFRM_ADD,
+	MSG_XFRM_DEL,
+	MSG_XFRM_CLEANUP,
+};
+
+struct test_desc {
+	enum msg_type type;
+	union {
+		struct {
+			in_addr_t reply_ip;
+			unsigned int port;
+		} ping;
+		struct xfrm_desc xfrm_desc;
+	} body;
+};
+
+static void write_msg(int fd, struct test_desc *msg)
+{
+	ssize_t bytes = write(fd, msg, sizeof(*msg));
+
+	/* Make sure that write/read is atomic to a pipe */
+	BUILD_BUG_ON(sizeof(struct test_desc) > PIPE_BUF);
+
+	if (bytes < 0) {
+		pr_perror("write()");
+		exit(1);
+	}
+	if (bytes != sizeof(*msg)) {
+		pr_err("sent part of the message %zd/%zu", bytes, sizeof(*msg));
+		exit(1);
+	}
+}
+
+static void read_msg(int fd, struct test_desc *msg)
+{
+	ssize_t bytes = read(fd, msg, sizeof(*msg));
+
+	if (bytes < 0) {
+		pr_perror("read()");
+		exit(1);
+	}
+	if (bytes != sizeof(*msg)) {
+		pr_err("got incomplete message %zd/%zu", bytes, sizeof(*msg));
+		exit(1);
+	}
+}
+
+static int udp_ping_init(struct in_addr listen_ip, unsigned int u_timeout,
+		unsigned int *server_port, int sock[2])
+{
+	struct sockaddr_in server;
+	struct timeval t = { .tv_sec = 0, .tv_usec = u_timeout };
+	socklen_t s_len = sizeof(server);
+
+	sock[0] = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock[0] < 0) {
+		pr_perror("socket()");
+		return -1;
+	}
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= 0;
+	memcpy(&server.sin_addr.s_addr, &listen_ip, sizeof(struct in_addr));
+
+	if (bind(sock[0], (struct sockaddr *)&server, s_len)) {
+		pr_perror("bind()");
+		goto err_close_server;
+	}
+
+	if (getsockname(sock[0], (struct sockaddr *)&server, &s_len)) {
+		pr_perror("getsockname()");
+		goto err_close_server;
+	}
+
+	*server_port = ntohs(server.sin_port);
+
+	if (setsockopt(sock[0], SOL_SOCKET, SO_RCVTIMEO, (const char *)&t, sizeof t)) {
+		pr_perror("setsockopt()");
+		goto err_close_server;
+	}
+
+	sock[1] = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock[1] < 0) {
+		pr_perror("socket()");
+		goto err_close_server;
+	}
+
+	return 0;
+
+err_close_server:
+	close(sock[0]);
+	return -1;
+}
+
+static int udp_ping_send(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len)
+{
+	struct sockaddr_in server;
+	const struct sockaddr *dest_addr = (struct sockaddr *)&server;
+	char sock_buf[buf_len];
+	ssize_t r_bytes, s_bytes;
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= htons(port);
+	server.sin_addr.s_addr	= dest_ip;
+
+	s_bytes = sendto(sock[1], buf, buf_len, 0, dest_addr, sizeof(server));
+	if (s_bytes < 0) {
+		pr_perror("sendto()");
+		return -1;
+	} else if (s_bytes != buf_len) {
+		pr_err("sent part of the message: %zd/%zu", s_bytes, buf_len);
+		return -1;
+	}
+
+	r_bytes = recv(sock[0], sock_buf, buf_len, 0);
+	if (r_bytes < 0) {
+		if (errno != EAGAIN)
+			pr_perror("recv()");
+		return -1;
+	} else if (r_bytes == 0) { /* EOF */
+		pr_err("EOF on reply to ping");
+		return -1;
+	} else if (r_bytes != buf_len || memcmp(buf, sock_buf, buf_len)) {
+		pr_err("ping reply packet is corrupted %zd/%zu", r_bytes, buf_len);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int udp_ping_reply(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len)
+{
+	struct sockaddr_in server;
+	const struct sockaddr *dest_addr = (struct sockaddr *)&server;
+	char sock_buf[buf_len];
+	ssize_t r_bytes, s_bytes;
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= htons(port);
+	server.sin_addr.s_addr	= dest_ip;
+
+	r_bytes = recv(sock[0], sock_buf, buf_len, 0);
+	if (r_bytes < 0) {
+		if (errno != EAGAIN)
+			pr_perror("recv()");
+		return -1;
+	}
+	if (r_bytes == 0) { /* EOF */
+		pr_err("EOF on reply to ping");
+		return -1;
+	}
+	if (r_bytes != buf_len || memcmp(buf, sock_buf, buf_len)) {
+		pr_err("ping reply packet is corrupted %zd/%zu", r_bytes, buf_len);
+		return -1;
+	}
+
+	s_bytes = sendto(sock[1], buf, buf_len, 0, dest_addr, sizeof(server));
+	if (s_bytes < 0) {
+		pr_perror("sendto()");
+		return -1;
+	} else if (s_bytes != buf_len) {
+		pr_err("sent part of the message: %zd/%zu", s_bytes, buf_len);
+		return -1;
+	}
+
+	return 0;
+}
+
+typedef int (*ping_f)(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len);
+static int do_ping(int cmd_fd, char *buf, size_t buf_len, struct in_addr from,
+		bool init_side, int d_port, in_addr_t to, ping_f func)
+{
+	struct test_desc msg;
+	unsigned int s_port, i, ping_succeeded = 0;
+	int ping_sock[2];
+	char to_str[IPV4_STR_SZ] = {}, from_str[IPV4_STR_SZ] = {};
+
+	if (udp_ping_init(from, ping_timeout, &s_port, ping_sock)) {
+		pr_err("Failed to init ping");
+		return -1;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type		= MSG_PING;
+	msg.body.ping.port	= s_port;
+	memcpy(&msg.body.ping.reply_ip, &from, sizeof(from));
+
+	write_msg(cmd_fd, &msg);
+	if (init_side) {
+		/* The other end sends ip to ping */
+		read_msg(cmd_fd, &msg);
+		if (msg.type != MSG_PING)
+			return -1;
+		to = msg.body.ping.reply_ip;
+		d_port = msg.body.ping.port;
+	}
+
+	for (i = 0; i < ping_count ; i++) {
+		struct timespec sleep_time = {
+			.tv_sec = 0,
+			.tv_nsec = ping_delay_nsec,
+		};
+
+		ping_succeeded += !func(ping_sock, to, d_port, buf, buf_len);
+		nanosleep(&sleep_time, 0);
+	}
+
+	close(ping_sock[0]);
+	close(ping_sock[1]);
+
+	strncpy(to_str, inet_ntoa(*(struct in_addr *)&to), IPV4_STR_SZ - 1);
+	strncpy(from_str, inet_ntoa(from), IPV4_STR_SZ - 1);
+
+	if (ping_succeeded < ping_success) {
+		pr_err("ping (%s) %s->%s failed %u/%u times",
+			init_side ? "send" : "reply", from_str, to_str,
+			ping_count - ping_succeeded, ping_count);
+		return -1;
+	}
+
+	pr_debug("ping (%s) %s->%s succeeded %u/%u times",
+		init_side ? "send" : "reply", from_str, to_str,
+		ping_succeeded, ping_count);
+
+	return 0;
+}
+
+static int randomize_buffer(void *buf, size_t buflen)
+{
+	size_t filled = 0;
+
+	/* getrandom() may return fewer bytes than requested: fill the rest */
+	while (filled < buflen) {
+		long ret = syscall(SYS_getrandom, (char *)buf + filled,
+				buflen - filled, 0);
+
+		if (ret < 0) {
+			pr_perror("getrandom()");
+			return -1;
+		}
+		filled += ret;
+	}
+
+	return 0;
+}
+
+static int xfrm_fill_key(char *name, char *buf,
+		size_t buf_len, unsigned int *key_len)
+{
+	/* XXX: use set/map instead of all this */
+	if (strncmp(name, "digest_null", ALGO_LEN) == 0)
+		*key_len = 0;
+	else if (strncmp(name, "ecb(cipher_null)", ALGO_LEN) == 0)
+		*key_len = 0;
+	else if (strncmp(name, "cbc(des)", ALGO_LEN) == 0)
+		*key_len = 64;
+	else if (strncmp(name, "hmac(md5)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cmac(aes)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "xcbc(aes)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cbc(cast5)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cbc(serpent)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "hmac(sha1)", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "hmac(rmd160)", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "cbc(des3_ede)", ALGO_LEN) == 0)
+		*key_len = 192;
+	else if (strncmp(name, "hmac(sha256)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(aes)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(camellia)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(twofish)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "rfc3686(ctr(aes))", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "hmac(sha384)", ALGO_LEN) == 0)
+		*key_len = 384;
+	else if (strncmp(name, "cbc(blowfish)", ALGO_LEN) == 0)
+		*key_len = 448;
+	else if (strncmp(name, "hmac(sha512)", ALGO_LEN) == 0)
+		*key_len = 512;
+	else if (strncmp(name, "rfc4106(gcm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "rfc4543(gcm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "rfc4309(ccm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 152;
+	else if (strncmp(name, "rfc4106(gcm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 224;
+	else if (strncmp(name, "rfc4543(gcm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 224;
+	else if (strncmp(name, "rfc4309(ccm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 216;
+	else if (strncmp(name, "rfc4106(gcm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "rfc4543(gcm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "rfc4309(ccm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 280;
+	else if (strncmp(name, "rfc7539(chacha20,poly1305)-128", ALGO_LEN) == 0)
+		*key_len = 0;
+
+	if (*key_len > buf_len) {
+		pr_err("Can't pack a key - too big for buffer");
+		return -1;
+	}
+
+	return randomize_buffer(buf, *key_len);
+}
+
+static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz,
+		struct xfrm_desc *desc)
+{
+	struct {
+		union {
+			struct xfrm_algo	alg;
+			struct xfrm_algo_aead	aead;
+			struct xfrm_algo_auth	auth;
+		} u;
+		char buf[XFRM_ALGO_KEY_BUF_SIZE];
+	} alg = {};
+	size_t alen, elen, clen, aelen;
+	unsigned short type;
+
+	alen = strlen(desc->a_algo);
+	elen = strlen(desc->e_algo);
+	clen = strlen(desc->c_algo);
+	aelen = strlen(desc->ae_algo);
+
+	/* Verify desc */
+	switch (desc->proto) {
+	case IPPROTO_AH:
+		if (!alen || elen || clen || aelen) {
+			pr_err("BUG: buggy ah desc");
+			return -1;
+		}
+		strncpy(alg.u.alg.alg_name, desc->a_algo, ALGO_LEN);
+		if (xfrm_fill_key(desc->a_algo, alg.u.alg.alg_key,
+				sizeof(alg.buf), &alg.u.alg.alg_key_len))
+			return -1;
+		type = XFRMA_ALG_AUTH;
+		break;
+	case IPPROTO_COMP:
+		if (!clen || elen || alen || aelen) {
+			pr_err("BUG: buggy comp desc");
+			return -1;
+		}
+		strncpy(alg.u.alg.alg_name, desc->c_algo, ALGO_LEN);
+		if (xfrm_fill_key(desc->c_algo, alg.u.alg.alg_key,
+				sizeof(alg.buf), &alg.u.alg.alg_key_len))
+			return -1;
+		type = XFRMA_ALG_COMP;
+		break;
+	case IPPROTO_ESP:
+		if (!((alen && elen) ^ !!aelen) || clen) {
+			pr_err("BUG: buggy esp desc");
+			return -1;
+		}
+		if (aelen) {
+			alg.u.aead.alg_icv_len = desc->icv_len;
+			strncpy(alg.u.aead.alg_name, desc->ae_algo, ALGO_LEN);
+			if (xfrm_fill_key(desc->ae_algo, alg.u.aead.alg_key,
+						sizeof(alg.buf), &alg.u.aead.alg_key_len))
+				return -1;
+			type = XFRMA_ALG_AEAD;
+		} else {
+
+			strncpy(alg.u.alg.alg_name, desc->e_algo, ALGO_LEN);
+			type = XFRMA_ALG_CRYPT;
+			if (xfrm_fill_key(desc->e_algo, alg.u.alg.alg_key,
+						sizeof(alg.buf), &alg.u.alg.alg_key_len))
+				return -1;
+			if (rtattr_pack(nh, req_sz, type, &alg, sizeof(alg)))
+				return -1;
+
+			strncpy(alg.u.alg.alg_name, desc->a_algo, ALGO_LEN);
+			type = XFRMA_ALG_AUTH;
+			if (xfrm_fill_key(desc->a_algo, alg.u.alg.alg_key,
+						sizeof(alg.buf), &alg.u.alg.alg_key_len))
+				return -1;
+		}
+		break;
+	default:
+		pr_err("BUG: unknown proto in desc");
+		return -1;
+	}
+
+	if (rtattr_pack(nh, req_sz, type, &alg, sizeof(alg)))
+		return -1;
+
+	return 0;
+}
+
+static inline uint32_t gen_spi(struct in_addr src)
+{
+	return htonl(inet_lnaof(src));
+}
+
+static int xfrm_state_add(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst,
+		struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct xfrm_usersa_info	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= XFRM_MSG_NEWSA;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill selector. */
+	memcpy(&req.info.sel.daddr, &dst, sizeof(dst));
+	memcpy(&req.info.sel.saddr, &src, sizeof(src));
+	req.info.sel.family		= AF_INET;
+	req.info.sel.prefixlen_d	= PREFIX_LEN;
+	req.info.sel.prefixlen_s	= PREFIX_LEN;
+
+	/* Fill id */
+	memcpy(&req.info.id.daddr, &dst, sizeof(dst));
+	/* Note: zero-spi cannot be deleted */
+	req.info.id.spi = spi;
+	req.info.id.proto	= desc->proto;
+
+	memcpy(&req.info.saddr, &src, sizeof(src));
+
+	/* Fill lifetime_cfg */
+	req.info.lft.soft_byte_limit	= XFRM_INF;
+	req.info.lft.hard_byte_limit	= XFRM_INF;
+	req.info.lft.soft_packet_limit	= XFRM_INF;
+	req.info.lft.hard_packet_limit	= XFRM_INF;
+
+	req.info.family		= AF_INET;
+	req.info.mode		= XFRM_MODE_TUNNEL;
+
+	/* XXX: Fill seq, reqid, replay_window, flags? */
+
+	if (xfrm_state_pack_algo(&req.nh, sizeof(req), desc))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_set(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst,
+		struct xfrm_desc *desc)
+{
+	int err;
+
+	err = xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst, desc);
+	if (err) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	err = xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), dst, src, desc);
+	if (err) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_policy_add(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst, uint8_t dir,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userpolicy_info	info;
+		char				attrbuf[MAX_PAYLOAD];
+	} req;
+	struct xfrm_user_tmpl tmpl;
+
+	memset(&req, 0, sizeof(req));
+	memset(&tmpl, 0, sizeof(tmpl));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= XFRM_MSG_NEWPOLICY;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill selector. */
+	memcpy(&req.info.sel.daddr, &dst, sizeof(tundst));
+	memcpy(&req.info.sel.saddr, &src, sizeof(tunsrc));
+	req.info.sel.family		= AF_INET;
+	req.info.sel.prefixlen_d	= PREFIX_LEN;
+	req.info.sel.prefixlen_s	= PREFIX_LEN;
+
+	/* Fill lifetime_cfg */
+	req.info.lft.soft_byte_limit	= XFRM_INF;
+	req.info.lft.hard_byte_limit	= XFRM_INF;
+	req.info.lft.soft_packet_limit	= XFRM_INF;
+	req.info.lft.hard_packet_limit	= XFRM_INF;
+
+	req.info.dir = dir;
+
+	/* Fill tmpl */
+	memcpy(&tmpl.id.daddr, &dst, sizeof(dst));
+	/* Note: zero-spi cannot be deleted */
+	tmpl.id.spi = spi;
+	tmpl.id.proto	= proto;
+	tmpl.family	= AF_INET;
+	memcpy(&tmpl.saddr, &src, sizeof(src));
+	tmpl.mode	= XFRM_MODE_TUNNEL;
+	tmpl.aalgos = (~(uint32_t)0);
+	tmpl.ealgos = (~(uint32_t)0);
+	tmpl.calgos = (~(uint32_t)0);
+
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_TMPL, &tmpl, sizeof(tmpl)))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_prepare(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst, proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), dst, src,
+				XFRM_POLICY_IN, tunsrc, tundst, proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_policy_del(int xfrm_sock, uint32_t seq,
+		struct in_addr src, struct in_addr dst, uint8_t dir,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userpolicy_id	id;
+		char				attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.id));
+	req.nh.nlmsg_type	= XFRM_MSG_DELPOLICY;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill id */
+	memcpy(&req.id.sel.daddr, &dst, sizeof(tundst));
+	memcpy(&req.id.sel.saddr, &src, sizeof(tunsrc));
+	req.id.sel.family		= AF_INET;
+	req.id.sel.prefixlen_d		= PREFIX_LEN;
+	req.id.sel.prefixlen_s		= PREFIX_LEN;
+	req.id.dir = dir;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_cleanup(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	if (xfrm_policy_del(xfrm_sock, (*seq)++, src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst)) {
+		pr_err("Failed to delete xfrm policy");
+		return -1;
+	}
+
+	if (xfrm_policy_del(xfrm_sock, (*seq)++, dst, src,
+				XFRM_POLICY_IN, tunsrc, tundst)) {
+		pr_err("Failed to delete xfrm policy");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_state_del(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct xfrm_usersa_id	id;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+	xfrm_address_t saddr = {};
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.id));
+	req.nh.nlmsg_type	= XFRM_MSG_DELSA;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	memcpy(&req.id.daddr, &dst, sizeof(dst));
+	req.id.family		= AF_INET;
+	req.id.proto		= proto;
+	/* Note: zero-spi cannot be deleted */
+	req.id.spi = spi;
+
+	memcpy(&saddr, &src, sizeof(src));
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_SRCADDR, &saddr, sizeof(saddr)))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_delete(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	if (xfrm_state_del(xfrm_sock, (*seq)++, gen_spi(src), src, dst, proto)) {
+		pr_err("Failed to remove xfrm state");
+		return -1;
+	}
+
+	if (xfrm_state_del(xfrm_sock, (*seq)++, gen_spi(src), dst, src, proto)) {
+		pr_err("Failed to remove xfrm state");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_state_allocspi(int xfrm_sock, uint32_t *seq,
+		uint32_t spi, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userspi_info	spi;
+	} req;
+	struct {
+		struct nlmsghdr			nh;
+		union {
+			struct xfrm_usersa_info	info;
+			int error;
+		};
+	} answer;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.spi));
+	req.nh.nlmsg_type	= XFRM_MSG_ALLOCSPI;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	req.spi.info.family	= AF_INET;
+	req.spi.min		= spi;
+	req.spi.max		= spi;
+	req.spi.info.id.proto	= proto;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	if (recv(xfrm_sock, &answer, sizeof(answer), 0) < 0) {
+		pr_perror("recv()");
+		return -1;
+	} else if (answer.nh.nlmsg_type == XFRM_MSG_NEWSA) {
+		uint32_t new_spi = ntohl(answer.info.id.spi);
+
+		if (new_spi != spi) {
+			pr_err("allocated spi is different from requested: %#x != %#x",
+					new_spi, spi);
+			return -1;
+		}
+		return 0;
+	} else if (answer.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)answer.nh.nlmsg_type);
+		return -1;
+	}
+
+	pr_err("NLMSG_ERROR: %d: %s", answer.error, strerror(-answer.error));
+	return answer.error;
+}
+
+static int netlink_sock_bind(int *sock, uint32_t *seq, int proto, uint32_t groups)
+{
+	struct sockaddr_nl snl = {};
+	socklen_t addr_len;
+	int ret = -1;
+
+	snl.nl_family = AF_NETLINK;
+	snl.nl_groups = groups;
+
+	if (netlink_sock(sock, seq, proto)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	if (bind(*sock, (struct sockaddr *)&snl, sizeof(snl)) < 0) {
+		pr_perror("bind()");
+		goto out_close;
+	}
+
+	addr_len = sizeof(snl);
+	if (getsockname(*sock, (struct sockaddr *)&snl, &addr_len) < 0) {
+		pr_perror("getsockname()");
+		goto out_close;
+	}
+	if (addr_len != sizeof(snl)) {
+		pr_err("Wrong address length %d", addr_len);
+		goto out_close;
+	}
+	if (snl.nl_family != AF_NETLINK) {
+		pr_err("Wrong address family %d", snl.nl_family);
+		goto out_close;
+	}
+	return 0;
+
+out_close:
+	close(*sock);
+	return ret;
+}
+
+static int xfrm_monitor_acquire(int xfrm_sock, uint32_t *seq, unsigned int nr)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_acquire acq;
+			int error;
+		};
+		char attrbuf[MAX_PAYLOAD];
+	} req;
+	struct xfrm_user_tmpl xfrm_tmpl = {};
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_ACQUIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.acq));
+	req.nh.nlmsg_type	= XFRM_MSG_ACQUIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	req.acq.policy.sel.family	= AF_INET;
+	req.acq.aalgos	= 0xfeed;
+	req.acq.ealgos	= 0xbaad;
+	req.acq.calgos	= 0xbabe;
+
+	xfrm_tmpl.family = AF_INET;
+	xfrm_tmpl.id.proto = IPPROTO_ESP;
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_TMPL, &xfrm_tmpl, sizeof(xfrm_tmpl)))
+		goto out_close;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.acq.aalgos != 0xfeed || req.acq.ealgos != 0xbaad
+			|| req.acq.calgos != 0xbabe) {
+		pr_err("xfrm_user_acquire has changed: %x %x %x",
+				req.acq.aalgos, req.acq.ealgos, req.acq.calgos);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static int xfrm_expire_state(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_expire expire;
+			int error;
+		};
+	} req;
+	struct in_addr src, dst;
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+
+	if (xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst, desc)) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_EXPIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.expire));
+	req.nh.nlmsg_type	= XFRM_MSG_EXPIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	memcpy(&req.expire.state.id.daddr, &dst, sizeof(dst));
+	req.expire.state.id.spi		= gen_spi(src);
+	req.expire.state.id.proto	= desc->proto;
+	req.expire.state.family		= AF_INET;
+	req.expire.hard			= 0xff;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.expire.hard != 0x1) {
+		pr_err("expire.hard is not set: %x", req.expire.hard);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static int xfrm_expire_policy(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_polexpire expire;
+			int error;
+		};
+	} req;
+	struct in_addr src, dst, tunsrc, tundst;
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, child_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, grchild_ip(nr));
+
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst, desc->proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_EXPIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.expire));
+	req.nh.nlmsg_type	= XFRM_MSG_POLEXPIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	/* Fill selector. */
+	memcpy(&req.expire.pol.sel.daddr, &dst, sizeof(tundst));
+	memcpy(&req.expire.pol.sel.saddr, &src, sizeof(tunsrc));
+	req.expire.pol.sel.family	= AF_INET;
+	req.expire.pol.sel.prefixlen_d	= PREFIX_LEN;
+	req.expire.pol.sel.prefixlen_s	= PREFIX_LEN;
+	req.expire.pol.dir		= XFRM_POLICY_OUT;
+	req.expire.hard			= 0xff;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.expire.hard != 0x1) {
+		pr_err("expire.hard is not set: %x", req.expire.hard);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static void print_desc(char *lvl, char *msg, struct xfrm_desc *desc)
+{
+	printk("%s: [%u, '%s', '%s', '%s', '%s', %u]", lvl, msg,
+		(unsigned int)desc->proto, desc->a_algo, desc->e_algo,
+		desc->c_algo, desc->ae_algo, desc->icv_len);
+}
+
+static int child_serv(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, int cmd_fd, void *buf, struct xfrm_desc *desc)
+{
+	struct in_addr src, dst, tunsrc, tundst;
+	struct test_desc msg;
+	int ret = -1;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, child_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, grchild_ip(nr));
+
+	/* UDP pinging without xfrm */
+	if (do_ping(cmd_fd, buf, page_size, src, true, 0, 0, udp_ping_send)) {
+		pr_err("ping failed before setting xfrm");
+		return -1;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_PREPARE;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+
+	if (xfrm_prepare(xfrm_sock, seq, src, dst, tunsrc, tundst, desc->proto)) {
+		print_desc("ERR", "failed to prepare xfrm", desc);
+		goto cleanup;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_ADD;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+	if (xfrm_set(xfrm_sock, seq, src, dst, tunsrc, tundst, desc)) {
+		print_desc("ERR", "failed to set xfrm", desc);
+		goto cleanup;
+	}
+
+	/* UDP pinging with xfrm tunnel */
+	if (do_ping(cmd_fd, buf, page_size, tunsrc,
+				true, 0, 0, udp_ping_send)) {
+		print_desc("ERR", "ping failed for xfrm", desc);
+		goto delete;
+	}
+
+	print_desc("OK", "xfrm", desc);
+	ret = 0;
+delete:
+	/* xfrm delete */
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_DEL;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+
+	if (xfrm_delete(xfrm_sock, seq, src, dst, tunsrc, tundst, desc->proto)) {
+		print_desc("ERR", "failed to remove xfrm", desc);
+		ret = -1;
+	}
+
+cleanup:
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_CLEANUP;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+	if (xfrm_cleanup(xfrm_sock, seq, src, dst, tunsrc, tundst)) {
+		print_desc("ERR", "failed to cleanup xfrm", desc);
+		ret = -1;
+	}
+	return ret;
+}
+
+static int child_f(unsigned int nr, int test_desc_fd, int cmd_fd, void *buf)
+{
+	struct xfrm_desc desc;
+	struct test_desc msg;
+	int xfrm_sock = -1;
+	uint32_t seq;
+	int ret = 1;
+
+	if (switch_ns(nsfd_childa))
+		exit(1);
+
+	if (netlink_sock(&xfrm_sock, &seq, NETLINK_XFRM)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	/* Check that the seqpacket command socket is ready, just to be sure. */
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_ACK;
+	write_msg(cmd_fd, &msg);
+	read_msg(cmd_fd, &msg);
+	if (msg.type != MSG_ACK) {
+		pr_err("Ack failed");
+		exit(1);
+	}
+
+	for (;;) {
+		ssize_t received = read(test_desc_fd, &desc, sizeof(desc));
+
+		if (received == 0) /* EOF */
+			break;
+
+		if (received != sizeof(desc)) {
+			pr_perror("read() returned %zd", received);
+			goto exit;
+		}
+
+		switch (desc.type) {
+		case CREATE_TUNNEL:
+			if (child_serv(xfrm_sock, &seq, nr, cmd_fd, buf, &desc))
+				goto exit;
+			break;
+		case ALLOCATE_SPI:
+			if (xfrm_state_allocspi(xfrm_sock, &seq, -1, desc.proto)) {
+				pr_err("allocspi failed");
+				goto exit;
+			}
+			pr_ok("allocspi");
+			break;
+		case MONITOR_ACQUIRE:
+			if (xfrm_monitor_acquire(xfrm_sock, &seq, nr)) {
+				pr_err("monitor acquire failed");
+				goto exit;
+			}
+			pr_ok("monitor acquire");
+			break;
+		case EXPIRE_STATE:
+			if (xfrm_expire_state(xfrm_sock, &seq, nr, &desc)) {
+				pr_err("expire state failed");
+				goto exit;
+			}
+			pr_ok("expire state");
+			break;
+		case EXPIRE_POLICY:
+			if (xfrm_expire_policy(xfrm_sock, &seq, nr, &desc)) {
+				pr_err("expire policy failed");
+				goto exit;
+			}
+			pr_ok("expire policy");
+			break;
+		default:
+			pr_err("Unknown desc type");
+			goto exit;
+		}
+	}
+
+	ret = 0;
+exit:
+	close(xfrm_sock);
+
+	msg.type = MSG_EXIT;
+	write_msg(cmd_fd, &msg);
+	exit(ret);
+}
+
+static int grand_child_serv(unsigned int nr, int cmd_fd, void *buf,
+		struct test_desc *msg, int xfrm_sock, uint32_t *seq)
+{
+	struct in_addr src, dst, tunsrc, tundst;
+	bool tun_reply;
+	struct xfrm_desc *desc = &msg->body.xfrm_desc;
+
+	src = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	dst = inet_makeaddr(INADDR_B, child_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, grchild_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, child_ip(nr));
+
+	switch (msg->type) {
+	case MSG_EXIT:
+		exit(0);
+	case MSG_ACK:
+		write_msg(cmd_fd, msg);
+		break;
+	case MSG_PING:
+		tun_reply = memcmp(&dst, &msg->body.ping.reply_ip, sizeof(in_addr_t));
+		/* UDP pinging without xfrm */
+		if (do_ping(cmd_fd, buf, page_size, tun_reply ? tunsrc : src,
+				false, msg->body.ping.port,
+				msg->body.ping.reply_ip, udp_ping_reply)) {
+			pr_err("ping failed before setting xfrm");
+			return -1;
+		}
+		break;
+	case MSG_XFRM_PREPARE:
+		if (xfrm_prepare(xfrm_sock, seq, src, dst, tunsrc, tundst,
+					desc->proto)) {
+			print_desc("ERR", "failed to prepare xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_ADD:
+		if (xfrm_set(xfrm_sock, seq, src, dst, tunsrc, tundst, desc)) {
+			print_desc("ERR", "failed to set xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_DEL:
+		if (xfrm_delete(xfrm_sock, seq, src, dst, tunsrc, tundst,
+					desc->proto)) {
+			print_desc("ERR", "failed to remove xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_CLEANUP:
+		if (xfrm_cleanup(xfrm_sock, seq, src, dst, tunsrc, tundst)) {
+			print_desc("ERR", "failed to cleanup xfrm", desc);
+			return -1;
+		}
+		break;
+	default:
+		pr_err("got unknown msg type %d", msg->type);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int grand_child_f(unsigned int nr, int cmd_fd, void *buf)
+{
+	struct test_desc msg;
+	int xfrm_sock = -1;
+	uint32_t seq;
+
+	if (switch_ns(nsfd_childb))
+		exit(1);
+
+	if (netlink_sock(&xfrm_sock, &seq, NETLINK_XFRM)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	do {
+		read_msg(cmd_fd, &msg);
+		if (grand_child_serv(nr, cmd_fd, buf, &msg, xfrm_sock, &seq))
+			break;
+	} while (1);
+
+	close(xfrm_sock);
+	exit(1);
+}
+
+static int start_child(unsigned int nr, char *veth, int test_desc_fd[2])
+{
+	int cmd_sock[2];
+	void *data_map;
+	pid_t child;
+
+	if (init_child(nsfd_childa, veth, child_ip(nr), grchild_ip(nr)))
+		return -1;
+
+	if (init_child(nsfd_childb, veth, grchild_ip(nr), child_ip(nr)))
+		return -1;
+
+	child = fork();
+	if (child < 0) {
+		pr_perror("fork()");
+		return -1;
+	} else if (child) {
+		/* in parent - selftest */
+		return switch_ns(nsfd_parent);
+	}
+
+	if (close(test_desc_fd[1])) {
+		pr_perror("close()");
+		return -1;
+	}
+
+	/* child */
+	data_map = mmap(0, page_size, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+	if (data_map == MAP_FAILED) {
+		pr_perror("mmap()");
+		return -1;
+	}
+	if (randomize_buffer(data_map, page_size))
+		return -1;
+
+	if (socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, cmd_sock)) {
+		pr_perror("socketpair()");
+		return -1;
+	}
+
+	child = fork();
+	if (child < 0) {
+		pr_perror("fork()");
+		return -1;
+	} else if (child) {
+		if (close(cmd_sock[0])) {
+			pr_perror("close()");
+			return -1;
+		}
+		return child_f(nr, test_desc_fd[0], cmd_sock[1], data_map);
+	}
+	if (close(cmd_sock[1])) {
+		pr_perror("close()");
+		return -1;
+	}
+	return grand_child_f(nr, cmd_sock[0], data_map);
+}
+
+static void usage_exit(char **argv)
+{
+	fprintf(stderr, "Usage: %s [nr_process]\n", argv[0]);
+	exit(1);
+}
+
+static int write_desc(int proto, int test_desc_fd,
+		char *a, char *e, char *c, char *ae)
+{
+	struct xfrm_desc desc = {};
+
+	desc.type = CREATE_TUNNEL;
+	desc.proto = proto;
+
+	if (a)
+		strncpy(desc.a_algo, a, ALGO_LEN);
+	if (e)
+		strncpy(desc.e_algo, e, ALGO_LEN);
+	if (c)
+		strncpy(desc.c_algo, c, ALGO_LEN);
+	if (ae)
+		strncpy(desc.ae_algo, ae, ALGO_LEN);
+
+	return write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc);
+}
+
+int proto_list[] = { IPPROTO_AH, IPPROTO_COMP, IPPROTO_ESP };
+char *ah_list[] = {
+	"digest_null", "hmac(md5)", "hmac(sha1)", "hmac(sha256)",
+	"hmac(sha384)", "hmac(sha512)", "hmac(rmd160)",
+	"xcbc(aes)", "cmac(aes)"
+};
+char *comp_list[] = {
+	"deflate"
+#if 0
+	/* No compression backend implementation */
+	"lzs", "lzjh"
+#endif
+};
+char *e_list[] = {
+	"ecb(cipher_null)", "cbc(des)", "cbc(des3_ede)", "cbc(cast5)",
+	"cbc(blowfish)", "cbc(aes)", "cbc(serpent)", "cbc(camellia)",
+	"cbc(twofish)", "rfc3686(ctr(aes))"
+};
+char *ae_list[] = {
+#if 0
+	/* not implemented */
+	"rfc4106(gcm(aes))", "rfc4309(ccm(aes))", "rfc4543(gcm(aes))",
+	"rfc7539esp(chacha20,poly1305)"
+#endif
+};
+
+static int write_proto_plan(int fd, int proto)
+{
+	unsigned int i;
+
+	switch (proto) {
+	case IPPROTO_AH:
+		for (i = 0; i < ARRAY_SIZE(ah_list); i++) {
+			if (write_desc(proto, fd, ah_list[i], 0, 0, 0)) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	case IPPROTO_COMP:
+		for (i = 0; i < ARRAY_SIZE(comp_list); i++) {
+			if (write_desc(proto, fd, 0, 0, comp_list[i], 0)) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	case IPPROTO_ESP:
+		for (i = 0; i < ARRAY_SIZE(ah_list); i++) {
+			unsigned int j;
+
+			for (j = 0; j < ARRAY_SIZE(e_list); j++) {
+				if (write_desc(proto, fd, ah_list[i],
+							e_list[j], 0, 0)) {
+					pr_err("writing test's desc failed");
+					return -1;
+				}
+			}
+		}
+		for (i = 0; i < ARRAY_SIZE(ae_list); i++) {
+			if (write_desc(proto, fd, 0, 0, 0, ae_list[i])) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	default:
+		pr_err("BUG: Specified unknown proto %d", proto);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int write_compat_struct_tests(int test_desc_fd)
+{
+	struct xfrm_desc desc = {};
+
+	desc.type = ALLOCATE_SPI;
+	desc.proto = IPPROTO_AH;
+	strncpy(desc.a_algo, ah_list[0], ALGO_LEN);
+
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = MONITOR_ACQUIRE;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = EXPIRE_STATE;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = EXPIRE_POLICY;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	return 0;
+}
+
+static int write_test_plan(int test_desc_fd)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(proto_list); i++) {
+		if (write_proto_plan(test_desc_fd, proto_list[i]))
+			return -1;
+	}
+
+	if (write_compat_struct_tests(test_desc_fd))
+		return -1;
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	unsigned int nr_process = 1;
+	int route_sock = -1, ret = 1;
+	int test_desc_fd[2];
+	uint32_t route_seq;
+	unsigned int i;
+
+	if (argc > 2)
+		usage_exit(argv);
+
+	if (argc > 1) {
+		char *endptr;
+		long val;
+
+		/* Parse into a long first: nr_process is unsigned */
+		errno = 0;
+		val = strtol(argv[1], &endptr, 10);
+		if (errno != 0 || endptr == argv[1] || *endptr != '\0') {
+			pr_err("Failed to parse [nr_process]");
+			usage_exit(argv);
+		}
+
+		if (val < 1 || val > MAX_PROCESSES) {
+			pr_err("nr_process should be between [1; %u]", MAX_PROCESSES);
+			usage_exit(argv);
+		}
+		nr_process = val;
+	}
+
+	page_size = sysconf(_SC_PAGESIZE);
+	if (page_size < 1) {
+		pr_perror("sysconf()");
+		return 1;
+	}
+
+	if (pipe2(test_desc_fd, O_DIRECT) < 0) {
+		pr_perror("pipe2()");
+		return 1;
+	}
+
+	if (init_namespaces()) {
+		pr_err("Failed to create namespaces");
+		return 1;
+	}
+
+	if (netlink_sock(&route_sock, &route_seq, NETLINK_ROUTE)) {
+		pr_err("Failed to open netlink route socket");
+		return 1;
+	}
+
+	for (i = 0; i < nr_process; i++) {
+		char veth[VETH_LEN];
+
+		snprintf(veth, VETH_LEN, VETH_FMT, i);
+
+		if (veth_add(route_sock, route_seq++, veth, nsfd_childa, veth, nsfd_childb)) {
+			pr_err("Failed to create veth device");
+			goto err;
+		}
+
+		if (start_child(i, veth, test_desc_fd)) {
+			pr_err("Child failed to start");
+			goto err;
+		}
+	}
+
+	if (close(test_desc_fd[0])) {
+		pr_perror("close()");
+		goto err;
+	}
+
+	ret = write_test_plan(test_desc_fd[1]);
+	/* XXX: add wait() */
+err:
+	close(route_sock);
+	return ret;
+}
-- 
2.13.6


* [PATCH 03/18] selftest/net/xfrm: Add test for ipsec tunnel
@ 2018-07-26  2:31   ` dima
  0 siblings, 0 replies; 74+ messages in thread
From: dima @ 2018-07-26  2:31 UTC (permalink / raw)


This is an exhaustive test for ipsec: it covers all encryption,
authentication and compression algorithms. The tests run in two
network namespaces connected by veth interfaces. To make the exhaustive
testing less time-consuming, the tests run in parallel tasks; the
number of tasks is given as a parameter to the selftest.

As the patch set adds support for xfrm in compat tasks, there are also
tests that check the structures which differ in size between 64-bit and
32-bit applications.
The selftest doesn't use libnl, so it can easily be compiled as a
compat application and doesn't require a compat .so.
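
For illustration only, here is a minimal sketch of how such a size
difference arises. The struct below is hypothetical and is not taken
from <linux/xfrm.h>; the i386 layout is emulated with GCC/Clang
attributes, on the assumption that a 64-bit member is only 4-byte
aligned there:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout, not a real xfrm structure. */
struct demo_native {
	uint64_t bytes;		/* 8-byte aligned on x86_64 */
	uint32_t flags;		/* tail padding may follow on 64-bit */
};

/* Same members with 4-byte alignment, emulating an i386 build. */
struct demo_compat {
	uint64_t bytes;
	uint32_t flags;
} __attribute__((packed, aligned(4)));

/* Bytes of padding a native build adds on top of the compat layout. */
static size_t demo_tail_padding(void)
{
	return sizeof(struct demo_native) - sizeof(struct demo_compat);
}

/* The members sit at the same offsets; only the total size differs. */
static void demo_check_layout(void)
{
	assert(offsetof(struct demo_native, flags) ==
	       offsetof(struct demo_compat, flags));
	assert(sizeof(struct demo_compat) == 12);
}
```

When such a struct is the payload of a netlink message, the message
length seen by the kernel differs by demo_tail_padding() bytes between
a native and a compat sender.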

Here is a diagram of the selftest:

                           ---------------
                           |  selftest   |
                           |  (parent)   |
                           ---------------
                              |        |
                              | (pipe) |
                              ----------
                             /   |  |   \
               /-------------   /    \   -------------\
               |          /-----      -----\          |
      ---------|----------|----------------|----------|---------
      |   ---------   ---------        ---------   ---------   |
      |   | child |   | child |  NS A  | child |   | child |   |
      |   ---------   ---------        ---------   ---------   |
      -------|------------|----------------|-------------|------
           veth0        veth1            veth2         vethN
    ---------|------------|----------------|-------------|----------
    | ------------  ------------       ------------   ------------ |
    | | gr.child |  | gr.child | NS B  | gr.child |   | gr.child | |
    | ------------  ------------       ------------   ------------ |
    ----------------------------------------------------------------

The parent sends the description of a test (xfrm parameters) to a
child; the child and its grandchild set up a tunnel over the veth
interface and test it by sending UDP packets.

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: "David S. Miller" <davem at davemloft.net>
Cc: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
Cc: Herbert Xu <herbert at gondor.apana.org.au>
Cc: Mauro Carvalho Chehab <mchehab+samsung at kernel.org>
Cc: Shuah Khan <shuah at kernel.org>
Cc: Steffen Klassert <steffen.klassert at secunet.com>
Cc: linux-kselftest at vger.kernel.org
Cc: netdev at vger.kernel.org
Signed-off-by: Dmitry Safonov <dima at arista.com>
---
 MAINTAINERS                            |    1 +
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    1 +
 tools/testing/selftests/net/ipsec.c    | 1987 ++++++++++++++++++++++++++++++++
 4 files changed, 1990 insertions(+)
 create mode 100644 tools/testing/selftests/net/ipsec.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0fe4228f78cb..7e20db5d0210 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9938,6 +9938,7 @@ F:	net/ipv6/ipcomp6.c
 F:	net/ipv6/ip6_vti.c
 F:	include/uapi/linux/xfrm.h
 F:	include/net/xfrm.h
+F:	tools/testing/selftests/net/ipsec.c
 
 NETWORKING [IPv4/IPv6]
 M:	"David S. Miller" <davem at davemloft.net>
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 1a0ac3a29ec5..6896547292cb 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -1,3 +1,4 @@
+ipsec
 msg_zerocopy
 socket
 psock_fanout
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 663e11e85727..9f35c01fbc0a 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -12,6 +12,7 @@ TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
 TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd
 TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx
+TEST_GEN_FILES += ipsec
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
 
diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c
new file mode 100644
index 000000000000..e0752e127ba7
--- /dev/null
+++ b/tools/testing/selftests/net/ipsec.c
@@ -0,0 +1,1987 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ipsec.c - Check xfrm on veth inside a net-ns.
+ * Copyright (c) 2018 Dmitry Safonov (Arista Networks)
+ */
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <asm/types.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <linux/limits.h>
+#include <linux/netlink.h>
+#include <linux/random.h>
+#include <linux/rtnetlink.h>
+#include <linux/veth.h>
+#include <linux/xfrm.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#define printk(fmt, lvl, ...)						\
+	fprintf(stderr, "[%s] (%s:%d)\t" fmt "\n",			\
+	lvl, __FILE__, __LINE__, ##__VA_ARGS__)
+
+#define pr_p(func, fmt, ...)	func(fmt ": %m", ##__VA_ARGS__)
+
+#define pr_err(fmt, ...)						\
+	printk(fmt, "ERR", ##__VA_ARGS__)
+#define pr_warn(fmt, ...)						\
+	printk(fmt, "WARN", ##__VA_ARGS__)
+#define pr_note(fmt, ...)						\
+	printk(fmt, "NOTE", ##__VA_ARGS__)
+#define pr_ok(fmt, ...)							\
+	printk(fmt, "OK", ##__VA_ARGS__)
+#define pr_debug(fmt, ...)						\
+	while (0) {							\
+		printk(fmt, "NOTE", ##__VA_ARGS__);			\
+	}
+
+#define pr_perror(fmt, ...)	pr_p(pr_err, fmt, ##__VA_ARGS__)
+#define pr_pwarn(fmt, ...)	pr_p(pr_warn, fmt, ##__VA_ARGS__)
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
+
+#define IPV4_STR_SZ	16	/* xxx.xxx.xxx.xxx is longest + \0 */
+#define MAX_PAYLOAD	2048
+#define XFRM_ALGO_KEY_BUF_SIZE	512
+#define MAX_PROCESSES	(1 << 14) /* /16 mask divided by /30 subnets */
+#define INADDR_A	((in_addr_t) 0x0a000000) /* 10.0.0.0 */
+#define INADDR_B	((in_addr_t) 0xc0a80000) /* 192.168.0.0 */
+
+/* /30 mask for one veth connection */
+#define PREFIX_LEN	30
+#define child_ip(nr)	(4 * (nr) + 1)
+#define grchild_ip(nr)	(4 * (nr) + 2)
+
+#define VETH_FMT	"ktst-%d"
+#define VETH_LEN	10
+#define BEGIN_SEQ	(time(NULL))
+
+static int nsfd_parent	= -1;
+static int nsfd_childa	= -1;
+static int nsfd_childb	= -1;
+static long page_size;
+
+const unsigned int ping_delay_nsec	= 50 * 1000 * 1000;
+const unsigned int ping_timeout		= 300;
+const unsigned int ping_count		= 100;
+const unsigned int ping_success		= 80;
+
+static int unshare_open(void)
+{
+	const char *netns_path = "/proc/self/ns/net";
+	int fd;
+
+	if (unshare(CLONE_NEWNET) != 0) {
+		pr_pwarn("unshare()");
+		return -1;
+	}
+
+	fd = open(netns_path, O_RDONLY);
+	if (fd < 0) {
+		pr_pwarn("open(%s)", netns_path);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int switch_ns(int fd)
+{
+	if (setns(fd, CLONE_NEWNET)) {
+		pr_pwarn("setns()");
+		return -1;
+	}
+	return 0;
+}
+
+/*
+ * Running the test inside a new parent net namespace to bother less
+ * about cleanup on error-path.
+ */
+static int init_namespaces(void)
+{
+	nsfd_parent = unshare_open();
+	if (nsfd_parent < 0)
+		return -1;
+
+	nsfd_childa = unshare_open();
+	if (nsfd_childa < 0)
+		return -1;
+
+	if (switch_ns(nsfd_parent))
+		return -1;
+
+	nsfd_childb = unshare_open();
+	if (nsfd_childb < 0)
+		return -1;
+
+	if (switch_ns(nsfd_parent))
+		return -1;
+	return 0;
+}
+
+static int netlink_sock(int *sock, uint32_t *seq_nr, int proto)
+{
+	/*
+	 * Reuse an already opened socket, only advancing the sequence number.
+	 */
+	if (*sock > 0) {
+		(*seq_nr)++;
+		return 0;
+	}
+
+	*sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, proto);
+	if (*sock < 0) {
+		pr_pwarn("socket(AF_NETLINK)");
+		return -1;
+	}
+	*seq_nr = BEGIN_SEQ;
+
+	return 0;
+}
+
+static inline struct rtattr *rtattr_hdr(struct nlmsghdr *nh)
+{
+	return (struct rtattr *)((char *)(nh) + RTA_ALIGN((nh)->nlmsg_len));
+}
+
+static int rtattr_pack(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type, const void *payload, size_t size)
+{
+	/* NLMSG_ALIGNTO == RTA_ALIGNTO, nlmsg_len already aligned */
+	struct rtattr *attr = rtattr_hdr(nh);
+	size_t nl_size = RTA_ALIGN(nh->nlmsg_len) + RTA_LENGTH(size);
+
+	if (req_sz < nl_size) {
+		pr_err("req buf is too small: %zu < %zu", req_sz, nl_size);
+		return -1;
+	}
+	nh->nlmsg_len = nl_size;
+
+	attr->rta_len = RTA_LENGTH(size); /* XXX: rta_len = size? */
+	attr->rta_type = rta_type;
+	memcpy(RTA_DATA(attr), payload, size);
+
+	return 0;
+}
+
+static struct rtattr *_rtattr_begin(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type, const void *payload, size_t size)
+{
+	struct rtattr *ret = rtattr_hdr(nh);
+
+	if (rtattr_pack(nh, req_sz, rta_type, payload, size))
+		return NULL;
+
+	return ret;
+}
+
+static inline struct rtattr *rtattr_begin(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type)
+{
+	return _rtattr_begin(nh, req_sz, rta_type, 0, 0);
+}
+
+static inline void rtattr_end(struct nlmsghdr *nh, struct rtattr *attr)
+{
+	char *nlmsg_end = (char *)nh + nh->nlmsg_len;
+
+	attr->rta_len = nlmsg_end - (char *)attr;
+}
+
+static int veth_pack_peerb(struct nlmsghdr *nh, size_t req_sz,
+		const char *peer, int ns)
+{
+	struct ifinfomsg pi;
+	struct rtattr *peer_attr;
+
+	memset(&pi, 0, sizeof(pi));
+	pi.ifi_family	= AF_UNSPEC;
+	pi.ifi_change	= 0xFFFFFFFF;
+
+	peer_attr = _rtattr_begin(nh, req_sz, VETH_INFO_PEER, &pi, sizeof(pi));
+	if (!peer_attr)
+		return -1;
+
+	if (rtattr_pack(nh, req_sz, IFLA_IFNAME, peer, strlen(peer)))
+		return -1;
+
+	if (rtattr_pack(nh, req_sz, IFLA_NET_NS_FD, &ns, sizeof(ns)))
+		return -1;
+
+	rtattr_end(nh, peer_attr);
+
+	return 0;
+}
+
+static int netlink_check_answer(int sock)
+{
+	struct nlmsgerror {
+		struct nlmsghdr hdr;
+		int error;
+		struct nlmsghdr orig_msg;
+	} answer;
+
+	if (recv(sock, &answer, sizeof(answer), 0) < 0) {
+		pr_perror("recv()");
+		return -1;
+	} else if (answer.hdr.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)answer.hdr.nlmsg_type);
+		return -1;
+	} else if (answer.error) {
+		pr_err("NLMSG_ERROR: %d: %s",
+			answer.error, strerror(-answer.error));
+		return answer.error;
+	}
+
+	return 0;
+}
+
+static int veth_add(int sock, uint32_t seq, const char *peera, int ns_a,
+		const char *peerb, int ns_b)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
+	struct {
+		struct nlmsghdr		nh;
+		struct ifinfomsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+	const char veth_type[] = "veth";
+	struct rtattr *link_info, *info_data;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWLINK;
+	req.nh.nlmsg_flags	= flags;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifi_family	= AF_UNSPEC;
+	req.info.ifi_change	= 0xFFFFFFFF;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_IFNAME, peera, strlen(peera)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_NET_NS_FD, &ns_a, sizeof(ns_a)))
+		return -1;
+
+	link_info = rtattr_begin(&req.nh, sizeof(req), IFLA_LINKINFO);
+	if (!link_info)
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_INFO_KIND, veth_type, sizeof(veth_type)))
+		return -1;
+
+	info_data = rtattr_begin(&req.nh, sizeof(req), IFLA_INFO_DATA);
+	if (!info_data)
+		return -1;
+
+	if (veth_pack_peerb(&req.nh, sizeof(req), peerb, ns_b))
+		return -1;
+
+	rtattr_end(&req.nh, info_data);
+	rtattr_end(&req.nh, link_info);
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int ip4_addr_set(int sock, uint32_t seq, const char *intf,
+		struct in_addr addr, uint8_t prefix)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
+	struct {
+		struct nlmsghdr		nh;
+		struct ifaddrmsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWADDR;
+	req.nh.nlmsg_flags	= flags;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifa_family	= AF_INET;
+	req.info.ifa_prefixlen	= prefix;
+	req.info.ifa_index	= if_nametoindex(intf);
+
+#if 0
+	{
+		char addr_str[IPV4_STR_SZ] = {};
+
+		strncpy(addr_str, inet_ntoa(addr), IPV4_STR_SZ - 1);
+
+		pr_warn("ip addr set %s", addr_str);
+	}
+#endif
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFA_LOCAL, &addr, sizeof(addr)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFA_ADDRESS, &addr, sizeof(addr)))
+		return -1;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int link_set_up(int sock, uint32_t seq, const char *intf)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct ifinfomsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWLINK;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifi_family	= AF_UNSPEC;
+	req.info.ifi_change	= 0xFFFFFFFF;
+	req.info.ifi_index	= if_nametoindex(intf);
+	req.info.ifi_flags	= IFF_UP;
+	req.info.ifi_change	= IFF_UP;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int ip4_route_set(int sock, uint32_t seq, const char *intf,
+		struct in_addr src, struct in_addr dst)
+{
+	struct {
+		struct nlmsghdr	nh;
+		struct rtmsg	rt;
+		char		attrbuf[MAX_PAYLOAD];
+	} req;
+	unsigned int index = if_nametoindex(intf);
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.rt));
+	req.nh.nlmsg_type	= RTM_NEWROUTE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
+	req.nh.nlmsg_seq	= seq;
+	req.rt.rtm_family	= AF_INET;
+	req.rt.rtm_dst_len	= 32;
+	req.rt.rtm_table	= RT_TABLE_MAIN;
+	req.rt.rtm_protocol	= RTPROT_BOOT;
+	req.rt.rtm_scope	= RT_SCOPE_LINK;
+	req.rt.rtm_type		= RTN_UNICAST;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_DST, &dst, sizeof(dst)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_PREFSRC, &src, sizeof(src)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_OIF, &index, sizeof(index)))
+		return -1;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(sock);
+}
+
+static int tunnel_set_route(int route_sock, uint32_t *route_seq, char *veth,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	if (ip4_addr_set(route_sock, (*route_seq)++, "lo", tunsrc, PREFIX_LEN)) {
+		pr_err("Failed to set ipv4 addr");
+		return -1;
+	}
+
+	if (ip4_route_set(route_sock, (*route_seq)++, veth, tunsrc, tundst)) {
+		pr_err("Failed to set ipv4 route");
+		return -1;
+	}
+	return 0;
+}
+
+static int init_child(int nsfd, char *veth, unsigned int src, unsigned int dst)
+{
+	struct in_addr intsrc = inet_makeaddr(INADDR_B, src);
+	struct in_addr tunsrc = inet_makeaddr(INADDR_A, src);
+	struct in_addr tundst = inet_makeaddr(INADDR_A, dst);
+	int route_sock = -1, ret = -1;
+	uint32_t route_seq;
+
+	if (switch_ns(nsfd))
+		return -1;
+
+	if (netlink_sock(&route_sock, &route_seq, NETLINK_ROUTE)) {
+		pr_err("Failed to open netlink route socket in child");
+		return -1;
+	}
+
+	if (ip4_addr_set(route_sock, route_seq++, veth, intsrc, PREFIX_LEN)) {
+		pr_err("Failed to set ipv4 addr");
+		goto err;
+	}
+
+	if (link_set_up(route_sock, route_seq++, veth)) {
+		pr_err("Failed to bring up %s", veth);
+		goto err;
+	}
+
+	if (tunnel_set_route(route_sock, &route_seq, veth, tunsrc, tundst)) {
+		pr_err("Failed to add tunnel route on %s", veth);
+		goto err;
+	}
+	ret = 0;
+
+err:
+	close(route_sock);
+	return ret;
+}
+
+#define ALGO_LEN	64
+enum desc_type {
+	CREATE_TUNNEL	= 0,
+	ALLOCATE_SPI,
+	MONITOR_ACQUIRE,
+	EXPIRE_STATE,
+	EXPIRE_POLICY,
+};
+struct xfrm_desc {
+	enum desc_type	type;
+	uint8_t		proto;
+	char		a_algo[ALGO_LEN];
+	char		e_algo[ALGO_LEN];
+	char		c_algo[ALGO_LEN];
+	char		ae_algo[ALGO_LEN];
+	unsigned int	icv_len;
+	/* unsigned key_len; */
+};
+
+enum msg_type {
+	MSG_ACK		= 0,
+	MSG_EXIT,
+	MSG_PING,
+	MSG_XFRM_PREPARE,
+	MSG_XFRM_ADD,
+	MSG_XFRM_DEL,
+	MSG_XFRM_CLEANUP,
+};
+
+struct test_desc {
+	enum msg_type type;
+	union {
+		struct {
+			in_addr_t reply_ip;
+			unsigned int port;
+		} ping;
+		struct xfrm_desc xfrm_desc;
+	} body;
+};
+
+static void write_msg(int fd, struct test_desc *msg)
+{
+	ssize_t bytes = write(fd, msg, sizeof(*msg));
+
+	/* Make sure that write/read is atomic to a pipe */
+	BUILD_BUG_ON(sizeof(struct test_desc) > PIPE_BUF);
+
+	if (bytes < 0) {
+		pr_perror("write()");
+		exit(1);
+	}
+	if (bytes != sizeof(*msg)) {
+		pr_err("sent part of the message %zd/%zu", bytes, sizeof(*msg));
+		exit(1);
+	}
+}
+
+static void read_msg(int fd, struct test_desc *msg)
+{
+	ssize_t bytes = read(fd, msg, sizeof(*msg));
+
+	if (bytes < 0) {
+		pr_perror("read()");
+		exit(1);
+	}
+	if (bytes != sizeof(*msg)) {
+		pr_err("got incomplete message %zd/%zu", bytes, sizeof(*msg));
+		exit(1);
+	}
+}
+
+static int udp_ping_init(struct in_addr listen_ip, unsigned int u_timeout,
+		unsigned int *server_port, int sock[2])
+{
+	struct sockaddr_in server;
+	struct timeval t = { .tv_sec = 0, .tv_usec = u_timeout };
+	socklen_t s_len = sizeof(server);
+
+	sock[0] = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock[0] < 0) {
+		pr_perror("socket()");
+		return -1;
+	}
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= 0;
+	memcpy(&server.sin_addr.s_addr, &listen_ip, sizeof(struct in_addr));
+
+	if (bind(sock[0], (struct sockaddr *)&server, s_len)) {
+		pr_perror("bind()");
+		goto err_close_server;
+	}
+
+	if (getsockname(sock[0], (struct sockaddr *)&server, &s_len)) {
+		pr_perror("getsockname()");
+		goto err_close_server;
+	}
+
+	*server_port = ntohs(server.sin_port);
+
+	if (setsockopt(sock[0], SOL_SOCKET, SO_RCVTIMEO, (const char *)&t, sizeof t)) {
+		pr_perror("setsockopt()");
+		goto err_close_server;
+	}
+
+	sock[1] = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock[1] < 0) {
+		pr_perror("socket()");
+		goto err_close_server;
+	}
+
+	return 0;
+
+err_close_server:
+	close(sock[0]);
+	return -1;
+}
+
+static int udp_ping_send(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len)
+{
+	struct sockaddr_in server;
+	const struct sockaddr *dest_addr = (struct sockaddr *)&server;
+	char sock_buf[buf_len];
+	ssize_t r_bytes, s_bytes;
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= htons(port);
+	server.sin_addr.s_addr	= dest_ip;
+
+	s_bytes = sendto(sock[1], buf, buf_len, 0, dest_addr, sizeof(server));
+	if (s_bytes < 0) {
+		pr_perror("sendto()");
+		return -1;
+	} else if (s_bytes != buf_len) {
+		pr_err("sent part of the message: %zd/%zu", s_bytes, buf_len);
+		return -1;
+	}
+
+	r_bytes = recv(sock[0], sock_buf, buf_len, 0);
+	if (r_bytes < 0) {
+		if (errno != EAGAIN)
+			pr_perror("recv()");
+		return -1;
+	} else if (r_bytes == 0) { /* EOF */
+		pr_err("EOF on reply to ping");
+		return -1;
+	} else if (r_bytes != buf_len || memcmp(buf, sock_buf, buf_len)) {
+		pr_err("ping reply packet is corrupted %zd/%zu", r_bytes, buf_len);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int udp_ping_reply(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len)
+{
+	struct sockaddr_in server;
+	const struct sockaddr *dest_addr = (struct sockaddr *)&server;
+	char sock_buf[buf_len];
+	ssize_t r_bytes, s_bytes;
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= htons(port);
+	server.sin_addr.s_addr	= dest_ip;
+
+	r_bytes = recv(sock[0], sock_buf, buf_len, 0);
+	if (r_bytes < 0) {
+		if (errno != EAGAIN)
+			pr_perror("recv()");
+		return -1;
+	}
+	if (r_bytes == 0) { /* EOF */
+		pr_err("EOF on reply to ping");
+		return -1;
+	}
+	if (r_bytes != buf_len || memcmp(buf, sock_buf, buf_len)) {
+		pr_err("ping reply packet is corrupted %zd/%zu", r_bytes, buf_len);
+		return -1;
+	}
+
+	s_bytes = sendto(sock[1], buf, buf_len, 0, dest_addr, sizeof(server));
+	if (s_bytes < 0) {
+		pr_perror("sendto()");
+		return -1;
+	} else if (s_bytes != buf_len) {
+		pr_err("sent part of the message: %zd/%zu", s_bytes, buf_len);
+		return -1;
+	}
+
+	return 0;
+}
+
+typedef int (*ping_f)(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len);
+static int do_ping(int cmd_fd, char *buf, size_t buf_len, struct in_addr from,
+		bool init_side, int d_port, in_addr_t to, ping_f func)
+{
+	struct test_desc msg;
+	unsigned int s_port, i, ping_succeeded = 0;
+	int ping_sock[2];
+	char to_str[IPV4_STR_SZ] = {}, from_str[IPV4_STR_SZ] = {};
+
+	if (udp_ping_init(from, ping_timeout, &s_port, ping_sock)) {
+		pr_err("Failed to init ping");
+		return -1;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type		= MSG_PING;
+	msg.body.ping.port	= s_port;
+	memcpy(&msg.body.ping.reply_ip, &from, sizeof(from));
+
+	write_msg(cmd_fd, &msg);
+	if (init_side) {
+		/* The other end sends ip to ping */
+		read_msg(cmd_fd, &msg);
+		if (msg.type != MSG_PING)
+			return -1;
+		to = msg.body.ping.reply_ip;
+		d_port = msg.body.ping.port;
+	}
+
+	for (i = 0; i < ping_count ; i++) {
+		struct timespec sleep_time = {
+			.tv_sec = 0,
+			.tv_nsec = ping_delay_nsec,
+		};
+
+		ping_succeeded += !func(ping_sock, to, d_port, buf, page_size);
+		nanosleep(&sleep_time, 0);
+	}
+
+	close(ping_sock[0]);
+	close(ping_sock[1]);
+
+	strncpy(to_str, inet_ntoa(*(struct in_addr *)&to), IPV4_STR_SZ - 1);
+	strncpy(from_str, inet_ntoa(from), IPV4_STR_SZ - 1);
+
+	if (ping_succeeded < ping_success) {
+		pr_err("ping (%s) %s->%s failed %u/%u times",
+			init_side ? "send" : "reply", from_str, to_str,
+			ping_count - ping_succeeded, ping_count);
+		return -1;
+	}
+
+	pr_debug("ping (%s) %s->%s succeeded %u/%u times",
+		init_side ? "send" : "reply", from_str, to_str,
+		ping_succeeded, ping_count);
+
+	return 0;
+}
+
+static int randomize_buffer(void *buf, size_t buflen)
+{
+	size_t filled = 0;
+
+	/* getrandom() may return fewer bytes than requested */
+	while (filled < buflen) {
+		long ret = syscall(SYS_getrandom, (char *)buf + filled,
+				buflen - filled, 0);
+
+		if (ret < 0) {
+			pr_perror("getrandom()");
+			return -1;
+		}
+		filled += ret;
+	}
+
+	return 0;
+}
+
+static int xfrm_fill_key(char *name, char *buf,
+		size_t buf_len, unsigned int *key_len)
+{
+	/* XXX: use set/map instead of all this */
+	if (strncmp(name, "digest_null", ALGO_LEN) == 0)
+		*key_len = 0;
+	else if (strncmp(name, "ecb(cipher_null)", ALGO_LEN) == 0)
+		*key_len = 0;
+	else if (strncmp(name, "cbc(des)", ALGO_LEN) == 0)
+		*key_len = 64;
+	else if (strncmp(name, "hmac(md5)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cmac(aes)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "xcbc(aes)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cbc(cast5)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cbc(serpent)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "hmac(sha1)", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "hmac(rmd160)", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "cbc(des3_ede)", ALGO_LEN) == 0)
+		*key_len = 192;
+	else if (strncmp(name, "hmac(sha256)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(aes)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(camellia)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(twofish)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "rfc3686(ctr(aes))", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "hmac(sha384)", ALGO_LEN) == 0)
+		*key_len = 384;
+	else if (strncmp(name, "cbc(blowfish)", ALGO_LEN) == 0)
+		*key_len = 448;
+	else if (strncmp(name, "hmac(sha512)", ALGO_LEN) == 0)
+		*key_len = 512;
+	else if (strncmp(name, "rfc4106(gcm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "rfc4543(gcm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "rfc4309(ccm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 152;
+	else if (strncmp(name, "rfc4106(gcm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 224;
+	else if (strncmp(name, "rfc4543(gcm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 224;
+	else if (strncmp(name, "rfc4309(ccm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 216;
+	else if (strncmp(name, "rfc4106(gcm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "rfc4543(gcm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "rfc4309(ccm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 280;
+	else if (strncmp(name, "rfc7539(chacha20,poly1305)-128", ALGO_LEN) == 0)
+		*key_len = 0;
+
+	if (*key_len > buf_len) {
+		pr_err("Can't pack a key - too big for buffer");
+		return -1;
+	}
+
+	return randomize_buffer(buf, *key_len);
+}
+
+static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz,
+		struct xfrm_desc *desc)
+{
+	struct {
+		union {
+			struct xfrm_algo	alg;
+			struct xfrm_algo_aead	aead;
+			struct xfrm_algo_auth	auth;
+		} u;
+		char buf[XFRM_ALGO_KEY_BUF_SIZE];
+	} alg = {};
+	size_t alen, elen, clen, aelen;
+	unsigned short type;
+
+	alen = strlen(desc->a_algo);
+	elen = strlen(desc->e_algo);
+	clen = strlen(desc->c_algo);
+	aelen = strlen(desc->ae_algo);
+
+	/* Verify desc */
+	switch (desc->proto) {
+	case IPPROTO_AH:
+		if (!alen || elen || clen || aelen) {
+			pr_err("BUG: buggy ah desc");
+			return -1;
+		}
+		strncpy(alg.u.alg.alg_name, desc->a_algo, ALGO_LEN);
+		if (xfrm_fill_key(desc->a_algo, alg.u.alg.alg_key,
+				sizeof(alg.buf), &alg.u.alg.alg_key_len))
+			return -1;
+		type = XFRMA_ALG_AUTH;
+		break;
+	case IPPROTO_COMP:
+		if (!clen || elen || alen || aelen) {
+			pr_err("BUG: buggy comp desc");
+			return -1;
+		}
+		strncpy(alg.u.alg.alg_name, desc->c_algo, ALGO_LEN);
+		if (xfrm_fill_key(desc->c_algo, alg.u.alg.alg_key,
+				sizeof(alg.buf), &alg.u.alg.alg_key_len))
+			return -1;
+		type = XFRMA_ALG_COMP;
+		break;
+	case IPPROTO_ESP:
+		if (!((alen && elen) ^ !!aelen) || clen) {
+			pr_err("BUG: buggy esp desc");
+			return -1;
+		}
+		if (aelen) {
+			alg.u.aead.alg_icv_len = desc->icv_len;
+			strncpy(alg.u.aead.alg_name, desc->ae_algo, ALGO_LEN);
+			if (xfrm_fill_key(desc->ae_algo, alg.u.aead.alg_key,
+						sizeof(alg.buf), &alg.u.aead.alg_key_len))
+				return -1;
+			type = XFRMA_ALG_AEAD;
+		} else {
+
+			strncpy(alg.u.alg.alg_name, desc->e_algo, ALGO_LEN);
+			type = XFRMA_ALG_CRYPT;
+			if (xfrm_fill_key(desc->e_algo, alg.u.alg.alg_key,
+						sizeof(alg.buf), &alg.u.alg.alg_key_len))
+				return -1;
+			if (rtattr_pack(nh, req_sz, type, &alg, sizeof(alg)))
+				return -1;
+
+			strncpy(alg.u.alg.alg_name, desc->a_algo, ALGO_LEN);
+			type = XFRMA_ALG_AUTH;
+			if (xfrm_fill_key(desc->a_algo, alg.u.alg.alg_key,
+						sizeof(alg.buf), &alg.u.alg.alg_key_len))
+				return -1;
+		}
+		break;
+	default:
+		pr_err("BUG: unknown proto in desc");
+		return -1;
+	}
+
+	if (rtattr_pack(nh, req_sz, type, &alg, sizeof(alg)))
+		return -1;
+
+	return 0;
+}
+
+static inline uint32_t gen_spi(struct in_addr src)
+{
+	return htonl(inet_lnaof(src));
+}
+
+static int xfrm_state_add(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst,
+		struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct xfrm_usersa_info	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= XFRM_MSG_NEWSA;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill selector. */
+	memcpy(&req.info.sel.daddr, &dst, sizeof(dst));
+	memcpy(&req.info.sel.saddr, &src, sizeof(src));
+	req.info.sel.family		= AF_INET;
+	req.info.sel.prefixlen_d	= PREFIX_LEN;
+	req.info.sel.prefixlen_s	= PREFIX_LEN;
+
+	/* Fill id */
+	memcpy(&req.info.id.daddr, &dst, sizeof(dst));
+	/* Note: zero-spi cannot be deleted */
+	req.info.id.spi = spi;
+	req.info.id.proto	= desc->proto;
+
+	memcpy(&req.info.saddr, &src, sizeof(src));
+
+	/* Fill lifetime_cfg */
+	req.info.lft.soft_byte_limit	= XFRM_INF;
+	req.info.lft.hard_byte_limit	= XFRM_INF;
+	req.info.lft.soft_packet_limit	= XFRM_INF;
+	req.info.lft.hard_packet_limit	= XFRM_INF;
+
+	req.info.family		= AF_INET;
+	req.info.mode		= XFRM_MODE_TUNNEL;
+
+	/* XXX: Fill seq, reqid, replay_window, flags? */
+
+	if (xfrm_state_pack_algo(&req.nh, sizeof(req), desc))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_set(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst,
+		struct xfrm_desc *desc)
+{
+	int err;
+
+	err = xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst, desc);
+	if (err) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	err = xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), dst, src, desc);
+	if (err) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_policy_add(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst, uint8_t dir,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userpolicy_info	info;
+		char				attrbuf[MAX_PAYLOAD];
+	} req;
+	struct xfrm_user_tmpl tmpl;
+
+	memset(&req, 0, sizeof(req));
+	memset(&tmpl, 0, sizeof(tmpl));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= XFRM_MSG_NEWPOLICY;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill selector. */
+	memcpy(&req.info.sel.daddr, &dst, sizeof(dst));
+	memcpy(&req.info.sel.saddr, &src, sizeof(src));
+	req.info.sel.family		= AF_INET;
+	req.info.sel.prefixlen_d	= PREFIX_LEN;
+	req.info.sel.prefixlen_s	= PREFIX_LEN;
+
+	/* Fill lifetime_cfg */
+	req.info.lft.soft_byte_limit	= XFRM_INF;
+	req.info.lft.hard_byte_limit	= XFRM_INF;
+	req.info.lft.soft_packet_limit	= XFRM_INF;
+	req.info.lft.hard_packet_limit	= XFRM_INF;
+
+	req.info.dir = dir;
+
+	/* Fill tmpl */
+	memcpy(&tmpl.id.daddr, &dst, sizeof(dst));
+	/* Note: zero-spi cannot be deleted */
+	tmpl.id.spi = spi;
+	tmpl.id.proto	= proto;
+	tmpl.family	= AF_INET;
+	memcpy(&tmpl.saddr, &src, sizeof(src));
+	tmpl.mode	= XFRM_MODE_TUNNEL;
+	tmpl.aalgos = (~(uint32_t)0);
+	tmpl.ealgos = (~(uint32_t)0);
+	tmpl.calgos = (~(uint32_t)0);
+
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_TMPL, &tmpl, sizeof(tmpl)))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_prepare(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst, proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), dst, src,
+				XFRM_POLICY_IN, tunsrc, tundst, proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_policy_del(int xfrm_sock, uint32_t seq,
+		struct in_addr src, struct in_addr dst, uint8_t dir,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userpolicy_id	id;
+		char				attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.id));
+	req.nh.nlmsg_type	= XFRM_MSG_DELPOLICY;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill id */
+	memcpy(&req.id.sel.daddr, &dst, sizeof(dst));
+	memcpy(&req.id.sel.saddr, &src, sizeof(src));
+	req.id.sel.family		= AF_INET;
+	req.id.sel.prefixlen_d		= PREFIX_LEN;
+	req.id.sel.prefixlen_s		= PREFIX_LEN;
+	req.id.dir = dir;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_cleanup(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	if (xfrm_policy_del(xfrm_sock, (*seq)++, src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst)) {
+		pr_err("Failed to delete xfrm policy");
+		return -1;
+	}
+
+	if (xfrm_policy_del(xfrm_sock, (*seq)++, dst, src,
+				XFRM_POLICY_IN, tunsrc, tundst)) {
+		pr_err("Failed to delete xfrm policy");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_state_del(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct xfrm_usersa_id	id;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+	xfrm_address_t saddr = {};
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.id));
+	req.nh.nlmsg_type	= XFRM_MSG_DELSA;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	memcpy(&req.id.daddr, &dst, sizeof(dst));
+	req.id.family		= AF_INET;
+	req.id.proto		= proto;
+	/* Note: zero-spi cannot be deleted */
+	req.id.spi = spi;
+
+	memcpy(&saddr, &src, sizeof(src));
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_SRCADDR, &saddr, sizeof(saddr)))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_delete(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	if (xfrm_state_del(xfrm_sock, (*seq)++, gen_spi(src), src, dst, proto)) {
+		pr_err("Failed to remove xfrm state");
+		return -1;
+	}
+
+	if (xfrm_state_del(xfrm_sock, (*seq)++, gen_spi(src), dst, src, proto)) {
+		pr_err("Failed to remove xfrm state");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_state_allocspi(int xfrm_sock, uint32_t *seq,
+		uint32_t spi, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userspi_info	spi;
+	} req;
+	struct {
+		struct nlmsghdr			nh;
+		union {
+			struct xfrm_usersa_info	info;
+			int error;
+		};
+	} answer;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.spi));
+	req.nh.nlmsg_type	= XFRM_MSG_ALLOCSPI;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	req.spi.info.family	= AF_INET;
+	req.spi.min		= spi;
+	req.spi.max		= spi;
+	req.spi.info.id.proto	= proto;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	if (recv(xfrm_sock, &answer, sizeof(answer), 0) < 0) {
+		pr_perror("recv()");
+		return -1;
+	} else if (answer.nh.nlmsg_type == XFRM_MSG_NEWSA) {
+		uint32_t new_spi = ntohl(answer.info.id.spi);
+
+		if (new_spi != spi) {
+			pr_err("allocated spi is different from requested: %#x != %#x",
+					new_spi, spi);
+			return -1;
+		}
+		return 0;
+	} else if (answer.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)answer.nh.nlmsg_type);
+		return -1;
+	}
+
+	pr_err("NLMSG_ERROR: %d: %s", answer.error, strerror(-answer.error));
+	return answer.error;
+}
+
+static int netlink_sock_bind(int *sock, uint32_t *seq, int proto, uint32_t groups)
+{
+	struct sockaddr_nl snl = {};
+	socklen_t addr_len;
+	int ret = -1;
+
+	snl.nl_family = AF_NETLINK;
+	snl.nl_groups = groups;
+
+	if (netlink_sock(sock, seq, proto)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	if (bind(*sock, (struct sockaddr *)&snl, sizeof(snl)) < 0) {
+		pr_perror("bind()");
+		goto out_close;
+	}
+
+	addr_len = sizeof(snl);
+	if (getsockname(*sock, (struct sockaddr *)&snl, &addr_len) < 0) {
+		pr_perror("getsockname()");
+		goto out_close;
+	}
+	if (addr_len != sizeof(snl)) {
+		pr_err("Wrong address length %d", addr_len);
+		goto out_close;
+	}
+	if (snl.nl_family != AF_NETLINK) {
+		pr_err("Wrong address family %d", snl.nl_family);
+		goto out_close;
+	}
+	return 0;
+
+out_close:
+	close(*sock);
+	return ret;
+}
+
+static int xfrm_monitor_acquire(int xfrm_sock, uint32_t *seq, unsigned int nr)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_acquire acq;
+			int error;
+		};
+		char attrbuf[MAX_PAYLOAD];
+	} req;
+	struct xfrm_user_tmpl xfrm_tmpl = {};
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_ACQUIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.acq));
+	req.nh.nlmsg_type	= XFRM_MSG_ACQUIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	req.acq.policy.sel.family	= AF_INET;
+	req.acq.aalgos	= 0xfeed;
+	req.acq.ealgos	= 0xbaad;
+	req.acq.calgos	= 0xbabe;
+
+	xfrm_tmpl.family = AF_INET;
+	xfrm_tmpl.id.proto = IPPROTO_ESP;
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_TMPL, &xfrm_tmpl, sizeof(xfrm_tmpl)))
+		goto out_close;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.acq.aalgos != 0xfeed || req.acq.ealgos != 0xbaad
+			|| req.acq.calgos != 0xbabe) {
+		pr_err("xfrm_user_acquire has changed: %x %x %x",
+				req.acq.aalgos, req.acq.ealgos, req.acq.calgos);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static int xfrm_expire_state(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_expire expire;
+			int error;
+		};
+	} req;
+	struct in_addr src, dst;
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+
+	if (xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst, desc)) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_EXPIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.expire));
+	req.nh.nlmsg_type	= XFRM_MSG_EXPIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	memcpy(&req.expire.state.id.daddr, &dst, sizeof(dst));
+	req.expire.state.id.spi		= gen_spi(src);
+	req.expire.state.id.proto	= desc->proto;
+	req.expire.state.family		= AF_INET;
+	req.expire.hard			= 0xff;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.expire.hard != 0x1) {
+		pr_err("expire.hard is not set: %x", req.expire.hard);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static int xfrm_expire_policy(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_polexpire expire;
+			int error;
+		};
+	} req;
+	struct in_addr src, dst, tunsrc, tundst;
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, child_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, grchild_ip(nr));
+
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst, desc->proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_EXPIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.expire));
+	req.nh.nlmsg_type	= XFRM_MSG_POLEXPIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	/* Fill selector. */
+	memcpy(&req.expire.pol.sel.daddr, &dst, sizeof(dst));
+	memcpy(&req.expire.pol.sel.saddr, &src, sizeof(src));
+	req.expire.pol.sel.family	= AF_INET;
+	req.expire.pol.sel.prefixlen_d	= PREFIX_LEN;
+	req.expire.pol.sel.prefixlen_s	= PREFIX_LEN;
+	req.expire.pol.dir		= XFRM_POLICY_OUT;
+	req.expire.hard			= 0xff;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.expire.hard != 0x1) {
+		pr_err("expire.hard is not set: %x", req.expire.hard);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static void print_desc(char *lvl, char *msg, struct xfrm_desc *desc)
+{
+	printk("%s: [%u, '%s', '%s', '%s', '%s', %u]", lvl, msg,
+		(unsigned int)desc->proto, desc->a_algo, desc->e_algo,
+		desc->c_algo, desc->ae_algo, desc->icv_len);
+}
+
+static int child_serv(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, int cmd_fd, void *buf, struct xfrm_desc *desc)
+{
+	struct in_addr src, dst, tunsrc, tundst;
+	struct test_desc msg;
+	int ret = -1;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, child_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, grchild_ip(nr));
+
+	/* UDP pinging without xfrm */
+	if (do_ping(cmd_fd, buf, page_size, src, true, 0, 0, udp_ping_send)) {
+		pr_err("ping failed before setting xfrm");
+		return -1;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_PREPARE;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+
+	if (xfrm_prepare(xfrm_sock, seq, src, dst, tunsrc, tundst, desc->proto)) {
+		print_desc("ERR", "failed to prepare xfrm", desc);
+		goto cleanup;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_ADD;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+	if (xfrm_set(xfrm_sock, seq, src, dst, tunsrc, tundst, desc)) {
+		print_desc("ERR", "failed to set xfrm", desc);
+		goto cleanup;
+	}
+
+	/* UDP pinging with xfrm tunnel */
+	if (do_ping(cmd_fd, buf, page_size, tunsrc,
+				true, 0, 0, udp_ping_send)) {
+		print_desc("ERR", "ping failed for xfrm", desc);
+		goto delete;
+	}
+
+	print_desc("OK", "xfrm", desc);
+	ret = 0;
+delete:
+	/* xfrm delete */
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_DEL;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+
+	if (xfrm_delete(xfrm_sock, seq, src, dst, tunsrc, tundst, desc->proto)) {
+		print_desc("ERR", "failed to remove xfrm", desc);
+		ret = -1;
+	}
+
+cleanup:
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_CLEANUP;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+	if (xfrm_cleanup(xfrm_sock, seq, src, dst, tunsrc, tundst)) {
+		print_desc("ERR", "failed to cleanup xfrm", desc);
+		ret = -1;
+	}
+	return ret;
+}
+
+static int child_f(unsigned int nr, int test_desc_fd, int cmd_fd, void *buf)
+{
+	struct xfrm_desc desc;
+	struct test_desc msg;
+	int xfrm_sock = -1;
+	uint32_t seq;
+	int ret = 1;
+
+	if (switch_ns(nsfd_childa))
+		exit(1);
+
+	if (netlink_sock(&xfrm_sock, &seq, NETLINK_XFRM)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	/* Check that the seqpacket sock is ready, just to be sure. */
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_ACK;
+	write_msg(cmd_fd, &msg);
+	read_msg(cmd_fd, &msg);
+	if (msg.type != MSG_ACK) {
+		pr_err("Ack failed");
+		exit(1);
+	}
+
+	for (;;) {
+		ssize_t received = read(test_desc_fd, &desc, sizeof(desc));
+
+		if (received == 0) /* EOF */
+			break;
+
+		if (received != sizeof(desc)) {
+			pr_perror("read() returned %zd", received);
+			goto exit;
+		}
+
+		switch (desc.type) {
+		case CREATE_TUNNEL:
+			if (child_serv(xfrm_sock, &seq, nr, cmd_fd, buf, &desc))
+				goto exit;
+			break;
+		case ALLOCATE_SPI:
+			if (xfrm_state_allocspi(xfrm_sock, &seq, -1, desc.proto)) {
+				pr_err("allocspi failed");
+				goto exit;
+			}
+			pr_ok("allocspi");
+			break;
+		case MONITOR_ACQUIRE:
+			if (xfrm_monitor_acquire(xfrm_sock, &seq, nr)) {
+				pr_err("monitor acquire failed");
+				goto exit;
+			}
+			pr_ok("monitor acquire");
+			break;
+		case EXPIRE_STATE:
+			if (xfrm_expire_state(xfrm_sock, &seq, nr, &desc)) {
+				pr_err("expire state failed");
+				goto exit;
+			}
+			pr_ok("expire state");
+			break;
+		case EXPIRE_POLICY:
+			if (xfrm_expire_policy(xfrm_sock, &seq, nr, &desc)) {
+				pr_err("expire policy failed");
+				goto exit;
+			}
+			pr_ok("expire policy");
+			break;
+		default:
+			pr_err("Unknown desc type");
+			goto exit;
+		}
+	}
+
+	ret = 0;
+exit:
+	close(xfrm_sock);
+
+	msg.type = MSG_EXIT;
+	write_msg(cmd_fd, &msg);
+	exit(ret);
+}
+
+static int grand_child_serv(unsigned int nr, int cmd_fd, void *buf,
+		struct test_desc *msg, int xfrm_sock, uint32_t *seq)
+{
+	struct in_addr src, dst, tunsrc, tundst;
+	bool tun_reply;
+	struct xfrm_desc *desc = &msg->body.xfrm_desc;
+
+	src = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	dst = inet_makeaddr(INADDR_B, child_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, grchild_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, child_ip(nr));
+
+	switch (msg->type) {
+	case MSG_EXIT:
+		exit(0);
+	case MSG_ACK:
+		write_msg(cmd_fd, msg);
+		break;
+	case MSG_PING:
+		tun_reply = memcmp(&dst, &msg->body.ping.reply_ip, sizeof(in_addr_t));
+		/* UDP pinging without xfrm */
+		if (do_ping(cmd_fd, buf, page_size, tun_reply ? tunsrc : src,
+				false, msg->body.ping.port,
+				msg->body.ping.reply_ip, udp_ping_reply)) {
+			pr_err("ping failed before setting xfrm");
+			return -1;
+		}
+		break;
+	case MSG_XFRM_PREPARE:
+		if (xfrm_prepare(xfrm_sock, seq, src, dst, tunsrc, tundst,
+					desc->proto)) {
+			print_desc("ERR", "failed to prepare xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_ADD:
+		if (xfrm_set(xfrm_sock, seq, src, dst, tunsrc, tundst, desc)) {
+			print_desc("ERR", "failed to set xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_DEL:
+		if (xfrm_delete(xfrm_sock, seq, src, dst, tunsrc, tundst,
+					desc->proto)) {
+			print_desc("ERR", "failed to remove xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_CLEANUP:
+		if (xfrm_cleanup(xfrm_sock, seq, src, dst, tunsrc, tundst)) {
+			print_desc("ERR", "failed to cleanup xfrm", desc);
+			return -1;
+		}
+		break;
+	default:
+		pr_err("got unknown msg type %d", msg->type);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int grand_child_f(unsigned int nr, int cmd_fd, void *buf)
+{
+	struct test_desc msg;
+	int xfrm_sock = -1;
+	uint32_t seq;
+
+	if (switch_ns(nsfd_childb))
+		exit(1);
+
+	if (netlink_sock(&xfrm_sock, &seq, NETLINK_XFRM)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	do {
+		read_msg(cmd_fd, &msg);
+		if (grand_child_serv(nr, cmd_fd, buf, &msg, xfrm_sock, &seq))
+			break;
+	} while (1);
+
+	close(xfrm_sock);
+	exit(1);
+}
+
+static int start_child(unsigned int nr, char *veth, int test_desc_fd[2])
+{
+	uint32_t route_seq;
+	int cmd_sock[2];
+	void *data_map;
+	pid_t child;
+
+	if (init_child(nsfd_childa, veth, child_ip(nr), grchild_ip(nr)))
+		return -1;
+
+	if (init_child(nsfd_childb, veth, grchild_ip(nr), child_ip(nr)))
+		return -1;
+
+	child = fork();
+	if (child < 0) {
+		pr_perror("fork()");
+		return -1;
+	} else if (child) {
+		/* in parent - selftest */
+		return switch_ns(nsfd_parent);
+	}
+
+	if (close(test_desc_fd[1])) {
+		pr_perror("close()");
+		return -1;
+	}
+
+	/* child */
+	data_map = mmap(0, page_size, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+	if (data_map == MAP_FAILED) {
+		pr_perror("mmap()");
+		return -1;
+	}
+	if (randomize_buffer(data_map, page_size))
+		return -1;
+
+	if (socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, cmd_sock)) {
+		pr_perror("socketpair()");
+		return -1;
+	}
+
+	child = fork();
+	if (child < 0) {
+		pr_perror("fork()");
+		return -1;
+	} else if (child) {
+		if (close(cmd_sock[0])) {
+			pr_perror("close()");
+			return -1;
+		}
+		return child_f(nr, test_desc_fd[0], cmd_sock[1], data_map);
+	}
+	if (close(cmd_sock[1])) {
+		pr_perror("close()");
+		return -1;
+	}
+	return grand_child_f(nr, cmd_sock[0], data_map);
+}
+
+static void usage_exit(char **argv)
+{
+	fprintf(stderr, "Usage: %s [nr_process]\n", argv[0]);
+	exit(1);
+}
+
+static int write_desc(int proto, int test_desc_fd,
+		char *a, char *e, char *c, char *ae)
+{
+	struct xfrm_desc desc = {};
+
+	desc.type = CREATE_TUNNEL;
+	desc.proto = proto;
+
+	if (a)
+		strncpy(desc.a_algo, a, ALGO_LEN);
+	if (e)
+		strncpy(desc.e_algo, e, ALGO_LEN);
+	if (c)
+		strncpy(desc.c_algo, c, ALGO_LEN);
+	if (ae)
+		strncpy(desc.ae_algo, ae, ALGO_LEN);
+
+	return write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc);
+}
+
+int proto_list[] = { IPPROTO_AH, IPPROTO_COMP, IPPROTO_ESP };
+char *ah_list[] = {
+	"digest_null", "hmac(md5)", "hmac(sha1)", "hmac(sha256)",
+	"hmac(sha384)", "hmac(sha512)", "hmac(rmd160)",
+	"xcbc(aes)", "cmac(aes)"
+};
+char *comp_list[] = {
+	"deflate"
+#if 0
+	/* No compression backend implementation */
+	"lzs", "lzjh"
+#endif
+};
+char *e_list[] = {
+	"ecb(cipher_null)", "cbc(des)", "cbc(des3_ede)", "cbc(cast5)",
+	"cbc(blowfish)", "cbc(aes)", "cbc(serpent)", "cbc(camellia)",
+	"cbc(twofish)", "rfc3686(ctr(aes))"
+};
+char *ae_list[] = {
+#if 0
+	/* not implemented */
+	"rfc4106(gcm(aes))", "rfc4309(ccm(aes))", "rfc4543(gcm(aes))",
+	"rfc7539esp(chacha20,poly1305)"
+#endif
+};
+
+static int write_proto_plan(int fd, int proto)
+{
+	unsigned int i;
+
+	switch (proto) {
+	case IPPROTO_AH:
+		for (i = 0; i < ARRAY_SIZE(ah_list); i++) {
+			if (write_desc(proto, fd, ah_list[i], 0, 0, 0)) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	case IPPROTO_COMP:
+		for (i = 0; i < ARRAY_SIZE(comp_list); i++) {
+			if (write_desc(proto, fd, 0, 0, comp_list[i], 0)) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	case IPPROTO_ESP:
+		for (i = 0; i < ARRAY_SIZE(ah_list); i++) {
+			unsigned int j;
+
+			for (j = 0; j < ARRAY_SIZE(e_list); j++) {
+				if (write_desc(proto, fd, ah_list[i],
+							e_list[j], 0, 0)) {
+					pr_err("writing test's desc failed");
+					return -1;
+				}
+			}
+		}
+		for (i = 0; i < ARRAY_SIZE(ae_list); i++) {
+			if (write_desc(proto, fd, 0, 0, 0, ae_list[i])) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	default:
+		pr_err("BUG: Specified unknown proto %d", proto);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int write_compat_struct_tests(int test_desc_fd)
+{
+	struct xfrm_desc desc = {};
+
+	desc.type = ALLOCATE_SPI;
+	desc.proto = IPPROTO_AH;
+	strncpy(desc.a_algo, ah_list[0], ALGO_LEN);
+
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = MONITOR_ACQUIRE;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = EXPIRE_STATE;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = EXPIRE_POLICY;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	return 0;
+}
+
+static int write_test_plan(int test_desc_fd)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(proto_list); i++) {
+		if (write_proto_plan(test_desc_fd, proto_list[i]))
+			return -1;
+	}
+
+	if (write_compat_struct_tests(test_desc_fd))
+		return -1;
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	unsigned int nr_process = 1;
+	int route_sock = -1, ret = 1;
+	int test_desc_fd[2];
+	uint32_t route_seq;
+	unsigned int i;
+
+	if (argc > 2)
+		usage_exit(argv);
+
+	if (argc > 1) {
+		char *endptr;
+
+		errno = 0;
+		nr_process = strtol(argv[1], &endptr, 10);
+		if ((errno == ERANGE && (nr_process == LONG_MAX || nr_process == LONG_MIN))
+				|| (errno != 0 && nr_process == 0)
+				|| (endptr == argv[1]) || (*endptr != '\0')) {
+			pr_err("Failed to parse [nr_process]");
+			usage_exit(argv);
+		}
+
+		if (nr_process > MAX_PROCESSES || !nr_process) {
+			pr_err("nr_process should be between [1; %u]", MAX_PROCESSES);
+			usage_exit(argv);
+		}
+	}
+
+	page_size = sysconf(_SC_PAGESIZE);
+	if (page_size < 1) {
+		pr_perror("sysconf()");
+		return 1;
+	}
+
+	if (pipe2(test_desc_fd, O_DIRECT) < 0) {
+		pr_perror("pipe2()");
+		return 1;
+	}
+
+	if (init_namespaces()) {
+		pr_err("Failed to create namespaces");
+		return 1;
+	}
+
+	if (netlink_sock(&route_sock, &route_seq, NETLINK_ROUTE)) {
+		pr_err("Failed to open netlink route socket");
+		return 1;
+	}
+
+	for (i = 0; i < nr_process; i++) {
+		char veth[VETH_LEN];
+
+		snprintf(veth, VETH_LEN, VETH_FMT, i);
+
+		if (veth_add(route_sock, route_seq++, veth, nsfd_childa, veth, nsfd_childb)) {
+			pr_err("Failed to create veth device");
+			goto err;
+		}
+
+		if (start_child(i, veth, test_desc_fd)) {
+			pr_err("Child failed to start");
+			goto err;
+		}
+	}
+
+	if (close(test_desc_fd[0])) {
+		pr_perror("close()");
+		goto err;
+	}
+
+	ret = write_test_plan(test_desc_fd[1]);
+	/* XXX: add wait() */
+err:
+	close(route_sock);
+	return ret;
+}
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 03/18] selftest/net/xfrm: Add test for ipsec tunnel
@ 2018-07-26  2:31   ` dima
  0 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)


This is an exhaustive test for ipsec, covering all encryption,
authentication and compression algorithms. The tests run in two
network namespaces connected by veth interfaces. To make the exhaustive
testing less time-consuming, the tests run in parallel tasks; the
number of tasks is given as a parameter to the selftest.
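
As a rough illustration of how the parallel tasks can share the two
namespaces without colliding, the sketch below derives one pair of
addresses per task index, mirroring the child_ip()/grchild_ip() macros
and the INADDR_A base defined in ipsec.c. The helper task_addr() is made
up for this example, the 16-bit bound is an assumption read from the
MAX_PROCESSES comment, and a plain OR stands in for the inet_makeaddr()
call the patch uses.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>

#define INADDR_A	((in_addr_t) 0x0a000000) /* 10.0.0.0, as in ipsec.c */
#define child_ip(nr)	(4 * (nr) + 1)	/* first usable host of the /30 */
#define grchild_ip(nr)	(4 * (nr) + 2)	/* second usable host of the /30 */

/*
 * Hypothetical helper, not part of the patch: build the address for a
 * host offset inside the 10.0.0.0 range. The result is in network byte
 * order, ready to be copied into an xfrm selector or a sockaddr.
 */
static struct in_addr task_addr(uint32_t host)
{
	struct in_addr addr;

	/* Assumption: offsets stay within the /16 that MAX_PROCESSES covers. */
	assert(host < (1u << 16));
	addr.s_addr = htonl(INADDR_A | host);
	return addr;
}
```

With these macros task 3 ends up with 10.0.0.13 for the child and
10.0.0.14 for the grandchild, so every veth pair lives in its own /30.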

As the patch set adds support for xfrm in compat (32-bit) tasks, there
are tests to check the structures that differ in size between 64-bit and
32-bit applications.
The selftest doesn't use libnl, so it can easily be compiled as a
compat application and doesn't require a compat .so.
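
To make the size difference concrete, here is a minimal sketch of the
padding effect. It assumes the usual i386 rule that a 64-bit integer
inside a structure is only 4-byte aligned. The structures are
hypothetical stand-ins, not real xfrm UAPI definitions; demo_compat_u64
imitates the 32-bit layout so the comparison can be built natively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer with the 4-byte alignment used by i386 userspace. */
typedef uint64_t __attribute__((aligned(4))) demo_compat_u64;

/* Hypothetical stand-in: layout as seen by a native 64-bit build. */
struct demo_native {
	uint64_t	bytes;
	uint32_t	flags;
	/* an LP64 compiler appends 4 bytes of tail padding here */
};

/* Hypothetical stand-in: the same fields as a 32-bit build lays them out. */
struct demo_compat {
	demo_compat_u64	bytes;
	uint32_t	flags;
	/* no tail padding: the structure is only 4-byte aligned */
};

static_assert(offsetof(struct demo_compat, flags) == 8,
	      "field offsets match, only the tail padding differs");
static_assert(sizeof(struct demo_compat) == 12,
	      "compat layout has no tail padding");

/* Bytes by which the native layout exceeds the compat one. */
static size_t demo_layout_gap(void)
{
	return sizeof(struct demo_native) - sizeof(struct demo_compat);
}
```

On an x86_64 build demo_layout_gap() is 4: the same fields yield a
netlink payload of a different length, which is the kind of mismatch the
compat structure tests look for.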

Here is a diagram of the selftest:

                           ---------------
                           |  selftest   |
                           |  (parent)   |
                           ---------------
                              |        |
                              | (pipe) |
                              ----------
                             /   |  |   \
               /-------------   /    \   -------------\
               |          /-----      -----\          |
      ---------|----------|----------------|----------|---------
      |   ---------   ---------        ---------   ---------   |
      |   | child |   | child |  NS A  | child |   | child |   |
      |   ---------   ---------        ---------   ---------   |
      -------|------------|----------------|-------------|------
           veth0        veth1            veth2         vethN
    ---------|------------|----------------|-------------|----------
    | ------------  ------------       ------------   ------------ |
    | | gr.child |  | gr.child | NS B  | gr.child |   | gr.child | |
    | ------------  ------------       ------------   ------------ |
    ----------------------------------------------------------------

The parent sends the description of a test (xfrm parameters) to a child
through the pipe; the child and its grandchild then set up a tunnel over
their veth interface and test it by sending UDP packets.
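
The hand-off of test descriptions can be sketched as below. The patch
opens the pipe with pipe2(O_DIRECT), so every write() becomes one packet
and every read() returns exactly one description, whichever child picks
it up. struct demo_desc and demo_roundtrip() are invented for this
example (the patch passes struct xfrm_desc), and the fallback to a plain
pipe is an addition of the sketch, not something the selftest does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, trimmed-down description of one test. */
struct demo_desc {
	int	type;
	char	algo[32];
};

/*
 * Queue nr descriptions, close the write end and read them back one per
 * read(). Returns the number of descriptions received or -1 on error.
 * Keep nr small: nothing drains the pipe while it is being filled.
 */
static int demo_roundtrip(int nr)
{
	struct demo_desc desc;
	int fd[2], i, seen = 0;

	assert(nr >= 0);
	if (pipe2(fd, O_DIRECT) < 0) {
		/* Sketch-only fallback for kernels without packet mode. */
		if (errno != EINVAL || pipe(fd) < 0)
			return -1;
	}

	for (i = 0; i < nr; i++) {
		memset(&desc, 0, sizeof(desc));
		desc.type = i;
		if (write(fd[1], &desc, sizeof(desc)) != (ssize_t)sizeof(desc))
			goto err;
	}
	close(fd[1]);	/* readers get EOF once the plan is drained */
	fd[1] = -1;

	for (;;) {
		ssize_t got = read(fd[0], &desc, sizeof(desc));

		if (got == 0)	/* EOF: no more tests */
			break;
		if (got != (ssize_t)sizeof(desc) || desc.type != seen)
			goto err;
		seen++;
	}
	close(fd[0]);
	return seen;
err:
	if (fd[1] >= 0)
		close(fd[1]);
	close(fd[0]);
	return -1;
}
```

Closing the write end is what lets the readers terminate: child_f()
treats a zero-length read() as the end of the test plan.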

Cc: Andrew Morton <akpm at linux-foundation.org>
Cc: "David S. Miller" <davem at davemloft.net>
Cc: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
Cc: Herbert Xu <herbert at gondor.apana.org.au>
Cc: Mauro Carvalho Chehab <mchehab+samsung at kernel.org>
Cc: Shuah Khan <shuah at kernel.org>
Cc: Steffen Klassert <steffen.klassert at secunet.com>
Cc: linux-kselftest at vger.kernel.org
Cc: netdev at vger.kernel.org
Signed-off-by: Dmitry Safonov <dima at arista.com>
---
 MAINTAINERS                            |    1 +
 tools/testing/selftests/net/.gitignore |    1 +
 tools/testing/selftests/net/Makefile   |    1 +
 tools/testing/selftests/net/ipsec.c    | 1987 ++++++++++++++++++++++++++++++++
 4 files changed, 1990 insertions(+)
 create mode 100644 tools/testing/selftests/net/ipsec.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0fe4228f78cb..7e20db5d0210 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9938,6 +9938,7 @@ F:	net/ipv6/ipcomp6.c
 F:	net/ipv6/ip6_vti.c
 F:	include/uapi/linux/xfrm.h
 F:	include/net/xfrm.h
+F:	tools/testing/selftests/net/ipsec.c
 
 NETWORKING [IPv4/IPv6]
 M:	"David S. Miller" <davem at davemloft.net>
diff --git a/tools/testing/selftests/net/.gitignore b/tools/testing/selftests/net/.gitignore
index 1a0ac3a29ec5..6896547292cb 100644
--- a/tools/testing/selftests/net/.gitignore
+++ b/tools/testing/selftests/net/.gitignore
@@ -1,3 +1,4 @@
+ipsec
 msg_zerocopy
 socket
 psock_fanout
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 663e11e85727..9f35c01fbc0a 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -12,6 +12,7 @@ TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy
 TEST_GEN_FILES += tcp_mmap tcp_inq psock_snd
 TEST_GEN_FILES += udpgso udpgso_bench_tx udpgso_bench_rx
+TEST_GEN_FILES += ipsec
 TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
 TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict
 
diff --git a/tools/testing/selftests/net/ipsec.c b/tools/testing/selftests/net/ipsec.c
new file mode 100644
index 000000000000..e0752e127ba7
--- /dev/null
+++ b/tools/testing/selftests/net/ipsec.c
@@ -0,0 +1,1987 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ipsec.c - Check xfrm on veth inside a net-ns.
+ * Copyright (c) 2018 Dmitry Safonov (Arista Networks)
+ */
+
+#define _GNU_SOURCE
+
+#include <arpa/inet.h>
+#include <asm/types.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <limits.h>
+#include <linux/limits.h>
+#include <linux/netlink.h>
+#include <linux/random.h>
+#include <linux/rtnetlink.h>
+#include <linux/veth.h>
+#include <linux/xfrm.h>
+#include <netinet/in.h>
+#include <net/if.h>
+#include <sched.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+#define printk(fmt, lvl, ...)						\
+	fprintf(stderr, "[%s] (%s:%d)\t" fmt "\n",			\
+	lvl, __FILE__, __LINE__, ##__VA_ARGS__)
+
+#define pr_p(func, fmt, ...)	func(fmt ": %m", ##__VA_ARGS__)
+
+#define pr_err(fmt, ...)						\
+	printk(fmt, "ERR", ##__VA_ARGS__)
+#define pr_warn(fmt, ...)						\
+	printk(fmt, "WARN", ##__VA_ARGS__)
+#define pr_note(fmt, ...)						\
+	printk(fmt, "NOTE", ##__VA_ARGS__)
+#define pr_ok(fmt, ...)							\
+	printk(fmt, "OK", ##__VA_ARGS__)
+#define pr_debug(fmt, ...)						\
+	while (0) {							\
+		printk(fmt, "NOTE", ##__VA_ARGS__);			\
+	}
+
+#define pr_perror(fmt, ...)	pr_p(pr_err, fmt, ##__VA_ARGS__)
+#define pr_pwarn(fmt, ...)	pr_p(pr_warn, fmt, ##__VA_ARGS__)
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+#define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)]))
+
+#define IPV4_STR_SZ	16	/* xxx.xxx.xxx.xxx is longest + \0 */
+#define MAX_PAYLOAD	2048
+#define XFRM_ALGO_KEY_BUF_SIZE	512
+#define MAX_PROCESSES	(1 << 14) /* /16 mask divided by /30 subnets */
+#define INADDR_A	((in_addr_t) 0x0a000000) /* 10.0.0.0 */
+#define INADDR_B	((in_addr_t) 0xc0a80000) /* 192.168.0.0 */
+
+/* /30 mask for one veth connection */
+#define PREFIX_LEN	30
+#define child_ip(nr)	(4*nr + 1)
+#define grchild_ip(nr)	(4*nr + 2)
+
+#define VETH_FMT	"ktst-%d"
+#define VETH_LEN	10
+#define BEGIN_SEQ	(time(NULL))
+
+static int nsfd_parent	= -1;
+static int nsfd_childa	= -1;
+static int nsfd_childb	= -1;
+static long page_size;
+
+const unsigned int ping_delay_nsec	= 50 * 1000 * 1000;
+const unsigned int ping_timeout		= 300;
+const unsigned int ping_count		= 100;
+const unsigned int ping_success		= 80;
+
+static int unshare_open(void)
+{
+	const char *netns_path = "/proc/self/ns/net";
+	int fd;
+
+	if (unshare(CLONE_NEWNET) != 0) {
+		pr_pwarn("unshare()");
+		return -1;
+	}
+
+	fd = open(netns_path, O_RDONLY);
+	if (fd < 0) {
+		pr_pwarn("open(%s)", netns_path);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int switch_ns(int fd)
+{
+	if (setns(fd, CLONE_NEWNET)) {
+		pr_pwarn("setns()");
+		return -1;
+	}
+	return 0;
+}
+
+/*
+ * Run the test inside a new parent net namespace, so that the error
+ * paths don't have to bother much about cleanup.
+ */
+static int init_namespaces(void)
+{
+	nsfd_parent = unshare_open();
+	if (nsfd_parent <= 0)
+		return -1;
+
+	nsfd_childa = unshare_open();
+	if (nsfd_childa <= 0)
+		return -1;
+
+	if (switch_ns(nsfd_parent))
+		return -1;
+
+	nsfd_childb = unshare_open();
+	if (nsfd_childb <= 0)
+		return -1;
+
+	if (switch_ns(nsfd_parent))
+		return -1;
+	return 0;
+}
+
+static int netlink_sock(int *sock, uint32_t *seq_nr, int proto)
+{
+	/* Reuse an already opened socket, only advancing the sequence number. */
+	if (*sock > 0) {
+		(*seq_nr)++;
+		return 0;
+	}
+
+	*sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, proto);
+	if (*sock < 0) {
+		pr_pwarn("socket(AF_NETLINK)");
+		return -1;
+	}
+
+	/* Hand the initial sequence number back to the caller. */
+	*seq_nr = BEGIN_SEQ;
+
+	return 0;
+}
+
+static inline struct rtattr *rtattr_hdr(struct nlmsghdr *nh)
+{
+	return (struct rtattr *)((char *)(nh) + RTA_ALIGN((nh)->nlmsg_len));
+}
+
+static int rtattr_pack(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type, const void *payload, size_t size)
+{
+	/* NLMSG_ALIGNTO == RTA_ALIGNTO, nlmsg_len already aligned */
+	struct rtattr *attr = rtattr_hdr(nh);
+	size_t nl_size = RTA_ALIGN(nh->nlmsg_len) + RTA_LENGTH(size);
+
+	if (req_sz < nl_size) {
+		pr_err("req buf is too small: %zu < %zu", req_sz, nl_size);
+		return -1;
+	}
+	nh->nlmsg_len = nl_size;
+
+	attr->rta_len = RTA_LENGTH(size); /* XXX: rta_len = size? */
+	attr->rta_type = rta_type;
+	memcpy(RTA_DATA(attr), payload, size);
+
+	return 0;
+}
+
+static struct rtattr *_rtattr_begin(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type, const void *payload, size_t size)
+{
+	struct rtattr *ret = rtattr_hdr(nh);
+
+	if (rtattr_pack(nh, req_sz, rta_type, payload, size))
+		return NULL;
+
+	return ret;
+}
+
+static inline struct rtattr *rtattr_begin(struct nlmsghdr *nh, size_t req_sz,
+		unsigned short rta_type)
+{
+	return _rtattr_begin(nh, req_sz, rta_type, 0, 0);
+}
+
+static inline void rtattr_end(struct nlmsghdr *nh, struct rtattr *attr)
+{
+	char *nlmsg_end = (char *)nh + nh->nlmsg_len;
+
+	attr->rta_len = nlmsg_end - (char *)attr;
+}
+
+static int veth_pack_peerb(struct nlmsghdr *nh, size_t req_sz,
+		const char *peer, int ns)
+{
+	struct ifinfomsg pi;
+	struct rtattr *peer_attr;
+
+	memset(&pi, 0, sizeof(pi));
+	pi.ifi_family	= AF_UNSPEC;
+	pi.ifi_change	= 0xFFFFFFFF;
+
+	peer_attr = _rtattr_begin(nh, req_sz, VETH_INFO_PEER, &pi, sizeof(pi));
+	if (!peer_attr)
+		return -1;
+
+	if (rtattr_pack(nh, req_sz, IFLA_IFNAME, peer, strlen(peer)))
+		return -1;
+
+	if (rtattr_pack(nh, req_sz, IFLA_NET_NS_FD, &ns, sizeof(ns)))
+		return -1;
+
+	rtattr_end(nh, peer_attr);
+
+	return 0;
+}
+
+static int netlink_check_answer(int sock)
+{
+	struct nlmsgerror {
+		struct nlmsghdr hdr;
+		int error;
+		struct nlmsghdr orig_msg;
+	} answer;
+
+	if (recv(sock, &answer, sizeof(answer), 0) < 0) {
+		pr_perror("recv()");
+		return -1;
+	} else if (answer.hdr.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)answer.hdr.nlmsg_type);
+		return -1;
+	} else if (answer.error) {
+		pr_err("NLMSG_ERROR: %d: %s",
+			answer.error, strerror(-answer.error));
+		return answer.error;
+	}
+
+	return 0;
+}
+
+static int veth_add(int sock, uint32_t seq, const char *peera, int ns_a,
+		const char *peerb, int ns_b)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
+	struct {
+		struct nlmsghdr		nh;
+		struct ifinfomsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+	const char veth_type[] = "veth";
+	struct rtattr *link_info, *info_data;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWLINK;
+	req.nh.nlmsg_flags	= flags;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifi_family	= AF_UNSPEC;
+	req.info.ifi_change	= 0xFFFFFFFF;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_IFNAME, peera, strlen(peera)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_NET_NS_FD, &ns_a, sizeof(ns_a)))
+		return -1;
+
+	link_info = rtattr_begin(&req.nh, sizeof(req), IFLA_LINKINFO);
+	if (!link_info)
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFLA_INFO_KIND, veth_type, sizeof(veth_type)))
+		return -1;
+
+	info_data = rtattr_begin(&req.nh, sizeof(req), IFLA_INFO_DATA);
+	if (!info_data)
+		return -1;
+
+	if (veth_pack_peerb(&req.nh, sizeof(req), peerb, ns_b))
+		return -1;
+
+	rtattr_end(&req.nh, info_data);
+	rtattr_end(&req.nh, link_info);
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int ip4_addr_set(int sock, uint32_t seq, const char *intf,
+		struct in_addr addr, uint8_t prefix)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_EXCL | NLM_F_CREATE;
+	struct {
+		struct nlmsghdr		nh;
+		struct ifaddrmsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWADDR;
+	req.nh.nlmsg_flags	= flags;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifa_family	= AF_INET;
+	req.info.ifa_prefixlen	= prefix;
+	req.info.ifa_index	= if_nametoindex(intf);
+
+#if 0
+	{
+		char addr_str[IPV4_STR_SZ] = {};
+
+		strncpy(addr_str, inet_ntoa(addr), IPV4_STR_SZ - 1);
+
+		pr_warn("ip addr set %s", addr_str);
+	}
+#endif
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFA_LOCAL, &addr, sizeof(addr)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), IFA_ADDRESS, &addr, sizeof(addr)))
+		return -1;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int link_set_up(int sock, uint32_t seq, const char *intf)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct ifinfomsg	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= RTM_NEWLINK;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+	req.info.ifi_family	= AF_UNSPEC;
+	req.info.ifi_index	= if_nametoindex(intf);
+	req.info.ifi_flags	= IFF_UP;
+	req.info.ifi_change	= IFF_UP;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+	return netlink_check_answer(sock);
+}
+
+static int ip4_route_set(int sock, uint32_t seq, const char *intf,
+		struct in_addr src, struct in_addr dst)
+{
+	struct {
+		struct nlmsghdr	nh;
+		struct rtmsg	rt;
+		char		attrbuf[MAX_PAYLOAD];
+	} req;
+	unsigned int index = if_nametoindex(intf);
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.rt));
+	req.nh.nlmsg_type	= RTM_NEWROUTE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE;
+	req.nh.nlmsg_seq	= seq;
+	req.rt.rtm_family	= AF_INET;
+	req.rt.rtm_dst_len	= 32;
+	req.rt.rtm_table	= RT_TABLE_MAIN;
+	req.rt.rtm_protocol	= RTPROT_BOOT;
+	req.rt.rtm_scope	= RT_SCOPE_LINK;
+	req.rt.rtm_type		= RTN_UNICAST;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_DST, &dst, sizeof(dst)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_PREFSRC, &src, sizeof(src)))
+		return -1;
+
+	if (rtattr_pack(&req.nh, sizeof(req), RTA_OIF, &index, sizeof(index)))
+		return -1;
+
+	if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(sock);
+}
+
+static int tunnel_set_route(int route_sock, uint32_t *route_seq, char *veth,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	if (ip4_addr_set(route_sock, (*route_seq)++, "lo",
+			tunsrc, PREFIX_LEN)) {
+		pr_err("Failed to set ipv4 addr");
+		return -1;
+	}
+
+	if (ip4_route_set(route_sock, (*route_seq)++, veth, tunsrc, tundst)) {
+		pr_err("Failed to set ipv4 route");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int init_child(int nsfd, char *veth, unsigned int src, unsigned int dst)
+{
+	struct in_addr intsrc = inet_makeaddr(INADDR_B, src);
+	struct in_addr tunsrc = inet_makeaddr(INADDR_A, src);
+	struct in_addr tundst = inet_makeaddr(INADDR_A, dst);
+	int route_sock = -1, ret = -1;
+	uint32_t route_seq;
+
+	if (switch_ns(nsfd))
+		return -1;
+
+	if (netlink_sock(&route_sock, &route_seq, NETLINK_ROUTE)) {
+		pr_err("Failed to open netlink route socket in child");
+		return -1;
+	}
+
+	if (ip4_addr_set(route_sock, route_seq++, veth, intsrc, PREFIX_LEN)) {
+		pr_err("Failed to set ipv4 addr");
+		goto err;
+	}
+
+	if (link_set_up(route_sock, route_seq++, veth)) {
+		pr_err("Failed to bring up %s", veth);
+		goto err;
+	}
+
+	if (tunnel_set_route(route_sock, &route_seq, veth, tunsrc, tundst)) {
+		pr_err("Failed to add tunnel route on %s", veth);
+		goto err;
+	}
+	ret = 0;
+
+err:
+	close(route_sock);
+	return ret;
+}
+
+#define ALGO_LEN	64
+enum desc_type {
+	CREATE_TUNNEL	= 0,
+	ALLOCATE_SPI,
+	MONITOR_ACQUIRE,
+	EXPIRE_STATE,
+	EXPIRE_POLICY,
+};
+struct xfrm_desc {
+	enum desc_type	type;
+	uint8_t		proto;
+	char		a_algo[ALGO_LEN];
+	char		e_algo[ALGO_LEN];
+	char		c_algo[ALGO_LEN];
+	char		ae_algo[ALGO_LEN];
+	unsigned int	icv_len;
+	/* unsigned key_len; */
+};
+
+enum msg_type {
+	MSG_ACK		= 0,
+	MSG_EXIT,
+	MSG_PING,
+	MSG_XFRM_PREPARE,
+	MSG_XFRM_ADD,
+	MSG_XFRM_DEL,
+	MSG_XFRM_CLEANUP,
+};
+
+struct test_desc {
+	enum msg_type type;
+	union {
+		struct {
+			in_addr_t reply_ip;
+			unsigned int port;
+		} ping;
+		struct xfrm_desc xfrm_desc;
+	} body;
+};
+
+static void write_msg(int fd, struct test_desc *msg)
+{
+	ssize_t bytes = write(fd, msg, sizeof(*msg));
+
+	/* Make sure that write/read is atomic to a pipe */
+	BUILD_BUG_ON(sizeof(struct test_desc) > PIPE_BUF);
+
+	if (bytes < 0) {
+		pr_perror("write()");
+		exit(1);
+	}
+	if (bytes != sizeof(*msg)) {
+		pr_err("sent part of the message %zd/%zu", bytes, sizeof(*msg));
+		exit(1);
+	}
+}
+
+static void read_msg(int fd, struct test_desc *msg)
+{
+	ssize_t bytes = read(fd, msg, sizeof(*msg));
+
+	if (bytes < 0) {
+		pr_perror("read()");
+		exit(1);
+	}
+	if (bytes != sizeof(*msg)) {
+		pr_err("got incomplete message %zd/%zu", bytes, sizeof(*msg));
+		exit(1);
+	}
+}
+
+static int udp_ping_init(struct in_addr listen_ip, unsigned int u_timeout,
+		unsigned int *server_port, int sock[2])
+{
+	struct sockaddr_in server;
+	struct timeval t = { .tv_sec = 0, .tv_usec = u_timeout };
+	socklen_t s_len = sizeof(server);
+
+	sock[0] = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock[0] < 0) {
+		pr_perror("socket()");
+		return -1;
+	}
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= 0;
+	memcpy(&server.sin_addr.s_addr, &listen_ip, sizeof(struct in_addr));
+
+	if (bind(sock[0], (struct sockaddr *)&server, s_len)) {
+		pr_perror("bind()");
+		goto err_close_server;
+	}
+
+	if (getsockname(sock[0], (struct sockaddr *)&server, &s_len)) {
+		pr_perror("getsockname()");
+		goto err_close_server;
+	}
+
+	*server_port = ntohs(server.sin_port);
+
+	if (setsockopt(sock[0], SOL_SOCKET, SO_RCVTIMEO, (const char *)&t, sizeof(t))) {
+		pr_perror("setsockopt()");
+		goto err_close_server;
+	}
+
+	sock[1] = socket(AF_INET, SOCK_DGRAM, 0);
+	if (sock[1] < 0) {
+		pr_perror("socket()");
+		goto err_close_server;
+	}
+
+	return 0;
+
+err_close_server:
+	close(sock[0]);
+	return -1;
+}
+
+static int udp_ping_send(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len)
+{
+	struct sockaddr_in server;
+	const struct sockaddr *dest_addr = (struct sockaddr *)&server;
+	char sock_buf[buf_len];
+	ssize_t r_bytes, s_bytes;
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= htons(port);
+	server.sin_addr.s_addr	= dest_ip;
+
+	s_bytes = sendto(sock[1], buf, buf_len, 0, dest_addr, sizeof(server));
+	if (s_bytes < 0) {
+		pr_perror("sendto()");
+		return -1;
+	} else if (s_bytes != buf_len) {
+		pr_err("send part of the message: %zd/%zu", s_bytes, buf_len);
+		return -1;
+	}
+
+	r_bytes = recv(sock[0], sock_buf, buf_len, 0);
+	if (r_bytes < 0) {
+		if (errno != EAGAIN)
+			pr_perror("recv()");
+		return -1;
+	} else if (r_bytes == 0) { /* EOF */
+		pr_err("EOF on reply to ping");
+		return -1;
+	} else if (r_bytes != buf_len || memcmp(buf, sock_buf, buf_len)) {
+		pr_err("ping reply packet is corrupted %zd/%zu", r_bytes, buf_len);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int udp_ping_reply(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len)
+{
+	struct sockaddr_in server;
+	const struct sockaddr *dest_addr = (struct sockaddr *)&server;
+	char sock_buf[buf_len];
+	ssize_t r_bytes, s_bytes;
+
+	server.sin_family	= AF_INET;
+	server.sin_port		= htons(port);
+	server.sin_addr.s_addr	= dest_ip;
+
+	r_bytes = recv(sock[0], sock_buf, buf_len, 0);
+	if (r_bytes < 0) {
+		if (errno != EAGAIN)
+			pr_perror("recv()");
+		return -1;
+	}
+	if (r_bytes == 0) { /* EOF */
+		pr_err("EOF on reply to ping");
+		return -1;
+	}
+	if (r_bytes != buf_len || memcmp(buf, sock_buf, buf_len)) {
+		pr_err("ping reply packet is corrupted %zd/%zu", r_bytes, buf_len);
+		return -1;
+	}
+
+	s_bytes = sendto(sock[1], buf, buf_len, 0, dest_addr, sizeof(server));
+	if (s_bytes < 0) {
+		pr_perror("sendto()");
+		return -1;
+	} else if (s_bytes != buf_len) {
+		pr_err("send part of the message: %zd/%zu", s_bytes, buf_len);
+		return -1;
+	}
+
+	return 0;
+}
+
+typedef int (*ping_f)(int sock[2], in_addr_t dest_ip, unsigned int port,
+		char *buf, size_t buf_len);
+static int do_ping(int cmd_fd, char *buf, size_t buf_len, struct in_addr from,
+		bool init_side, int d_port, in_addr_t to, ping_f func)
+{
+	struct test_desc msg;
+	unsigned int s_port, i, ping_succeeded = 0;
+	int ping_sock[2];
+	char to_str[IPV4_STR_SZ] = {}, from_str[IPV4_STR_SZ] = {};
+
+	if (udp_ping_init(from, ping_timeout, &s_port, ping_sock)) {
+		pr_err("Failed to init ping");
+		return -1;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type		= MSG_PING;
+	msg.body.ping.port	= s_port;
+	memcpy(&msg.body.ping.reply_ip, &from, sizeof(from));
+
+	write_msg(cmd_fd, &msg);
+	if (init_side) {
+		/* The other end sends ip to ping */
+		read_msg(cmd_fd, &msg);
+		if (msg.type != MSG_PING) {
+			close(ping_sock[0]);
+			close(ping_sock[1]);
+			return -1;
+		}
+		to = msg.body.ping.reply_ip;
+		d_port = msg.body.ping.port;
+	}
+
+	for (i = 0; i < ping_count ; i++) {
+		struct timespec sleep_time = {
+			.tv_sec = 0,
+			.tv_nsec = ping_delay_nsec,
+		};
+
+		ping_succeeded += !func(ping_sock, to, d_port, buf, buf_len);
+		nanosleep(&sleep_time, 0);
+	}
+
+	close(ping_sock[0]);
+	close(ping_sock[1]);
+
+	strncpy(to_str, inet_ntoa(*(struct in_addr *)&to), IPV4_STR_SZ - 1);
+	strncpy(from_str, inet_ntoa(from), IPV4_STR_SZ - 1);
+
+	if (ping_succeeded < ping_success) {
+		pr_err("ping (%s) %s->%s failed %u/%u times",
+			init_side ? "send" : "reply", from_str, to_str,
+			ping_count - ping_succeeded, ping_count);
+		return -1;
+	}
+
+	pr_debug("ping (%s) %s->%s succeeded %u/%u times",
+		init_side ? "send" : "reply", from_str, to_str,
+		ping_succeeded, ping_count);
+
+	return 0;
+}
+
+static int randomize_buffer(void *buf, size_t buflen)
+{
+	char *p = buf;
+	size_t filled = 0;
+
+	/* getrandom() may return fewer bytes than requested; keep filling */
+	while (filled < buflen) {
+		long ret = syscall(SYS_getrandom, p + filled, buflen - filled, 0);
+
+		if (ret < 0) {
+			if (errno == EINTR)
+				continue;
+			pr_perror("getrandom()");
+			return -1;
+		}
+		filled += ret;
+	}
+
+	return 0;
+}
+
+static int xfrm_fill_key(char *name, char *buf,
+		size_t buf_len, unsigned int *key_len)
+{
+	/* XXX: use set/map instead of all this */
+	if (strncmp(name, "digest_null", ALGO_LEN) == 0)
+		*key_len = 0;
+	else if (strncmp(name, "ecb(cipher_null)", ALGO_LEN) == 0)
+		*key_len = 0;
+	else if (strncmp(name, "cbc(des)", ALGO_LEN) == 0)
+		*key_len = 64;
+	else if (strncmp(name, "hmac(md5)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cmac(aes)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "xcbc(aes)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cbc(cast5)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "cbc(serpent)", ALGO_LEN) == 0)
+		*key_len = 128;
+	else if (strncmp(name, "hmac(sha1)", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "hmac(rmd160)", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "cbc(des3_ede)", ALGO_LEN) == 0)
+		*key_len = 192;
+	else if (strncmp(name, "hmac(sha256)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(aes)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(camellia)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "cbc(twofish)", ALGO_LEN) == 0)
+		*key_len = 256;
+	else if (strncmp(name, "rfc3686(ctr(aes))", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "hmac(sha384)", ALGO_LEN) == 0)
+		*key_len = 384;
+	else if (strncmp(name, "cbc(blowfish)", ALGO_LEN) == 0)
+		*key_len = 448;
+	else if (strncmp(name, "hmac(sha512)", ALGO_LEN) == 0)
+		*key_len = 512;
+	else if (strncmp(name, "rfc4106(gcm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "rfc4543(gcm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 160;
+	else if (strncmp(name, "rfc4309(ccm(aes))-128", ALGO_LEN) == 0)
+		*key_len = 152;
+	else if (strncmp(name, "rfc4106(gcm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 224;
+	else if (strncmp(name, "rfc4543(gcm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 224;
+	else if (strncmp(name, "rfc4309(ccm(aes))-192", ALGO_LEN) == 0)
+		*key_len = 216;
+	else if (strncmp(name, "rfc4106(gcm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "rfc4543(gcm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 288;
+	else if (strncmp(name, "rfc4309(ccm(aes))-256", ALGO_LEN) == 0)
+		*key_len = 280;
+	else if (strncmp(name, "rfc7539(chacha20,poly1305)-128", ALGO_LEN) == 0)
+		*key_len = 0;
+
+	/* key_len is in bits (as xfrm expects), buf_len is in bytes */
+	if ((*key_len + 7) / 8 > buf_len) {
+		pr_err("Can't pack a key - too big for buffer");
+		return -1;
+	}
+
+	return randomize_buffer(buf, (*key_len + 7) / 8);
+}
+
+static int xfrm_state_pack_algo(struct nlmsghdr *nh, size_t req_sz,
+		struct xfrm_desc *desc)
+{
+	struct {
+		union {
+			struct xfrm_algo	alg;
+			struct xfrm_algo_aead	aead;
+			struct xfrm_algo_auth	auth;
+		} u;
+		char buf[XFRM_ALGO_KEY_BUF_SIZE];
+	} alg = {};
+	size_t alen, elen, clen, aelen;
+	unsigned short type;
+
+	alen = strlen(desc->a_algo);
+	elen = strlen(desc->e_algo);
+	clen = strlen(desc->c_algo);
+	aelen = strlen(desc->ae_algo);
+
+	/* Verify desc */
+	switch (desc->proto) {
+	case IPPROTO_AH:
+		if (!alen || elen || clen || aelen) {
+			pr_err("BUG: buggy ah desc");
+			return -1;
+		}
+		strncpy(alg.u.alg.alg_name, desc->a_algo, ALGO_LEN);
+		if (xfrm_fill_key(desc->a_algo, alg.u.alg.alg_key,
+				sizeof(alg.buf), &alg.u.alg.alg_key_len))
+			return -1;
+		type = XFRMA_ALG_AUTH;
+		break;
+	case IPPROTO_COMP:
+		if (!clen || elen || alen || aelen) {
+			pr_err("BUG: buggy comp desc");
+			return -1;
+		}
+		strncpy(alg.u.alg.alg_name, desc->c_algo, ALGO_LEN);
+		if (xfrm_fill_key(desc->c_algo, alg.u.alg.alg_key,
+				sizeof(alg.buf), &alg.u.alg.alg_key_len))
+			return -1;
+		type = XFRMA_ALG_COMP;
+		break;
+	case IPPROTO_ESP:
+		if (!((alen && elen) ^ !!aelen) || clen) {
+			pr_err("BUG: buggy esp desc");
+			return -1;
+		}
+		if (aelen) {
+			alg.u.aead.alg_icv_len = desc->icv_len;
+			strncpy(alg.u.aead.alg_name, desc->ae_algo, ALGO_LEN);
+			if (xfrm_fill_key(desc->ae_algo, alg.u.aead.alg_key,
+						sizeof(alg.buf), &alg.u.aead.alg_key_len))
+				return -1;
+			type = XFRMA_ALG_AEAD;
+		} else {
+			strncpy(alg.u.alg.alg_name, desc->e_algo, ALGO_LEN);
+			type = XFRMA_ALG_CRYPT;
+			if (xfrm_fill_key(desc->e_algo, alg.u.alg.alg_key,
+						sizeof(alg.buf), &alg.u.alg.alg_key_len))
+				return -1;
+			if (rtattr_pack(nh, req_sz, type, &alg, sizeof(alg)))
+				return -1;
+
+			strncpy(alg.u.alg.alg_name, desc->a_algo, ALGO_LEN);
+			type = XFRMA_ALG_AUTH;
+			if (xfrm_fill_key(desc->a_algo, alg.u.alg.alg_key,
+						sizeof(alg.buf), &alg.u.alg.alg_key_len))
+				return -1;
+		}
+		break;
+	default:
+		pr_err("BUG: unknown proto in desc");
+		return -1;
+	}
+
+	if (rtattr_pack(nh, req_sz, type, &alg, sizeof(alg)))
+		return -1;
+
+	return 0;
+}
+
+static inline uint32_t gen_spi(struct in_addr src)
+{
+	return htonl(inet_lnaof(src));
+}
+
+static int xfrm_state_add(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst,
+		struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct xfrm_usersa_info	info;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= XFRM_MSG_NEWSA;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill selector. */
+	memcpy(&req.info.sel.daddr, &dst, sizeof(dst));
+	memcpy(&req.info.sel.saddr, &src, sizeof(src));
+	req.info.sel.family		= AF_INET;
+	req.info.sel.prefixlen_d	= PREFIX_LEN;
+	req.info.sel.prefixlen_s	= PREFIX_LEN;
+
+	/* Fill id */
+	memcpy(&req.info.id.daddr, &dst, sizeof(dst));
+	/* Note: zero-spi cannot be deleted */
+	req.info.id.spi = spi;
+	req.info.id.proto	= desc->proto;
+
+	memcpy(&req.info.saddr, &src, sizeof(src));
+
+	/* Fill lifetime_cfg */
+	req.info.lft.soft_byte_limit	= XFRM_INF;
+	req.info.lft.hard_byte_limit	= XFRM_INF;
+	req.info.lft.soft_packet_limit	= XFRM_INF;
+	req.info.lft.hard_packet_limit	= XFRM_INF;
+
+	req.info.family		= AF_INET;
+	req.info.mode		= XFRM_MODE_TUNNEL;
+
+	/* XXX: Fill seq, reqid, replay_window, flags? */
+
+	if (xfrm_state_pack_algo(&req.nh, sizeof(req), desc))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_set(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst,
+		struct xfrm_desc *desc)
+{
+	int err;
+
+	err = xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst, desc);
+	if (err) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	err = xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), dst, src, desc);
+	if (err) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_policy_add(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst, uint8_t dir,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userpolicy_info	info;
+		char				attrbuf[MAX_PAYLOAD];
+	} req;
+	struct xfrm_user_tmpl tmpl;
+
+	memset(&req, 0, sizeof(req));
+	memset(&tmpl, 0, sizeof(tmpl));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.info));
+	req.nh.nlmsg_type	= XFRM_MSG_NEWPOLICY;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill selector. */
+	memcpy(&req.info.sel.daddr, &dst, sizeof(dst));
+	memcpy(&req.info.sel.saddr, &src, sizeof(src));
+	req.info.sel.family		= AF_INET;
+	req.info.sel.prefixlen_d	= PREFIX_LEN;
+	req.info.sel.prefixlen_s	= PREFIX_LEN;
+
+	/* Fill lifetime_cfg */
+	req.info.lft.soft_byte_limit	= XFRM_INF;
+	req.info.lft.hard_byte_limit	= XFRM_INF;
+	req.info.lft.soft_packet_limit	= XFRM_INF;
+	req.info.lft.hard_packet_limit	= XFRM_INF;
+
+	req.info.dir = dir;
+
+	/* Fill tmpl */
+	memcpy(&tmpl.id.daddr, &dst, sizeof(dst));
+	/* Note: zero-spi cannot be deleted */
+	tmpl.id.spi = spi;
+	tmpl.id.proto	= proto;
+	tmpl.family	= AF_INET;
+	memcpy(&tmpl.saddr, &src, sizeof(src));
+	tmpl.mode	= XFRM_MODE_TUNNEL;
+	tmpl.aalgos = (~(uint32_t)0);
+	tmpl.ealgos = (~(uint32_t)0);
+	tmpl.calgos = (~(uint32_t)0);
+
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_TMPL, &tmpl, sizeof(tmpl)))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_prepare(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst, proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), dst, src,
+				XFRM_POLICY_IN, tunsrc, tundst, proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_policy_del(int xfrm_sock, uint32_t seq,
+		struct in_addr src, struct in_addr dst, uint8_t dir,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userpolicy_id	id;
+		char				attrbuf[MAX_PAYLOAD];
+	} req;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.id));
+	req.nh.nlmsg_type	= XFRM_MSG_DELPOLICY;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	/* Fill id */
+	memcpy(&req.id.sel.daddr, &dst, sizeof(dst));
+	memcpy(&req.id.sel.saddr, &src, sizeof(src));
+	req.id.sel.family		= AF_INET;
+	req.id.sel.prefixlen_d		= PREFIX_LEN;
+	req.id.sel.prefixlen_s		= PREFIX_LEN;
+	req.id.dir = dir;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_cleanup(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst)
+{
+	if (xfrm_policy_del(xfrm_sock, (*seq)++, src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst)) {
+		pr_err("Failed to delete xfrm policy");
+		return -1;
+	}
+
+	if (xfrm_policy_del(xfrm_sock, (*seq)++, dst, src,
+				XFRM_POLICY_IN, tunsrc, tundst)) {
+		pr_err("Failed to delete xfrm policy");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_state_del(int xfrm_sock, uint32_t seq, uint32_t spi,
+		struct in_addr src, struct in_addr dst, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr		nh;
+		struct xfrm_usersa_id	id;
+		char			attrbuf[MAX_PAYLOAD];
+	} req;
+	xfrm_address_t saddr = {};
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.id));
+	req.nh.nlmsg_type	= XFRM_MSG_DELSA;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= seq;
+
+	memcpy(&req.id.daddr, &dst, sizeof(dst));
+	req.id.family		= AF_INET;
+	req.id.proto		= proto;
+	/* Note: zero-spi cannot be deleted */
+	req.id.spi = spi;
+
+	memcpy(&saddr, &src, sizeof(src));
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_SRCADDR, &saddr, sizeof(saddr)))
+		return -1;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	return netlink_check_answer(xfrm_sock);
+}
+
+static int xfrm_delete(int xfrm_sock, uint32_t *seq,
+		struct in_addr src, struct in_addr dst,
+		struct in_addr tunsrc, struct in_addr tundst, uint8_t proto)
+{
+	if (xfrm_state_del(xfrm_sock, (*seq)++, gen_spi(src), src, dst, proto)) {
+		pr_err("Failed to remove xfrm state");
+		return -1;
+	}
+
+	if (xfrm_state_del(xfrm_sock, (*seq)++, gen_spi(src), dst, src, proto)) {
+		pr_err("Failed to remove xfrm state");
+		return -1;
+	}
+
+	return 0;
+}
+
+static int xfrm_state_allocspi(int xfrm_sock, uint32_t *seq,
+		uint32_t spi, uint8_t proto)
+{
+	struct {
+		struct nlmsghdr			nh;
+		struct xfrm_userspi_info	spi;
+	} req;
+	struct {
+		struct nlmsghdr			nh;
+		union {
+			struct xfrm_usersa_info	info;
+			int error;
+		};
+	} answer;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.spi));
+	req.nh.nlmsg_type	= XFRM_MSG_ALLOCSPI;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	req.spi.info.family	= AF_INET;
+	req.spi.min		= spi;
+	req.spi.max		= spi;
+	req.spi.info.id.proto	= proto;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		return -1;
+	}
+
+	if (recv(xfrm_sock, &answer, sizeof(answer), 0) < 0) {
+		pr_perror("recv()");
+		return -1;
+	} else if (answer.nh.nlmsg_type == XFRM_MSG_NEWSA) {
+		uint32_t new_spi = ntohl(answer.info.id.spi);
+
+		if (new_spi != spi) {
+			pr_err("allocated spi is different from requested: %#x != %#x",
+					new_spi, spi);
+			return -1;
+		}
+		return 0;
+	} else if (answer.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)answer.nh.nlmsg_type);
+		return -1;
+	}
+
+	pr_err("NLMSG_ERROR: %d: %s", answer.error, strerror(-answer.error));
+	return answer.error;
+}
+
+static int netlink_sock_bind(int *sock, uint32_t *seq, int proto, uint32_t groups)
+{
+	struct sockaddr_nl snl = {};
+	socklen_t addr_len;
+	int ret = -1;
+
+	snl.nl_family = AF_NETLINK;
+	snl.nl_groups = groups;
+
+	if (netlink_sock(sock, seq, proto)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	if (bind(*sock, (struct sockaddr *)&snl, sizeof(snl)) < 0) {
+		pr_perror("bind()");
+		goto out_close;
+	}
+
+	addr_len = sizeof(snl);
+	if (getsockname(*sock, (struct sockaddr *)&snl, &addr_len) < 0) {
+		pr_perror("getsockname()");
+		goto out_close;
+	}
+	if (addr_len != sizeof(snl)) {
+		pr_err("Wrong address length %u", (unsigned int)addr_len);
+		goto out_close;
+	}
+	if (snl.nl_family != AF_NETLINK) {
+		pr_err("Wrong address family %d", snl.nl_family);
+		goto out_close;
+	}
+	return 0;
+
+out_close:
+	close(*sock);
+	return ret;
+}
+
+static int xfrm_monitor_acquire(int xfrm_sock, uint32_t *seq, unsigned int nr)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_acquire acq;
+			int error;
+		};
+		char attrbuf[MAX_PAYLOAD];
+	} req;
+	struct xfrm_user_tmpl xfrm_tmpl = {};
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_ACQUIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.acq));
+	req.nh.nlmsg_type	= XFRM_MSG_ACQUIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	req.acq.policy.sel.family	= AF_INET;
+	req.acq.aalgos	= 0xfeed;
+	req.acq.ealgos	= 0xbaad;
+	req.acq.calgos	= 0xbabe;
+
+	xfrm_tmpl.family = AF_INET;
+	xfrm_tmpl.id.proto = IPPROTO_ESP;
+	if (rtattr_pack(&req.nh, sizeof(req), XFRMA_TMPL, &xfrm_tmpl, sizeof(xfrm_tmpl)))
+		goto out_close;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.acq.aalgos != 0xfeed || req.acq.ealgos != 0xbaad
+			|| req.acq.calgos != 0xbabe) {
+		pr_err("xfrm_user_acquire has changed %x %x %x",
+				req.acq.aalgos, req.acq.ealgos, req.acq.calgos);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static int xfrm_expire_state(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_expire expire;
+			int error;
+		};
+	} req;
+	struct in_addr src, dst;
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+
+	if (xfrm_state_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst, desc)) {
+		pr_err("Failed to add xfrm state");
+		return -1;
+	}
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_EXPIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.expire));
+	req.nh.nlmsg_type	= XFRM_MSG_EXPIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	memcpy(&req.expire.state.id.daddr, &dst, sizeof(dst));
+	req.expire.state.id.spi		= gen_spi(src);
+	req.expire.state.id.proto	= desc->proto;
+	req.expire.state.family		= AF_INET;
+	req.expire.hard			= 0xff;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.expire.hard != 0x1) {
+		pr_err("expire.hard is not set: %x", req.expire.hard);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static int xfrm_expire_policy(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, struct xfrm_desc *desc)
+{
+	struct {
+		struct nlmsghdr nh;
+		union {
+			struct xfrm_user_polexpire expire;
+			int error;
+		};
+	} req;
+	struct in_addr src, dst, tunsrc, tundst;
+	int xfrm_listen = -1, ret = -1;
+	uint32_t seq_listen;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, child_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, grchild_ip(nr));
+
+	if (xfrm_policy_add(xfrm_sock, (*seq)++, gen_spi(src), src, dst,
+				XFRM_POLICY_OUT, tunsrc, tundst, desc->proto)) {
+		pr_err("Failed to add xfrm policy");
+		return -1;
+	}
+
+	if (netlink_sock_bind(&xfrm_listen, &seq_listen, NETLINK_XFRM, XFRMNLGRP_EXPIRE))
+		return -1;
+
+	memset(&req, 0, sizeof(req));
+	req.nh.nlmsg_len	= NLMSG_LENGTH(sizeof(req.expire));
+	req.nh.nlmsg_type	= XFRM_MSG_POLEXPIRE;
+	req.nh.nlmsg_flags	= NLM_F_REQUEST | NLM_F_ACK;
+	req.nh.nlmsg_seq	= (*seq)++;
+
+	/* Fill selector. */
+	memcpy(&req.expire.pol.sel.daddr, &dst, sizeof(tundst));
+	memcpy(&req.expire.pol.sel.saddr, &src, sizeof(tunsrc));
+	req.expire.pol.sel.family	= AF_INET;
+	req.expire.pol.sel.prefixlen_d	= PREFIX_LEN;
+	req.expire.pol.sel.prefixlen_s	= PREFIX_LEN;
+	req.expire.pol.dir		= XFRM_POLICY_OUT;
+	req.expire.hard			= 0xff;
+
+	if (send(xfrm_sock, &req, req.nh.nlmsg_len, 0) < 0) {
+		pr_perror("send()");
+		goto out_close;
+	}
+
+	if (recv(xfrm_sock, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	} else if (req.nh.nlmsg_type != NLMSG_ERROR) {
+		pr_err("expected NLMSG_ERROR, got %d", (int)req.nh.nlmsg_type);
+		goto out_close;
+	}
+
+	if (req.error) {
+		pr_err("NLMSG_ERROR: %d: %s", req.error, strerror(-req.error));
+		ret = req.error;
+		goto out_close;
+	}
+
+	if (recv(xfrm_listen, &req, sizeof(req), 0) < 0) {
+		pr_perror("recv()");
+		goto out_close;
+	}
+
+	if (req.expire.hard != 0x1) {
+		pr_err("expire.hard is not set: %x", req.expire.hard);
+		goto out_close;
+	}
+
+	ret = 0;
+out_close:
+	close(xfrm_listen);
+	return ret;
+}
+
+static void print_desc(char *lvl, char *msg, struct xfrm_desc *desc)
+{
+	printk("%s: %s: [%u, '%s', '%s', '%s', '%s', %u]", lvl, msg,
+		(unsigned int)desc->proto, desc->a_algo, desc->e_algo,
+		desc->c_algo, desc->ae_algo, desc->icv_len);
+}
+
+static int child_serv(int xfrm_sock, uint32_t *seq,
+		unsigned int nr, int cmd_fd, void *buf, struct xfrm_desc *desc)
+{
+	struct in_addr src, dst, tunsrc, tundst;
+	struct test_desc msg;
+	int ret = -1;
+
+	src = inet_makeaddr(INADDR_B, child_ip(nr));
+	dst = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, child_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, grchild_ip(nr));
+
+	/* UDP pinging without xfrm */
+	if (do_ping(cmd_fd, buf, page_size, src, true, 0, 0, udp_ping_send)) {
+		pr_err("ping failed before setting xfrm");
+		return -1;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_PREPARE;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+
+	if (xfrm_prepare(xfrm_sock, seq, src, dst, tunsrc, tundst, desc->proto)) {
+		print_desc("ERR", "failed to prepare xfrm", desc);
+		goto cleanup;
+	}
+
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_ADD;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+	if (xfrm_set(xfrm_sock, seq, src, dst, tunsrc, tundst, desc)) {
+		print_desc("ERR", "failed to set xfrm", desc);
+		goto cleanup;
+	}
+
+	/* UDP pinging with xfrm tunnel */
+	if (do_ping(cmd_fd, buf, page_size, tunsrc,
+				true, 0, 0, udp_ping_send)) {
+		print_desc("ERR", "ping failed for xfrm", desc);
+		goto delete;
+	}
+
+	print_desc("OK", "xfrm", desc);
+	ret = 0;
+delete:
+	/* xfrm delete */
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_DEL;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+
+	if (xfrm_delete(xfrm_sock, seq, src, dst, tunsrc, tundst, desc->proto)) {
+		print_desc("ERR", "failed to remove xfrm", desc);
+		ret = -1;
+	}
+
+cleanup:
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_XFRM_CLEANUP;
+	memcpy(&msg.body.xfrm_desc, desc, sizeof(*desc));
+	write_msg(cmd_fd, &msg);
+	if (xfrm_cleanup(xfrm_sock, seq, src, dst, tunsrc, tundst)) {
+		print_desc("ERR", "failed to cleanup xfrm", desc);
+		ret = -1;
+	}
+	return ret;
+}
+
+static int child_f(unsigned int nr, int test_desc_fd, int cmd_fd, void *buf)
+{
+	struct xfrm_desc desc;
+	struct test_desc msg;
+	int xfrm_sock = -1;
+	uint32_t seq;
+	int ret = 1;
+
+	if (switch_ns(nsfd_childa))
+		exit(1);
+
+	if (netlink_sock(&xfrm_sock, &seq, NETLINK_XFRM)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	/* Check that the seq sock is ready, just to be sure. */
+	memset(&msg, 0, sizeof(msg));
+	msg.type = MSG_ACK;
+	write_msg(cmd_fd, &msg);
+	read_msg(cmd_fd, &msg);
+	if (msg.type != MSG_ACK) {
+		pr_err("Ack failed");
+		exit(1);
+	}
+
+	for (;;) {
+		ssize_t received = read(test_desc_fd, &desc, sizeof(desc));
+
+		if (received == 0) /* EOF */
+			break;
+
+		if (received != sizeof(desc)) {
+			pr_perror("read() returned %zd", received);
+			goto exit;
+		}
+
+		switch (desc.type) {
+		case CREATE_TUNNEL:
+			if (child_serv(xfrm_sock, &seq, nr, cmd_fd, buf, &desc))
+				goto exit;
+			break;
+		case ALLOCATE_SPI:
+			if (xfrm_state_allocspi(xfrm_sock, &seq, -1, desc.proto)) {
+				pr_err("allocspi failed");
+				goto exit;
+			}
+			pr_ok("allocspi");
+			break;
+		case MONITOR_ACQUIRE:
+			if (xfrm_monitor_acquire(xfrm_sock, &seq, nr)) {
+				pr_err("monitor acquire failed");
+				goto exit;
+			}
+			pr_ok("monitor acquire");
+			break;
+		case EXPIRE_STATE:
+			if (xfrm_expire_state(xfrm_sock, &seq, nr, &desc)) {
+				pr_err("expire state failed");
+				goto exit;
+			}
+			pr_ok("expire state");
+			break;
+		case EXPIRE_POLICY:
+			if (xfrm_expire_policy(xfrm_sock, &seq, nr, &desc)) {
+				pr_err("expire policy failed");
+				goto exit;
+			}
+			pr_ok("expire policy");
+			break;
+		default:
+			pr_err("Unknown desc type");
+			goto exit;
+		}
+	}
+
+	ret = 0;
+exit:
+	close(xfrm_sock);
+
+	msg.type = MSG_EXIT;
+	write_msg(cmd_fd, &msg);
+	exit(ret);
+}
+
+static int grand_child_serv(unsigned int nr, int cmd_fd, void *buf,
+		struct test_desc *msg, int xfrm_sock, uint32_t *seq)
+{
+	struct in_addr src, dst, tunsrc, tundst;
+	bool tun_reply;
+	struct xfrm_desc *desc = &msg->body.xfrm_desc;
+
+	src = inet_makeaddr(INADDR_B, grchild_ip(nr));
+	dst = inet_makeaddr(INADDR_B, child_ip(nr));
+	tunsrc = inet_makeaddr(INADDR_A, grchild_ip(nr));
+	tundst = inet_makeaddr(INADDR_A, child_ip(nr));
+
+	switch (msg->type) {
+	case MSG_EXIT:
+		exit(0);
+	case MSG_ACK:
+		write_msg(cmd_fd, msg);
+		break;
+	case MSG_PING:
+		tun_reply = memcmp(&dst, &msg->body.ping.reply_ip, sizeof(in_addr_t));
+		/* UDP pinging without xfrm */
+		if (do_ping(cmd_fd, buf, page_size, tun_reply ? tunsrc : src,
+				false, msg->body.ping.port,
+				msg->body.ping.reply_ip, udp_ping_reply)) {
+			pr_err("ping failed before setting xfrm");
+			return -1;
+		}
+		break;
+	case MSG_XFRM_PREPARE:
+		if (xfrm_prepare(xfrm_sock, seq, src, dst, tunsrc, tundst,
+					desc->proto)) {
+			print_desc("ERR", "failed to prepare xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_ADD:
+		if (xfrm_set(xfrm_sock, seq, src, dst, tunsrc, tundst, desc)) {
+			print_desc("ERR", "failed to set xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_DEL:
+		if (xfrm_delete(xfrm_sock, seq, src, dst, tunsrc, tundst,
+					desc->proto)) {
+			print_desc("ERR", "failed to remove xfrm", desc);
+			return -1;
+		}
+		break;
+	case MSG_XFRM_CLEANUP:
+		if (xfrm_cleanup(xfrm_sock, seq, src, dst, tunsrc, tundst)) {
+			print_desc("ERR", "failed to cleanup xfrm", desc);
+			return -1;
+		}
+		break;
+	default:
+		pr_err("got unknown msg type %d\n", msg->type);
+		return -1;
+	};
+
+	return 0;
+}
+
+static int grand_child_f(unsigned int nr, int cmd_fd, void *buf)
+{
+	struct test_desc msg;
+	int xfrm_sock = -1;
+	uint32_t seq;
+
+	if (switch_ns(nsfd_childb))
+		exit(1);
+
+	if (netlink_sock(&xfrm_sock, &seq, NETLINK_XFRM)) {
+		pr_err("Failed to open xfrm netlink socket");
+		return -1;
+	}
+
+	do {
+		read_msg(cmd_fd, &msg);
+		if (grand_child_serv(nr, cmd_fd, buf, &msg, xfrm_sock, &seq))
+			break;
+	} while (1);
+
+	close(xfrm_sock);
+	exit(1);
+}
+
+static int start_child(unsigned int nr, char *veth, int test_desc_fd[2])
+{
+	uint32_t route_seq;
+	int cmd_sock[2];
+	void *data_map;
+	pid_t child;
+
+	if (init_child(nsfd_childa, veth, child_ip(nr), grchild_ip(nr)))
+		return -1;
+
+	if (init_child(nsfd_childb, veth, grchild_ip(nr), child_ip(nr)))
+		return -1;
+
+	child = fork();
+	if (child < 0) {
+		pr_perror("fork()");
+		return -1;
+	} else if (child) {
+		/* in parent - selftest */
+		return switch_ns(nsfd_parent);
+	}
+
+	if (close(test_desc_fd[1])) {
+		pr_perror("close()");
+		return -1;
+	}
+
+	/* child */
+	data_map = mmap(0, page_size, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+	if (data_map == MAP_FAILED) {
+		pr_perror("mmap()");
+		return -1;
+	}
+	if (randomize_buffer(data_map, page_size))
+		return -1;
+
+	if (socketpair(PF_LOCAL, SOCK_SEQPACKET, 0, cmd_sock)) {
+		pr_perror("socketpair()");
+		return -1;
+	}
+
+	child = fork();
+	if (child < 0) {
+		pr_perror("fork()");
+		return -1;
+	} else if (child) {
+		if (close(cmd_sock[0])) {
+			pr_perror("close()");
+			return -1;
+		}
+		return child_f(nr, test_desc_fd[0], cmd_sock[1], data_map);
+	}
+	if (close(cmd_sock[1])) {
+		pr_perror("close()");
+		return -1;
+	}
+	return grand_child_f(nr, cmd_sock[0], data_map);
+}
+
+static void usage_exit(char **argv)
+{
+	fprintf(stderr, "Usage: %s [nr_process]\n", argv[0]);
+	exit(1);
+}
+
+static int write_desc(int proto, int test_desc_fd,
+		char *a, char *e, char *c, char *ae)
+{
+	struct xfrm_desc desc = {};
+
+	desc.type = CREATE_TUNNEL;
+	desc.proto = proto;
+
+	if (a)
+		strncpy(desc.a_algo, a, ALGO_LEN);
+	if (e)
+		strncpy(desc.e_algo, e, ALGO_LEN);
+	if (c)
+		strncpy(desc.c_algo, c, ALGO_LEN);
+	if (ae)
+		strncpy(desc.ae_algo, ae, ALGO_LEN);
+
+	return write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc);
+}
+
+int proto_list[] = { IPPROTO_AH, IPPROTO_COMP, IPPROTO_ESP };
+char *ah_list[] = {
+	"digest_null", "hmac(md5)", "hmac(sha1)", "hmac(sha256)",
+	"hmac(sha384)", "hmac(sha512)", "hmac(rmd160)",
+	"xcbc(aes)", "cmac(aes)"
+};
+char *comp_list[] = {
+	"deflate"
+#if 0
+	/* No compression backend realization */
+	"lzs", "lzjh"
+#endif
+};
+char *e_list[] = {
+	"ecb(cipher_null)", "cbc(des)", "cbc(des3_ede)", "cbc(cast5)",
+	"cbc(blowfish)", "cbc(aes)", "cbc(serpent)", "cbc(camellia)",
+	"cbc(twofish)", "rfc3686(ctr(aes))"
+};
+char *ae_list[] = {
+#if 0
+	/* not implemented */
+	"rfc4106(gcm(aes))", "rfc4309(ccm(aes))", "rfc4543(gcm(aes))",
+	"rfc7539esp(chacha20,poly1305)"
+#endif
+};
+
+static int write_proto_plan(int fd, int proto)
+{
+	unsigned int i;
+
+	switch (proto) {
+	case IPPROTO_AH:
+		for (i = 0; i < ARRAY_SIZE(ah_list); i++) {
+			if (write_desc(proto, fd, ah_list[i], 0, 0, 0)) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	case IPPROTO_COMP:
+		for (i = 0; i < ARRAY_SIZE(comp_list); i++) {
+			if (write_desc(proto, fd, 0, 0, comp_list[i], 0)) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	case IPPROTO_ESP:
+		for (i = 0; i < ARRAY_SIZE(ah_list); i++) {
+			int j;
+
+			for (j = 0; j < ARRAY_SIZE(e_list); j++) {
+				if (write_desc(proto, fd, ah_list[i],
+							e_list[j], 0, 0)) {
+					pr_err("writing test's desc failed");
+					return -1;
+				}
+			}
+		}
+		for (i = 0; i < ARRAY_SIZE(ae_list); i++) {
+			if (write_desc(proto, fd, 0, 0, 0, ae_list[i])) {
+				pr_err("writing test's desc failed");
+				return -1;
+			}
+		}
+		break;
+	default:
+		pr_err("BUG: Specified unknown proto %d", proto);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int write_compat_struct_tests(int test_desc_fd)
+{
+	struct xfrm_desc desc = {};
+
+	desc.type = ALLOCATE_SPI;
+	desc.proto = IPPROTO_AH;
+	strncpy(desc.a_algo, ah_list[0], ALGO_LEN);
+
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = MONITOR_ACQUIRE;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = EXPIRE_STATE;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	desc.type = EXPIRE_POLICY;
+	if (write(test_desc_fd, &desc, sizeof(desc)) != sizeof(desc))
+		return -1;
+
+	return 0;
+}
+
+static int write_test_plan(int test_desc_fd)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(proto_list); i++) {
+		if (write_proto_plan(test_desc_fd, proto_list[i]))
+			return -1;
+	}
+
+	if (write_compat_struct_tests(test_desc_fd))
+		return -1;
+
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	unsigned int nr_process = 1;
+	int route_sock = -1, ret = 1;
+	int test_desc_fd[2];
+	uint32_t route_seq;
+	unsigned int i;
+
+	if (argc > 2)
+		usage_exit(argv);
+
+	if (argc > 1) {
+		char *endptr;
+
+		errno = 0;
+		nr_process = strtol(argv[1], &endptr, 10);
+		if ((errno == ERANGE && (nr_process == LONG_MAX || nr_process == LONG_MIN))
+				|| (errno != 0 && nr_process == 0)
+				|| (endptr == argv[1]) || (*endptr != '\0')) {
+			pr_err("Failed to parse [nr_process]");
+			usage_exit(argv);
+		}
+
+		if (nr_process > MAX_PROCESSES || !nr_process) {
+			pr_err("nr_process should be between [1; %u]", MAX_PROCESSES);
+			usage_exit(argv);
+		}
+	}
+
+	page_size = sysconf(_SC_PAGESIZE);
+	if (page_size < 1) {
+		pr_perror("sysconf()");
+		return 1;
+	}
+
+	if (pipe2(test_desc_fd, O_DIRECT) < 0) {
+		pr_perror("pipe2()");
+		return 1;
+	}
+
+	if (init_namespaces()) {
+		pr_err("Failed to create namespaces");
+		return 1;
+	}
+
+	if (netlink_sock(&route_sock, &route_seq, NETLINK_ROUTE)) {
+		pr_err("Failed to open netlink route socket");
+		return 1;
+	}
+
+	for (i = 0; i < nr_process; i++) {
+		char veth[VETH_LEN];
+
+		snprintf(veth, VETH_LEN, VETH_FMT, i);
+
+		if (veth_add(route_sock, route_seq++, veth, nsfd_childa, veth, nsfd_childb)) {
+			pr_err("Failed to create veth device");
+			goto err;
+		}
+
+		if (start_child(i, veth, test_desc_fd)) {
+			pr_err("Child failed to start");
+			goto err;
+		}
+	}
+
+	if (close(test_desc_fd[0])) {
+		pr_perror("close()");
+		goto err;
+	}
+
+	ret = write_test_plan(test_desc_fd[1]);
+	/* XXX: add wait() */
+err:
+	close(route_sock);
+	return ret;
+}
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 04/18] net/xfrm: Add _packed types for compat users
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (6 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

The xfrm_usersa_info and xfrm_userpolicy_info structures differ in size
between the 64-bit and 32-bit ABIs. The 64-bit ABI adds 4 bytes of
padding at the end of each structure:

32-bit:
sizeof(xfrm_usersa_info)	= 220
sizeof(xfrm_userpolicy_info)	= 164
64-bit:
sizeof(xfrm_usersa_info)	= 224
sizeof(xfrm_userpolicy_info)	= 168

In preparation for adding compat support to xfrm, add _packed
versions of those types.

Parse xfrm_usersa_info and xfrm_userpolicy_info netlink messages
sent by userspace using the _packed structures (the trailing padding
carries no data, so there is no need to parse it).
Sending _packed notification messages back to userspace will be done
in the following patches (in copy_to_user_state() and
copy_to_user_policy()).
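
The trailing padding described above can be reproduced outside the
kernel. The following is only a sketch: struct demo_info and
struct demo_info_packed are hypothetical stand-ins, not the real xfrm
types, and the sizes in the comments assume the usual i386 and x86-64
alignment rules for uint64_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: a 64-bit member followed by a 32-bit tail. */
struct demo_info {
	uint64_t bytes;
	uint32_t seq;
	uint32_t reqid;
	uint32_t flags;
};

/* Same members, but the compiler never adds padding. */
struct demo_info_packed {
	uint64_t bytes;
	uint32_t seq;
	uint32_t reqid;
	uint32_t flags;
} __attribute__((packed));

/* The packed variant has the same size on every ABI. */
static_assert(sizeof(struct demo_info_packed) == 20, "packed size");

/* Member offsets agree, so the packed type can read either layout. */
static_assert(offsetof(struct demo_info, flags) ==
	      offsetof(struct demo_info_packed, flags), "same offsets");

/* Trailing padding of the native layout: 0 on i386, 4 on x86-64. */
static inline size_t demo_tail_padding(void)
{
	return sizeof(struct demo_info) - sizeof(struct demo_info_packed);
}
```

Because only the tail differs, a parser that never touches the padding
can use the packed type for both native and compat senders.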

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 89 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 65 insertions(+), 24 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2677cb55b7a8..b382cdd3bef6 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -33,6 +33,34 @@
 #endif
 #include <asm/unaligned.h>
 
+struct xfrm_usersa_info_packed {
+	struct xfrm_selector		sel;
+	struct xfrm_id			id;
+	xfrm_address_t			saddr;
+	struct xfrm_lifetime_cfg	lft;
+	struct xfrm_lifetime_cur	curlft;
+	struct xfrm_stats		stats;
+	__u32				seq;
+	__u32				reqid;
+	__u16				family;
+	__u8				mode;		/* XFRM_MODE_xxx */
+	__u8				replay_window;
+	__u8				flags;
+	__u8				__pad[3];
+} __packed;
+
+struct xfrm_userpolicy_info_packed {
+	struct xfrm_selector		sel;
+	struct xfrm_lifetime_cfg	lft;
+	struct xfrm_lifetime_cur	curlft;
+	__u32				priority;
+	__u32				index;
+	__u8				dir;
+	__u8				action;
+	__u8				flags;
+	__u8				share;
+} __packed;
+
 static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
 	struct nlattr *rt = attrs[type];
@@ -115,7 +143,7 @@ static inline int verify_sec_ctx_len(struct nlattr **attrs)
 	return 0;
 }
 
-static inline int verify_replay(struct xfrm_usersa_info *p,
+static inline int verify_replay(struct xfrm_usersa_info_packed *p,
 				struct nlattr **attrs)
 {
 	struct nlattr *rt = attrs[XFRMA_REPLAY_ESN_VAL];
@@ -143,7 +171,7 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
 	return 0;
 }
 
-static int verify_newsa_info(struct xfrm_usersa_info *p,
+static int verify_newsa_info(struct xfrm_usersa_info_packed *p,
 			     struct nlattr **attrs)
 {
 	int err;
@@ -464,7 +492,8 @@ static inline unsigned int xfrm_user_sec_ctx_size(struct xfrm_sec_ctx *xfrm_ctx)
 	return len;
 }
 
-static void copy_from_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
+static void copy_from_user_state(struct xfrm_state *x,
+				 struct xfrm_usersa_info_packed *p)
 {
 	memcpy(&x->id, &p->id, sizeof(x->id));
 	memcpy(&x->sel, &p->sel, sizeof(x->sel));
@@ -528,9 +557,8 @@ static void xfrm_update_ae_params(struct xfrm_state *x, struct nlattr **attrs,
 }
 
 static struct xfrm_state *xfrm_state_construct(struct net *net,
-					       struct xfrm_usersa_info *p,
-					       struct nlattr **attrs,
-					       int *errp)
+	       struct xfrm_usersa_info_packed *p,
+	       struct nlattr **attrs, int *errp)
 {
 	struct xfrm_state *x = xfrm_state_alloc(net);
 	int err = -ENOMEM;
@@ -630,7 +658,7 @@ static int xfrm_add_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct nlattr **attrs)
 {
 	struct net *net = sock_net(skb->sk);
-	struct xfrm_usersa_info *p = nlmsg_data(nlh);
+	struct xfrm_usersa_info_packed *p = nlmsg_data(nlh);
 	struct xfrm_state *x;
 	int err;
 	struct km_event c;
@@ -1331,7 +1359,7 @@ static int verify_policy_type(u8 type)
 	return 0;
 }
 
-static int verify_newpolicy_info(struct xfrm_userpolicy_info *p)
+static int verify_newpolicy_info(struct xfrm_userpolicy_info_packed *p)
 {
 	int ret;
 
@@ -1513,7 +1541,8 @@ static int copy_from_user_policy_type(u8 *tp, struct nlattr **attrs)
 	return 0;
 }
 
-static void copy_from_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_info *p)
+static void copy_from_user_policy(struct xfrm_policy *xp,
+		struct xfrm_userpolicy_info_packed *p)
 {
 	xp->priority = p->priority;
 	xp->index = p->index;
@@ -1540,7 +1569,9 @@ static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_i
 	p->share = XFRM_SHARE_ANY; /* XXX xp->share */
 }
 
-static struct xfrm_policy *xfrm_policy_construct(struct net *net, struct xfrm_userpolicy_info *p, struct nlattr **attrs, int *errp)
+static struct xfrm_policy *xfrm_policy_construct(struct net *net,
+		struct xfrm_userpolicy_info_packed *p,
+		struct nlattr **attrs, int *errp)
 {
 	struct xfrm_policy *xp = xfrm_policy_alloc(net, GFP_KERNEL);
 	int err;
@@ -1575,7 +1606,7 @@ static int xfrm_add_policy(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct nlattr **attrs)
 {
 	struct net *net = sock_net(skb->sk);
-	struct xfrm_userpolicy_info *p = nlmsg_data(nlh);
+	struct xfrm_userpolicy_info_packed *p = nlmsg_data(nlh);
 	struct xfrm_policy *xp;
 	struct km_event c;
 	int err;
@@ -2079,7 +2110,7 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct net *net = sock_net(skb->sk);
 	struct xfrm_policy *xp;
 	struct xfrm_user_polexpire *up = nlmsg_data(nlh);
-	struct xfrm_userpolicy_info *p = &up->pol;
+	struct xfrm_userpolicy_info_packed *p = (void *)&up->pol;
 	u8 type = XFRM_POLICY_TYPE_MAIN;
 	int err = -ENOENT;
 	struct xfrm_mark m;
@@ -2140,7 +2171,7 @@ static int xfrm_add_sa_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct xfrm_state *x;
 	int err;
 	struct xfrm_user_expire *ue = nlmsg_data(nlh);
-	struct xfrm_usersa_info *p = &ue->state;
+	struct xfrm_usersa_info_packed *p = (struct xfrm_usersa_info_packed *)&ue->state;
 	struct xfrm_mark m;
 	u32 mark = xfrm_mark_get(attrs, &m);
 
@@ -2178,6 +2209,7 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct xfrm_mark mark;
 
 	struct xfrm_user_acquire *ua = nlmsg_data(nlh);
+	struct xfrm_userpolicy_info_packed *upi = (void *)&ua->policy;
 	struct xfrm_state *x = xfrm_state_alloc(net);
 	int err = -ENOMEM;
 
@@ -2186,12 +2218,12 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	xfrm_mark_get(attrs, &mark);
 
-	err = verify_newpolicy_info(&ua->policy);
+	err = verify_newpolicy_info(upi);
 	if (err)
 		goto free_state;
 
 	/*   build an XP */
-	xp = xfrm_policy_construct(net, &ua->policy, attrs, &err);
+	xp = xfrm_policy_construct(net, upi, attrs, &err);
 	if (!xp)
 		goto free_state;
 
@@ -2881,11 +2913,21 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
 					       u8 *data, int len, int *dir)
 {
 	struct net *net = sock_net(sk);
-	struct xfrm_userpolicy_info *p = (struct xfrm_userpolicy_info *)data;
-	struct xfrm_user_tmpl *ut = (struct xfrm_user_tmpl *) (p + 1);
+	struct xfrm_userpolicy_info *upi = (void *)data;
+	struct xfrm_userpolicy_info_packed *_upi = (void *)data;
+	size_t policy_size;
+	struct xfrm_user_tmpl *ut;
 	struct xfrm_policy *xp;
 	int nr;
 
+	if (in_compat_syscall()) {
+		ut = (struct xfrm_user_tmpl *)(_upi + 1);
+		policy_size = sizeof(*_upi);
+	} else {
+		ut = (struct xfrm_user_tmpl *)(upi + 1);
+		policy_size = sizeof(*upi);
+	}
+
 	switch (sk->sk_family) {
 	case AF_INET:
 		if (opt != IP_XFRM_POLICY) {
@@ -2908,15 +2950,14 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
 
 	*dir = -EINVAL;
 
-	if (len < sizeof(*p) ||
-	    verify_newpolicy_info(p))
+	if (len < policy_size || verify_newpolicy_info(_upi))
 		return NULL;
 
-	nr = ((len - sizeof(*p)) / sizeof(*ut));
-	if (validate_tmpl(nr, ut, p->sel.family))
+	nr = ((len - policy_size) / sizeof(*ut));
+	if (validate_tmpl(nr, ut, _upi->sel.family))
 		return NULL;
 
-	if (p->dir > XFRM_POLICY_OUT)
+	if (_upi->dir > XFRM_POLICY_OUT)
 		return NULL;
 
 	xp = xfrm_policy_alloc(net, GFP_ATOMIC);
@@ -2925,11 +2966,11 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
 		return NULL;
 	}
 
-	copy_from_user_policy(xp, p);
+	copy_from_user_policy(xp, _upi);
 	xp->type = XFRM_POLICY_TYPE_MAIN;
 	copy_templates(xp, ut, nr);
 
-	*dir = p->dir;
+	*dir = _upi->dir;
 
 	return xp;
 }
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 05/18] net/xfrm: Parse userspi_info{,_packed} depending on syscall
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (7 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Struct xfrm_userspi_info differs in size between 64-bit/32-bit UAPI
because of (possible) padding of xfrm_usersa_info:

          32-bit                                   64-bit
----------------------------------------------------------------------
sizeof(xfrm_userspi_info) = 228     |  sizeof(xfrm_userspi_info) = 232
xfrm_userspi_info::info = 0         |  xfrm_userspi_info::info = 0
xfrm_userspi_info::min = 220        |  xfrm_userspi_info::min = 224
xfrm_userspi_info::max = 224        |  xfrm_userspi_info::max = 228

xfrm_alloc_userspi() can handle both UAPIs by checking which kind of
send() syscall (native or compat) userspace used for XFRM_MSG_ALLOCSPI.
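
As a userspace illustration of the offset shift in the table above,
the sketch below reads min/max from a payload whose layout depends on
the sender. All names (demo_spi_native, demo_spi_compat,
demo_read_spi_range, DEMO_INFO_SIZE) are hypothetical, and the 20-byte
leading blob merely stands in for xfrm_usersa_info; the compat flag
plays the role that in_compat_syscall() plays in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_INFO_SIZE 20	/* hypothetical size of the leading struct */

struct demo_spi_native {	/* layout of a 64-bit sender */
	uint8_t  info[DEMO_INFO_SIZE];
	uint8_t  pad[4];	/* explicit stand-in for info's tail padding */
	uint32_t min;
	uint32_t max;
};

struct demo_spi_compat {	/* layout of a 32-bit sender */
	uint8_t  info[DEMO_INFO_SIZE];
	uint32_t min;
	uint32_t max;
};

/* Pick the offset of min/max according to the sender's ABI. */
static inline void demo_read_spi_range(const void *payload, bool compat,
				       uint32_t *min, uint32_t *max)
{
	size_t off = compat ? offsetof(struct demo_spi_compat, min)
			    : offsetof(struct demo_spi_native, min);
	const uint8_t *p = payload;

	/* memcpy avoids unaligned loads from the raw buffer */
	memcpy(min, p + off, sizeof(*min));
	memcpy(max, p + off + sizeof(*min), sizeof(*max));
}
```

Reading a native payload with compat set to true would return the
padding as min, which is the kind of mismatch the syscall check is
meant to rule out.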

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index b382cdd3bef6..bf2ca93edaf5 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -61,6 +61,12 @@ struct xfrm_userpolicy_info_packed {
 	__u8				share;
 } __packed;
 
+struct xfrm_userspi_info_packed {
+	struct xfrm_usersa_info_packed	info;
+	__u32				min;
+	__u32				max;
+} __packed;
+
 static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
 	struct nlattr *rt = attrs[type];
@@ -1279,11 +1285,21 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 	xfrm_address_t *daddr;
 	int family;
 	int err;
-	u32 mark;
+	u32 mark, spi_min, spi_max;
 	struct xfrm_mark m;
 
 	p = nlmsg_data(nlh);
-	err = verify_spi_info(p->info.id.proto, p->min, p->max);
+	if (in_compat_syscall()) {
+		struct xfrm_userspi_info_packed *_p = nlmsg_data(nlh);
+
+		spi_min = _p->min;
+		spi_max = _p->max;
+	} else {
+		spi_min = p->min;
+		spi_max = p->max;
+	}
+
+	err = verify_spi_info(p->info.id.proto, spi_min, spi_max);
 	if (err)
 		goto out_noput;
 
@@ -1310,7 +1326,7 @@ static int xfrm_alloc_userspi(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (x == NULL)
 		goto out_noput;
 
-	err = xfrm_alloc_spi(x, p->min, p->max);
+	err = xfrm_alloc_spi(x, spi_min, spi_max);
 	if (err)
 		goto out;
 
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 06/18] netlink: Do not subscribe to non-existent groups
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (8 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  2018-07-26  4:22   ` David Miller
  -1 siblings, 1 reply; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Make the ABI stricter about subscribing to a group > ngroups.
The code doesn't check for that, which looks bogus: one can subscribe
to a non-existent group.
It is still possible to bind() to all existing groups with (-1).
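
The masking itself can be sketched in plain C. demo_mask_groups() is a
hypothetical helper, not kernel code. It also guards two edge cases
that a bare (1UL << ngroups) - 1 does not cover: a shift by the full
width of unsigned long is undefined behaviour in C, and ngroups == 0
should leave no groups at all. Whether ngroups can actually take such
values in the kernel is not established here.

```c
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Drop every bit that does not correspond to an existing group. */
static inline unsigned long demo_mask_groups(unsigned long groups,
					     unsigned int ngroups)
{
	if (ngroups == 0)
		return 0;
	if (ngroups >= DEMO_BITS_PER_LONG)
		return groups;	/* every bit is a valid group */
	return groups & ((1UL << ngroups) - 1);
}
```

With four groups, a bind() mask of -1 collapses to 0xf, while a mask
naming only groups above the fourth becomes empty.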

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/netlink/af_netlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 393573a99a5a..ac805caed2e2 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1008,6 +1008,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 		if (err)
 			return err;
 	}
+	groups &= (1UL << nlk->ngroups) - 1;
 
 	bound = nlk->bound;
 	if (bound) {
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 07/18] netlink: Pass groups pointer to .bind()
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (9 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev, Eric Paris, Florian Westphal,
	Jozsef Kadlecsik, Pablo Neira Ayuso, Paul Moore, coreteam,
	linux-audit, netfilter-devel

Netlink messages sent by xfrm differ in size between 64-bit native and
32-bit compat applications. To know which UABI to use when sending a
message from the kernel, use the type of the bind() syscall.
Xfrm will have kernel-only groups, hidden from userspace, for compat
applications.
So, pass a pointer to the groups mask to the .bind() callback.
With later patches xfrm will set the proper compat group for the
netlink socket during bind().
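
A rough userspace sketch of a callback that takes the whole mask by
pointer follows. Everything in it is hypothetical: the group numbers,
demo_group_bit() and demo_bind() are made up for illustration, and the
"hidden" group only hints at what a later patch might do. The sketch
assumes the legacy netlink convention that group N occupies bit N - 1
of the bind() mask.

```c
#include <stdbool.h>

/* Hypothetical group numbers, not taken from any kernel header. */
enum {
	DEMO_GRP_A		= 3,
	DEMO_GRP_RESTRICTED	= 5,
	DEMO_GRP_HIDDEN		= 9,
};

/* Assumed convention: group N is bit N - 1 of the mask. */
static inline unsigned long demo_group_bit(unsigned int group)
{
	return 1UL << (group - 1);
}

/*
 * One call sees the complete mask: it can check permissions for
 * several groups at once and, because it gets a pointer, it can also
 * rewrite the mask before the socket is subscribed.
 */
static inline int demo_bind(unsigned long *groups, bool privileged,
			    bool compat)
{
	if ((*groups & demo_group_bit(DEMO_GRP_RESTRICTED)) && !privileged)
		return -1;	/* stand-in for -EPERM */

	if (compat && (*groups & demo_group_bit(DEMO_GRP_A)))
		*groups |= demo_group_bit(DEMO_GRP_HIDDEN);

	return 0;
}
```

The per-group callback it replaces would have been invoked once per set
bit and could only accept or reject that single group.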

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: coreteam@netfilter.org
Cc: linux-audit@redhat.com
Cc: netdev@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/linux/netlink.h   |  2 +-
 kernel/audit.c            |  2 +-
 net/core/rtnetlink.c      | 14 ++++++--------
 net/core/sock_diag.c      | 25 ++++++++++++-------------
 net/netfilter/nfnetlink.c | 24 ++++++++++++++----------
 net/netlink/af_netlink.c  | 27 ++++++++++-----------------
 net/netlink/af_netlink.h  |  4 ++--
 net/netlink/genetlink.c   | 26 ++++++++++++++++++--------
 8 files changed, 64 insertions(+), 60 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index f3075d6c7e82..19202648e04a 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -46,7 +46,7 @@ struct netlink_kernel_cfg {
 	unsigned int	flags;
 	void		(*input)(struct sk_buff *skb);
 	struct mutex	*cb_mutex;
-	int		(*bind)(struct net *net, int group);
+	int		(*bind)(struct net *net, unsigned long *groups);
 	void		(*unbind)(struct net *net, int group);
 	bool		(*compare)(struct net *net, struct sock *sk);
 };
diff --git a/kernel/audit.c b/kernel/audit.c
index e7478cb58079..87ca0214bcf2 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -1523,7 +1523,7 @@ static void audit_receive(struct sk_buff  *skb)
 }
 
 /* Run custom bind function on netlink socket group connect or bind requests. */
-static int audit_bind(struct net *net, int group)
+static int audit_bind(struct net *net, unsigned long *groups)
 {
 	if (!capable(CAP_AUDIT_READ))
 		return -EPERM;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index e3f743c141b3..0465e692ae32 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4683,15 +4683,13 @@ static void rtnetlink_rcv(struct sk_buff *skb)
 	netlink_rcv_skb(skb, &rtnetlink_rcv_msg);
 }
 
-static int rtnetlink_bind(struct net *net, int group)
+static int rtnetlink_bind(struct net *net, unsigned long *groups)
 {
-	switch (group) {
-	case RTNLGRP_IPV4_MROUTE_R:
-	case RTNLGRP_IPV6_MROUTE_R:
-		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
-			return -EPERM;
-		break;
-	}
+	unsigned long mroute_r;
+
+	mroute_r = 1UL << RTNLGRP_IPV4_MROUTE_R | 1UL << RTNLGRP_IPV6_MROUTE_R;
+	if ((*groups & mroute_r) && !ns_capable(net->user_ns, CAP_NET_ADMIN))
+		return -EPERM;
 	return 0;
 }
 
diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
index c37b5be7c5e4..befa6759f2ad 100644
--- a/net/core/sock_diag.c
+++ b/net/core/sock_diag.c
@@ -273,20 +273,19 @@ static void sock_diag_rcv(struct sk_buff *skb)
 	mutex_unlock(&sock_diag_mutex);
 }
 
-static int sock_diag_bind(struct net *net, int group)
+static int sock_diag_bind(struct net *net, unsigned long *groups)
 {
-	switch (group) {
-	case SKNLGRP_INET_TCP_DESTROY:
-	case SKNLGRP_INET_UDP_DESTROY:
-		if (!sock_diag_handlers[AF_INET])
-			sock_load_diag_module(AF_INET, 0);
-		break;
-	case SKNLGRP_INET6_TCP_DESTROY:
-	case SKNLGRP_INET6_UDP_DESTROY:
-		if (!sock_diag_handlers[AF_INET6])
-			sock_load_diag_module(AF_INET6, 0);
-		break;
-	}
+	unsigned long inet_mask, inet6_mask;
+
+	inet_mask   = 1UL << SKNLGRP_INET_TCP_DESTROY;
+	inet_mask  |= 1UL << SKNLGRP_INET_UDP_DESTROY;
+	inet6_mask  = 1UL << SKNLGRP_INET6_TCP_DESTROY;
+	inet6_mask |= 1UL << SKNLGRP_INET6_UDP_DESTROY;
+
+	if ((*groups & inet_mask) && !sock_diag_handlers[AF_INET])
+		sock_load_diag_module(AF_INET, 0);
+	if ((*groups & inet6_mask) && !sock_diag_handlers[AF_INET6])
+		sock_load_diag_module(AF_INET6, 0);
 	return 0;
 }
 
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index e1b6be29848d..6a8893df5285 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -556,21 +556,25 @@ static void nfnetlink_rcv(struct sk_buff *skb)
 }
 
 #ifdef CONFIG_MODULES
-static int nfnetlink_bind(struct net *net, int group)
+static int nfnetlink_bind(struct net *net, unsigned long *groups)
 {
 	const struct nfnetlink_subsystem *ss;
-	int type;
+	unsigned long _groups = *groups;
+	int type, group_bit, group = -1;
 
-	if (group <= NFNLGRP_NONE || group > NFNLGRP_MAX)
-		return 0;
+	while ((group_bit = __builtin_ffsl(_groups))) {
+		group += group_bit;
 
-	type = nfnl_group2type[group];
+		type = nfnl_group2type[group];
+		rcu_read_lock();
+		ss = nfnetlink_get_subsys(type << 8);
+		rcu_read_unlock();
+		if (!ss)
+			request_module("nfnetlink-subsys-%d", type);
+
+		_groups >>= group_bit;
+	}
 
-	rcu_read_lock();
-	ss = nfnetlink_get_subsys(type << 8);
-	rcu_read_unlock();
-	if (!ss)
-		request_module("nfnetlink-subsys-%d", type);
 	return 0;
 }
 #endif
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ac805caed2e2..1e11e706c683 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -668,7 +668,7 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
 	struct module *module = NULL;
 	struct mutex *cb_mutex;
 	struct netlink_sock *nlk;
-	int (*bind)(struct net *net, int group);
+	int (*bind)(struct net *net, unsigned long *groups);
 	void (*unbind)(struct net *net, int group);
 	int err = 0;
 
@@ -969,8 +969,7 @@ static int netlink_realloc_groups(struct sock *sk)
 	return err;
 }
 
-static void netlink_undo_bind(int group, long unsigned int groups,
-			      struct sock *sk)
+static void netlink_undo_bind(unsigned long groups, struct sock *sk)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
 	int undo;
@@ -978,7 +977,7 @@ static void netlink_undo_bind(int group, long unsigned int groups,
 	if (!nlk->netlink_unbind)
 		return;
 
-	for (undo = 0; undo < group; undo++)
+	for (undo = 0; undo < nlk->ngroups; undo++)
 		if (test_bit(undo, &groups))
 			nlk->netlink_unbind(sock_net(sk), undo + 1);
 }
@@ -991,7 +990,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 	struct netlink_sock *nlk = nlk_sk(sk);
 	struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;
 	int err = 0;
-	long unsigned int groups = nladdr->nl_groups;
+	unsigned long groups = nladdr->nl_groups;
 	bool bound;
 
 	if (addr_len < sizeof(struct sockaddr_nl))
@@ -1021,17 +1020,9 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 
 	netlink_lock_table();
 	if (nlk->netlink_bind && groups) {
-		int group;
-
-		for (group = 0; group < nlk->ngroups; group++) {
-			if (!test_bit(group, &groups))
-				continue;
-			err = nlk->netlink_bind(net, group + 1);
-			if (!err)
-				continue;
-			netlink_undo_bind(group, groups, sk);
+		err = nlk->netlink_bind(net, &groups);
+		if (err)
 			goto unlock;
-		}
 	}
 
 	/* No need for barriers here as we return to user-space without
@@ -1042,7 +1033,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
 			netlink_insert(sk, nladdr->nl_pid) :
 			netlink_autobind(sock);
 		if (err) {
-			netlink_undo_bind(nlk->ngroups, groups, sk);
+			netlink_undo_bind(groups, sk);
 			goto unlock;
 		}
 	}
@@ -1652,7 +1643,9 @@ static int netlink_setsockopt(struct socket *sock, int level, int optname,
 		if (!val || val - 1 >= nlk->ngroups)
 			return -EINVAL;
 		if (optname == NETLINK_ADD_MEMBERSHIP && nlk->netlink_bind) {
-			err = nlk->netlink_bind(sock_net(sk), val);
+			unsigned long groups = 1UL << val;
+
+			err = nlk->netlink_bind(sock_net(sk), &groups);
 			if (err)
 				return err;
 		}
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 962de7b3c023..e765172abbb7 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -39,7 +39,7 @@ struct netlink_sock {
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
 	void			(*netlink_rcv)(struct sk_buff *skb);
-	int			(*netlink_bind)(struct net *net, int group);
+	int			(*netlink_bind)(struct net *net, unsigned long *groups);
 	void			(*netlink_unbind)(struct net *net, int group);
 	struct module		*module;
 
@@ -61,7 +61,7 @@ struct netlink_table {
 	unsigned int		groups;
 	struct mutex		*cb_mutex;
 	struct module		*module;
-	int			(*bind)(struct net *net, int group);
+	int			(*bind)(struct net *net, unsigned long *groups);
 	void			(*unbind)(struct net *net, int group);
 	bool			(*compare)(struct net *net, struct sock *sock);
 	int			registered;
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 25eeb6d2a75a..a86b105730cf 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -960,28 +960,38 @@ static struct genl_family genl_ctrl __ro_after_init = {
 	.netnsok = true,
 };
 
-static int genl_bind(struct net *net, int group)
+static int genl_bind(struct net *net, unsigned long *groups)
 {
+	unsigned long mcgrps;
 	struct genl_family *f;
-	int err = -ENOENT;
+	int err = 0;
 	unsigned int id;
 
 	down_read(&cb_lock);
 
 	idr_for_each_entry(&genl_fam_idr, f, id) {
-		if (group >= f->mcgrp_offset &&
-		    group < f->mcgrp_offset + f->n_mcgrps) {
-			int fam_grp = group - f->mcgrp_offset;
+		int fam_grp_bit, fam_grp = -1;
+
+		mcgrps = (1UL << f->n_mcgrps) - 1;
+		mcgrps <<= f->mcgrp_offset;
+		mcgrps &= *groups;
+
+		if (!mcgrps)
+			continue;
+
+		while ((fam_grp_bit = __builtin_ffsl(mcgrps))) {
+			fam_grp += fam_grp_bit;
 
 			if (!f->netnsok && net != &init_net)
 				err = -ENOENT;
 			else if (f->mcast_bind)
 				err = f->mcast_bind(net, fam_grp);
-			else
-				err = 0;
-			break;
+
+			if (err)
+				goto out;
 		}
 	}
+out:
 	up_read(&cb_lock);
 
 	return err;
-- 
2.13.6



* [PATCH 08/18] xfrm: Add in-kernel groups for compat notifications
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (10 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Introduce kernel-only groups that are hidden from userspace.
An application that the kernel bind()s to such a group receives netlink
messages in the compat ABI on 64-bit kernels.
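
For illustration only (not part of the patch): a minimal userspace sketch
of the remapping described above. The DEMO_GRP_* numbers and the demo_*()
helpers are hypothetical stand-ins, not the real XFRMNLGRP_* values; the
caller's ABI is passed in as a flag instead of being read from
in_compat_syscall().

```c
#include <stdbool.h>

/* Hypothetical 1-based group numbers visible to userspace. */
enum demo_group {
	DEMO_GRP_ACQUIRE = 1,
	DEMO_GRP_EXPIRE,
	DEMO_GRP_SA,
	DEMO_GRP_POLICY,
	DEMO_GRP_AEVENTS,
	__DEMO_GRP_MAX
};
#define DEMO_GRP_MAX	(__DEMO_GRP_MAX - 1)

/* Hypothetical kernel-only groups, numbered right after the UAPI ones. */
enum demo_group_kernel {
	DEMO_GRP_COMPAT_MIN = DEMO_GRP_MAX,
	DEMO_GRP_COMPAT_ACQUIRE,
	DEMO_GRP_COMPAT_EXPIRE,
	DEMO_GRP_COMPAT_SA,
	DEMO_GRP_COMPAT_POLICY
};

/* Group N is represented by bit N - 1 of the mask. */
unsigned long demo_group_bit(int group)
{
	return 1UL << (group - 1);
}

/* If @group is set in the mask, replace it with @group_compat. */
void demo_move_group(unsigned long *groups, int group, int group_compat)
{
	if (*groups & demo_group_bit(group)) {
		*groups &= ~demo_group_bit(group);
		*groups |= demo_group_bit(group_compat);
	}
}

/* Returns the mask the socket ends up subscribed to. */
unsigned long demo_bind(unsigned long requested, bool compat_caller)
{
	/* Userspace must never reach the hidden groups directly. */
	unsigned long groups = requested & ((1UL << DEMO_GRP_MAX) - 1);

	if (!compat_caller)
		return groups;

	demo_move_group(&groups, DEMO_GRP_ACQUIRE, DEMO_GRP_COMPAT_ACQUIRE);
	demo_move_group(&groups, DEMO_GRP_EXPIRE, DEMO_GRP_COMPAT_EXPIRE);
	demo_move_group(&groups, DEMO_GRP_SA, DEMO_GRP_COMPAT_SA);
	demo_move_group(&groups, DEMO_GRP_POLICY, DEMO_GRP_COMPAT_POLICY);
	return groups;
}
```

In this sketch a group whose message layout is the same for both ABIs
(DEMO_GRP_AEVENTS here) keeps its bit, so native and compat listeners
share it.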

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index bf2ca93edaf5..b123e788488f 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -67,6 +67,29 @@ struct xfrm_userspi_info_packed {
 	__u32				max;
 } __packed;
 
+/* In-kernel, non-uapi compat groups.
+ * As compat/native messages differ, send notifications according
+ * to .bind() caller's ABI. There are *_COMPAT hidden from userspace
+ * groups for such task.
+ */
+enum xfrm_nlgroups_kernel {
+	XFRMNLGRP_COMPAT_MIN = XFRMNLGRP_MAX,
+	XFRMNLGRP_COMPAT_ACQUIRE,
+	XFRMNLGRP_COMPAT_EXPIRE,
+	XFRMNLGRP_COMPAT_SA,
+	XFRMNLGRP_COMPAT_POLICY,
+	/* Group messages for the following notifications do not differ
+	 * in size between native and compat structures:
+	 * XFRMNLGRP_AEVENTS,
+	 * XFRMNLGRP_REPORT,
+	 * XFRMNLGRP_MIGRATE,
+	 * XFRMNLGRP_MAPPING,
+	 */
+	__XFRMNLGRP_COMPAT_MAX
+};
+
+#define XFRMNLGRP_KERNEL_MAX	(__XFRMNLGRP_COMPAT_MAX - 1)
+
 static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
 	struct nlattr *rt = attrs[type];
@@ -2645,6 +2668,34 @@ static void xfrm_netlink_rcv(struct sk_buff *skb)
 	mutex_unlock(&net->xfrm.xfrm_cfg_mutex);
 }
 
+static inline void xfrm_nlgrp_compat(unsigned long *groups,
+		int group, int group_compat)
+{
+	unsigned long group_bit = 1UL << (group - 1);
+
+	if (*groups & group_bit) {
+		*groups &= ~group_bit;
+		*groups |= 1UL << (group_compat - 1);
+	}
+}
+
+static int xfrm_netlink_bind(struct net *net, unsigned long *groups)
+{
+	unsigned long uapi_mask = (1UL << XFRMNLGRP_MAX) - 1;
+
+	*groups &= uapi_mask;
+
+	if (!in_compat_syscall())
+		return 0;
+
+	xfrm_nlgrp_compat(groups, XFRMNLGRP_ACQUIRE, XFRMNLGRP_COMPAT_ACQUIRE);
+	xfrm_nlgrp_compat(groups, XFRMNLGRP_EXPIRE, XFRMNLGRP_COMPAT_EXPIRE);
+	xfrm_nlgrp_compat(groups, XFRMNLGRP_SA, XFRMNLGRP_COMPAT_SA);
+	xfrm_nlgrp_compat(groups, XFRMNLGRP_POLICY, XFRMNLGRP_COMPAT_POLICY);
+
+	return 0;
+}
+
 static inline unsigned int xfrm_expire_msgsize(void)
 {
 	return NLMSG_ALIGN(sizeof(struct xfrm_user_expire))
@@ -3283,8 +3334,9 @@ static int __net_init xfrm_user_net_init(struct net *net)
 {
 	struct sock *nlsk;
 	struct netlink_kernel_cfg cfg = {
-		.groups	= XFRMNLGRP_MAX,
+		.groups	= XFRMNLGRP_KERNEL_MAX,
 		.input	= xfrm_netlink_rcv,
+		.bind	= xfrm_netlink_bind,
 	};
 
 	nlsk = netlink_kernel_create(net, NETLINK_XFRM, &cfg);
-- 
2.13.6



* [PATCH 09/18] xfrm: Dump usersa_info in compat/native formats
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (11 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Fill xfrm_usersa_info in netlink messages using the 32-bit or 64-bit
UABI layout, depending on the type of syscall used to dump the xfrm state.
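
For illustration only (not part of the patch): a minimal sketch of why
the header length has to be chosen per caller. struct demo_info and
struct demo_info_packed are hypothetical stand-ins, not the real
xfrm_usersa_info; the packed variant spells out the tail padding a
32-bit x86 compiler would produce.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UAPI-like structure; the compiler pads the tail so the
 * size is a multiple of the alignment of uint64_t on the native ABI.
 */
struct demo_info {
	uint64_t bytes;
	uint32_t seq;
	uint32_t reqid;
	uint8_t  flags;
};

/* Same fields with the 32-bit layout made explicit: 17 bytes of data
 * plus 3 bytes of padding, no hidden tail padding.
 */
struct demo_info_packed {
	uint64_t bytes;
	uint32_t seq;
	uint32_t reqid;
	uint8_t  flags;
	uint8_t  __pad[3];
} __attribute__((packed));

/* Length of the fixed header to reserve in the netlink message. */
size_t demo_msg_len(bool compat_dump)
{
	if (compat_dump)
		return sizeof(struct demo_info_packed);
	return sizeof(struct demo_info);
}
```

On a typical 64-bit build the native structure is expected to be larger
than the packed one, which is why the same payload cannot be handed to
both kinds of callers.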

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 55 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index b123e788488f..63622264a3a9 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -799,9 +799,9 @@ static int xfrm_del_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return err;
 }
 
-static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
+static void __copy_to_user_state(struct xfrm_state *x,
+		struct xfrm_usersa_info_packed *p)
 {
-	memset(p, 0, sizeof(*p));
 	memcpy(&p->id, &x->id, sizeof(p->id));
 	memcpy(&p->sel, &x->sel, sizeof(p->sel));
 	memcpy(&p->lft, &x->lft, sizeof(p->lft));
@@ -818,11 +818,25 @@ static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
 	p->seq = x->km.seq;
 }
 
+static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
+{
+	memset(p, 0, sizeof(*p));
+	__copy_to_user_state(x, (struct xfrm_usersa_info_packed *)p);
+}
+
+static void copy_to_user_state_compat(struct xfrm_state *x,
+		struct xfrm_usersa_info_packed *p)
+{
+	memset(p, 0, sizeof(*p));
+	__copy_to_user_state(x, p);
+}
+
 struct xfrm_dump_info {
 	struct sk_buff *in_skb;
 	struct sk_buff *out_skb;
 	u32 nlmsg_seq;
 	u16 nlmsg_flags;
+	bool compat_dump;
 };
 
 static int copy_sec_ctx(struct xfrm_sec_ctx *s, struct sk_buff *skb)
@@ -882,14 +896,10 @@ static int copy_to_user_auth(struct xfrm_algo_auth *auth, struct sk_buff *skb)
 }
 
 /* Don't change this without updating xfrm_sa_len! */
-static int copy_to_user_state_extra(struct xfrm_state *x,
-				    struct xfrm_usersa_info *p,
-				    struct sk_buff *skb)
+static int __copy_to_user_state_extra(struct xfrm_state *x, struct sk_buff *skb)
 {
 	int ret = 0;
 
-	copy_to_user_state(x, p);
-
 	if (x->props.extra_flags) {
 		ret = nla_put_u32(skb, XFRMA_SA_EXTRA_FLAGS,
 				  x->props.extra_flags);
@@ -968,23 +978,42 @@ static int copy_to_user_state_extra(struct xfrm_state *x,
 	return ret;
 }
 
+static int copy_to_user_state_extra(struct xfrm_state *x,
+		struct xfrm_usersa_info *p, struct sk_buff *skb)
+{
+	copy_to_user_state(x, p);
+	return __copy_to_user_state_extra(x, skb);
+}
+
+static int copy_to_user_state_extra_compat(struct xfrm_state *x,
+		struct xfrm_usersa_info_packed *p, struct sk_buff *skb)
+{
+	copy_to_user_state_compat(x, p);
+	return __copy_to_user_state_extra(x, skb);
+}
+
 static int dump_one_state(struct xfrm_state *x, int count, void *ptr)
 {
 	struct xfrm_dump_info *sp = ptr;
 	struct sk_buff *in_skb = sp->in_skb;
 	struct sk_buff *skb = sp->out_skb;
-	struct xfrm_usersa_info *p;
 	struct nlmsghdr *nlh;
+	size_t msg_len;
 	int err;
 
+	if (sp->compat_dump)
+		msg_len = sizeof(struct xfrm_usersa_info_packed);
+	else
+		msg_len = sizeof(struct xfrm_usersa_info);
 	nlh = nlmsg_put(skb, NETLINK_CB(in_skb).portid, sp->nlmsg_seq,
-			XFRM_MSG_NEWSA, sizeof(*p), sp->nlmsg_flags);
+			XFRM_MSG_NEWSA, msg_len, sp->nlmsg_flags);
 	if (nlh == NULL)
 		return -EMSGSIZE;
 
-	p = nlmsg_data(nlh);
-
-	err = copy_to_user_state_extra(x, p, skb);
+	if (sp->compat_dump)
+		err = copy_to_user_state_extra_compat(x, nlmsg_data(nlh), skb);
+	else
+		err = copy_to_user_state_extra(x, nlmsg_data(nlh), skb);
 	if (err) {
 		nlmsg_cancel(skb, nlh);
 		return err;
@@ -1018,6 +1047,7 @@ static int xfrm_dump_sa(struct sk_buff *skb, struct netlink_callback *cb)
 	info.out_skb = skb;
 	info.nlmsg_seq = cb->nlh->nlmsg_seq;
 	info.nlmsg_flags = NLM_F_MULTI;
+	info.compat_dump = in_compat_syscall();
 
 	if (!cb->args[0]) {
 		struct nlattr *attrs[XFRMA_MAX+1];
@@ -1064,6 +1094,7 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
 	info.out_skb = skb;
 	info.nlmsg_seq = seq;
 	info.nlmsg_flags = 0;
+	info.compat_dump = in_compat_syscall();
 
 	err = dump_one_state(x, 0, &info);
 	if (err) {
-- 
2.13.6



* [PATCH 10/18] xfrm: Send state notifications in compat format too
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (12 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Applications that called bind() through the native syscall are in
XFRMNLGRP_SA, so send xfrm_usersa_info messages (64-bit ABI) there.
Compat applications are added to the kernel-hidden XFRMNLGRP_COMPAT_SA
group, so send xfrm_usersa_info_packed messages (32-bit ABI) there.
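
For illustration only (not part of the patch): a minimal sketch of the
"notify the native group, then the compat group" control flow. The
demo_* names are hypothetical; the sender callback stands in for
building and multicasting one message and is assumed to return -ESRCH
when the group has no listeners.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bookkeeping so the flow can be observed. */
struct demo_stats {
	int native_calls;
	int compat_calls;
	int native_ret;
	int compat_ret;
};

/* Hypothetical sender: records the call, returns the preset result. */
int demo_send(bool compat, void *ctx)
{
	struct demo_stats *s = ctx;

	if (compat) {
		s->compat_calls++;
		return s->compat_ret;
	}
	s->native_calls++;
	return s->native_ret;
}

/* Send the native message first; "no listeners" is not a reason to
 * skip the compat message, any other error is.
 */
int demo_notify_both(int (*send)(bool compat, void *ctx), void *ctx,
		     bool compat_enabled)
{
	int ret = send(false, ctx);

	if ((ret && ret != -ESRCH) || !compat_enabled)
		return ret;
	return send(true, ctx);
}
```

Note that in this flow the caller sees the result of the second send
whenever the first one did not fail with a hard error.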

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 38 +++++++++++++++++++++++++++++---------
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 63622264a3a9..230462077dc9 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2856,18 +2856,24 @@ static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
 	return l;
 }
 
-static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
+static int __xfrm_notify_sa(struct xfrm_state *x,
+		const struct km_event *c, bool compat)
 {
 	struct net *net = xs_net(x);
-	struct xfrm_usersa_info *p;
 	struct xfrm_usersa_id *id;
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
 	unsigned int len = xfrm_sa_len(x);
-	unsigned int headlen;
+	unsigned int headlen, usersa_info_size;
+	void *usersa_info;
 	int err;
 
-	headlen = sizeof(*p);
+	if (compat)
+		usersa_info_size = sizeof(struct xfrm_usersa_info_packed);
+	else
+		usersa_info_size = sizeof(struct xfrm_usersa_info);
+	headlen = usersa_info_size;
+
 	if (c->event == XFRM_MSG_DELSA) {
 		len += nla_total_size(headlen);
 		headlen = sizeof(*id);
@@ -2884,7 +2890,7 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
 	if (nlh == NULL)
 		goto out_free_skb;
 
-	p = nlmsg_data(nlh);
+	usersa_info = nlmsg_data(nlh);
 	if (c->event == XFRM_MSG_DELSA) {
 		struct nlattr *attr;
 
@@ -2895,26 +2901,40 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
 		id->family = x->props.family;
 		id->proto = x->id.proto;
 
-		attr = nla_reserve(skb, XFRMA_SA, sizeof(*p));
+		attr = nla_reserve(skb, XFRMA_SA, usersa_info_size);
 		err = -EMSGSIZE;
 		if (attr == NULL)
 			goto out_free_skb;
 
-		p = nla_data(attr);
+		usersa_info = nla_data(attr);
 	}
-	err = copy_to_user_state_extra(x, p, skb);
+
	if (compat)
		err = copy_to_user_state_extra_compat(x, usersa_info, skb);
	else
		err = copy_to_user_state_extra(x, usersa_info, skb);
 	if (err)
 		goto out_free_skb;
 
 	nlmsg_end(skb, nlh);
 
-	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_SA);
+	return xfrm_nlmsg_multicast(net, skb, 0,
+			compat ? XFRMNLGRP_COMPAT_SA : XFRMNLGRP_SA);
 
 out_free_skb:
 	kfree_skb(skb);
 	return err;
 }
 
+static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
+{
+	int ret = __xfrm_notify_sa(x, c, false);
+
+	if ((ret && ret != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+		return ret;
+	return __xfrm_notify_sa(x, c, true);
+}
+
 static int xfrm_send_state_notify(struct xfrm_state *x, const struct km_event *c)
 {
 
-- 
2.13.6



* [PATCH 11/18] xfrm: Add compat support for xfrm_user_expire messages
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (13 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Parse expire messages sent by userspace according to in_compat_syscall().
Applications that called bind() through the native syscall are in
XFRMNLGRP_EXPIRE, so send xfrm_user_expire messages (64-bit ABI) there.
Compat applications are added to the kernel-hidden
XFRMNLGRP_COMPAT_EXPIRE group, so send xfrm_user_expire_packed messages
(32-bit ABI) there.
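
For illustration only (not part of the patch): a minimal sketch of why
the parser has to pick the layout before reading a field that follows
the embedded state. All demo_* structures are hypothetical stand-ins
for the xfrm ones.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical embedded state, native and 32-bit layouts. */
struct demo_state {
	uint64_t bytes;
	uint32_t seq;
	uint32_t reqid;
	uint8_t  flags;
};

struct demo_state_packed {
	uint64_t bytes;
	uint32_t seq;
	uint32_t reqid;
	uint8_t  flags;
	uint8_t  __pad[3];
} __attribute__((packed));

/* Hypothetical expire messages: 'hard' follows the state, so its
 * offset depends on how large the state is in the sender's ABI.
 */
struct demo_expire {
	struct demo_state state;
	uint8_t hard;
};

struct demo_expire_packed {
	struct demo_state_packed state;
	uint8_t hard;
	uint8_t __pad[3];
} __attribute__((packed));

/* Read 'hard' from a raw payload using the sender's layout. */
uint8_t demo_read_hard(const void *payload, bool compat)
{
	size_t off = compat ? offsetof(struct demo_expire_packed, hard)
			    : offsetof(struct demo_expire, hard);

	return ((const uint8_t *)payload)[off];
}
```

Fields at the start of the message sit at the same offset in both
layouts in this sketch; only what comes after the padded tail of the
embedded structure moves.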

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 95 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 65 insertions(+), 30 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 230462077dc9..ca1a14f45cf7 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -67,6 +67,12 @@ struct xfrm_userspi_info_packed {
 	__u32				max;
 } __packed;
 
+struct xfrm_user_expire_packed {
+	struct xfrm_usersa_info_packed	state;
+	__u8				hard;
+	__u8				__pad[3];
+} __packed;
+
 /* In-kernel, non-uapi compat groups.
  * As compat/native messages differ, send notifications according
  * to .bind() caller's ABI. There are *_COMPAT hidden from userspace
@@ -2240,10 +2246,19 @@ static int xfrm_add_sa_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct net *net = sock_net(skb->sk);
 	struct xfrm_state *x;
 	int err;
-	struct xfrm_user_expire *ue = nlmsg_data(nlh);
-	struct xfrm_usersa_info_packed *p = (struct xfrm_usersa_info_packed *)&ue->state;
+	struct xfrm_user_expire_packed *ue = nlmsg_data(nlh);
+	struct xfrm_usersa_info_packed *p = &ue->state;
 	struct xfrm_mark m;
 	u32 mark = xfrm_mark_get(attrs, &m);
+	u8 hard;
+
+	if (in_compat_syscall()) {
+		hard = ue->hard;
+	} else {
+		struct xfrm_user_expire *expire = nlmsg_data(nlh);
+
+		hard = expire->hard;
+	}
 
 	x = xfrm_state_lookup(net, mark, &p->id.daddr, p->id.spi, p->id.proto, p->family);
 
@@ -2255,9 +2270,9 @@ static int xfrm_add_sa_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	err = -EINVAL;
 	if (x->km.state != XFRM_STATE_VALID)
 		goto out;
-	km_state_expired(x, ue->hard, nlh->nlmsg_pid);
+	km_state_expired(x, hard, nlh->nlmsg_pid);
 
-	if (ue->hard) {
+	if (hard) {
 		__xfrm_state_delete(x);
 		xfrm_audit_state_delete(x, 1, true);
 	}
@@ -2727,33 +2742,49 @@ static int xfrm_netlink_bind(struct net *net, unsigned long *groups)
 	return 0;
 }
 
-static inline unsigned int xfrm_expire_msgsize(void)
-{
-	return NLMSG_ALIGN(sizeof(struct xfrm_user_expire))
-	       + nla_total_size(sizeof(struct xfrm_mark));
-}
-
-static int build_expire(struct sk_buff *skb, struct xfrm_state *x, const struct km_event *c)
+static int build_expire(struct sk_buff **skb, struct xfrm_state *x,
+		const struct km_event *c, bool compat)
 {
-	struct xfrm_user_expire *ue;
 	struct nlmsghdr *nlh;
+	unsigned int ue_sz;
 	int err;
 
-	nlh = nlmsg_put(skb, c->portid, 0, XFRM_MSG_EXPIRE, sizeof(*ue), 0);
-	if (nlh == NULL)
+	if (compat)
+		ue_sz = NLMSG_ALIGN(sizeof(struct xfrm_user_expire_packed));
+	else
+		ue_sz = NLMSG_ALIGN(sizeof(struct xfrm_user_expire));
+
+	*skb = nlmsg_new(ue_sz + nla_total_size(sizeof(struct xfrm_mark)), GFP_ATOMIC);
+	if (*skb == NULL)
+		return -ENOMEM;
+
+	nlh = nlmsg_put(*skb, c->portid, 0, XFRM_MSG_EXPIRE, ue_sz, 0);
+	if (nlh == NULL) {
+		kfree_skb(*skb);
 		return -EMSGSIZE;
+	}
 
-	ue = nlmsg_data(nlh);
-	copy_to_user_state(x, &ue->state);
-	ue->hard = (c->data.hard != 0) ? 1 : 0;
-	/* clear the padding bytes */
-	memset(&ue->hard + 1, 0, sizeof(*ue) - offsetofend(typeof(*ue), hard));
+	if (compat) {
+		struct xfrm_user_expire_packed *ue = nlmsg_data(nlh);
 
-	err = xfrm_mark_put(skb, &x->mark);
-	if (err)
+		copy_to_user_state_compat(x, &ue->state);
+		ue->hard = (c->data.hard != 0) ? 1 : 0;
+	} else {
+		struct xfrm_user_expire *ue = nlmsg_data(nlh);
+
+		copy_to_user_state(x, &ue->state);
+		ue->hard = (c->data.hard != 0) ? 1 : 0;
+		/* clear the padding bytes */
+		memset(&ue->hard + 1, 0, sizeof(*ue) - offsetofend(typeof(*ue), hard));
+	}
+
+	err = xfrm_mark_put(*skb, &x->mark);
+	if (err) {
+		kfree_skb(*skb);
 		return err;
+	}
 
-	nlmsg_end(skb, nlh);
+	nlmsg_end(*skb, nlh);
 	return 0;
 }
 
@@ -2761,17 +2792,21 @@ static int xfrm_exp_state_notify(struct xfrm_state *x, const struct km_event *c)
 {
 	struct net *net = xs_net(x);
 	struct sk_buff *skb;
+	int err;
 
-	skb = nlmsg_new(xfrm_expire_msgsize(), GFP_ATOMIC);
-	if (skb == NULL)
-		return -ENOMEM;
+	err = build_expire(&skb, x, c, false);
+	if (err)
+		return err;
 
-	if (build_expire(skb, x, c) < 0) {
-		kfree_skb(skb);
-		return -EMSGSIZE;
-	}
+	err = xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
+	if ((err && err != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+		return err;
 
-	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
+	err = build_expire(&skb, x, c, true);
+	if (err)
+		return err;
+
+	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_COMPAT_EXPIRE);
 }
 
 static int xfrm_aevent_state_notify(struct xfrm_state *x, const struct km_event *c)
-- 
2.13.6



* [PATCH 12/18] xfrm: Add compat support for xfrm_userpolicy_info messages
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (14 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Parse userpolicy messages sent by userspace according to
in_compat_syscall().
Applications that called bind() through the native syscall are in
XFRMNLGRP_POLICY, so send xfrm_userpolicy_info messages (64-bit ABI)
there. Compat applications are added to the kernel-hidden
XFRMNLGRP_COMPAT_POLICY group, so send xfrm_userpolicy_info_packed
messages (32-bit ABI) there.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 73 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 58 insertions(+), 15 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index ca1a14f45cf7..df792a3be8f2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1630,9 +1630,9 @@ static void copy_from_user_policy(struct xfrm_policy *xp,
 	/* XXX xp->share = p->share; */
 }
 
-static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_info *p, int dir)
+static void __copy_to_user_policy(struct xfrm_policy *xp,
+		struct xfrm_userpolicy_info_packed *p, int dir)
 {
-	memset(p, 0, sizeof(*p));
 	memcpy(&p->sel, &xp->selector, sizeof(p->sel));
 	memcpy(&p->lft, &xp->lft, sizeof(p->lft));
 	memcpy(&p->curlft, &xp->curlft, sizeof(p->curlft));
@@ -1645,6 +1645,20 @@ static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_i
 	p->share = XFRM_SHARE_ANY; /* XXX xp->share */
 }
 
+static void copy_to_user_policy(struct xfrm_policy *xp,
+		struct xfrm_userpolicy_info *p, int dir)
+{
+	memset(p, 0, sizeof(*p));
+	__copy_to_user_policy(xp, (struct xfrm_userpolicy_info_packed *)p, dir);
+}
+
+static void copy_to_user_policy_compat(struct xfrm_policy *xp,
+		struct xfrm_userpolicy_info_packed *p, int dir)
+{
+	memset(p, 0, sizeof(*p));
+	__copy_to_user_policy(xp, p, dir);
+}
+
 static struct xfrm_policy *xfrm_policy_construct(struct net *net,
 		struct xfrm_userpolicy_info_packed *p,
 		struct nlattr **attrs, int *errp)
@@ -1795,19 +1809,26 @@ static inline int copy_to_user_policy_type(u8 type, struct sk_buff *skb)
 static int dump_one_policy(struct xfrm_policy *xp, int dir, int count, void *ptr)
 {
 	struct xfrm_dump_info *sp = ptr;
-	struct xfrm_userpolicy_info *p;
 	struct sk_buff *in_skb = sp->in_skb;
 	struct sk_buff *skb = sp->out_skb;
 	struct nlmsghdr *nlh;
+	size_t msg_len;
 	int err;
 
+	if (sp->compat_dump)
+		msg_len = sizeof(struct xfrm_userpolicy_info_packed);
+	else
+		msg_len = sizeof(struct xfrm_userpolicy_info);
 	nlh = nlmsg_put(skb, NETLINK_CB(in_skb).portid, sp->nlmsg_seq,
-			XFRM_MSG_NEWPOLICY, sizeof(*p), sp->nlmsg_flags);
+			XFRM_MSG_NEWPOLICY, msg_len, sp->nlmsg_flags);
 	if (nlh == NULL)
 		return -EMSGSIZE;
 
-	p = nlmsg_data(nlh);
-	copy_to_user_policy(xp, p, dir);
+	if (sp->compat_dump)
+		copy_to_user_policy_compat(xp, nlmsg_data(nlh), dir);
+	else
+		copy_to_user_policy(xp, nlmsg_data(nlh), dir);
+
 	err = copy_to_user_tmpl(xp, skb);
 	if (!err)
 		err = copy_to_user_sec_ctx(xp, skb);
@@ -1852,6 +1873,7 @@ static int xfrm_dump_policy(struct sk_buff *skb, struct netlink_callback *cb)
 	info.out_skb = skb;
 	info.nlmsg_seq = cb->nlh->nlmsg_seq;
 	info.nlmsg_flags = NLM_F_MULTI;
+	info.compat_dump = in_compat_syscall();
 
 	(void) xfrm_policy_walk(net, walk, dump_one_policy, &info);
 
@@ -1874,6 +1896,7 @@ static struct sk_buff *xfrm_policy_netlink(struct sk_buff *in_skb,
 	info.out_skb = skb;
 	info.nlmsg_seq = seq;
 	info.nlmsg_flags = 0;
+	info.compat_dump = in_compat_syscall();
 
 	err = dump_one_policy(xp, dir, 0, &info);
 	if (err) {
@@ -3184,18 +3207,24 @@ static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
 }
 
-static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_event *c)
+static int __xfrm_notify_policy(struct xfrm_policy *xp, int dir,
+		const struct km_event *c, bool compat)
 {
 	unsigned int len = nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
+	unsigned int headlen, upi_size;
 	struct net *net = xp_net(xp);
-	struct xfrm_userpolicy_info *p;
 	struct xfrm_userpolicy_id *id;
+	void *userpolicy_info;
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
-	unsigned int headlen;
 	int err;
 
-	headlen = sizeof(*p);
+	if (compat)
+		upi_size = sizeof(struct xfrm_userpolicy_info_packed);
+	else
+		upi_size = sizeof(struct xfrm_userpolicy_info);
+	headlen = upi_size;
+
 	if (c->event == XFRM_MSG_DELPOLICY) {
 		len += nla_total_size(headlen);
 		headlen = sizeof(*id);
@@ -3213,7 +3242,7 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
 	if (nlh == NULL)
 		goto out_free_skb;
 
-	p = nlmsg_data(nlh);
+	userpolicy_info = nlmsg_data(nlh);
 	if (c->event == XFRM_MSG_DELPOLICY) {
 		struct nlattr *attr;
 
@@ -3225,15 +3254,18 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
 		else
 			memcpy(&id->sel, &xp->selector, sizeof(id->sel));
 
-		attr = nla_reserve(skb, XFRMA_POLICY, sizeof(*p));
+		attr = nla_reserve(skb, XFRMA_POLICY, upi_size);
 		err = -EMSGSIZE;
 		if (attr == NULL)
 			goto out_free_skb;
 
-		p = nla_data(attr);
+		userpolicy_info = nla_data(attr);
 	}
 
-	copy_to_user_policy(xp, p, dir);
+	if (compat)
+		copy_to_user_policy_compat(xp, userpolicy_info, dir);
+	else
+		copy_to_user_policy(xp, userpolicy_info, dir);
 	err = copy_to_user_tmpl(xp, skb);
 	if (!err)
 		err = copy_to_user_policy_type(xp->type, skb);
@@ -3244,13 +3276,24 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_e
 
 	nlmsg_end(skb, nlh);
 
-	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_POLICY);
+	return xfrm_nlmsg_multicast(net, skb, 0, compat ?
+			XFRMNLGRP_COMPAT_POLICY : XFRMNLGRP_POLICY);
 
 out_free_skb:
 	kfree_skb(skb);
 	return err;
 }
 
+static int xfrm_notify_policy(struct xfrm_policy *xp, int dir,
+		const struct km_event *c)
+{
+	int ret = __xfrm_notify_policy(xp, dir, c, false);
+
+	if ((ret && ret != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+		return ret;
+	return __xfrm_notify_policy(xp, dir, c, true);
+}
+
 static int xfrm_notify_policy_flush(const struct km_event *c)
 {
 	struct net *net = c->net;
-- 
2.13.6



* [PATCH 13/18] xfrm: Add compat support for xfrm_user_acquire messages
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (15 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Parse acquire messages sent by userspace according to in_compat_syscall().
Applications that called bind() through the native syscall are in
XFRMNLGRP_ACQUIRE, so send xfrm_user_acquire messages (64-bit ABI) there.
Compat applications are added to the kernel-hidden
XFRMNLGRP_COMPAT_ACQUIRE group, so send xfrm_user_acquire_packed messages
(32-bit ABI) there.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 113 +++++++++++++++++++++++++++++++++++----------------
 1 file changed, 77 insertions(+), 36 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index df792a3be8f2..89f891a0a9a4 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -73,6 +73,17 @@ struct xfrm_user_expire_packed {
 	__u8				__pad[3];
 } __packed;
 
+struct xfrm_user_acquire_packed {
+	struct xfrm_id				id;
+	xfrm_address_t				saddr;
+	struct xfrm_selector			sel;
+	struct xfrm_userpolicy_info_packed	policy;
+	__u32					aalgos;
+	__u32					ealgos;
+	__u32					calgos;
+	__u32					seq;
+} __packed;
+
 /* In-kernel, non-uapi compat groups.
  * As compat/native messages differ, send notifications according
  * to .bind() caller's ABI. There are *_COMPAT hidden from userspace
@@ -2316,8 +2327,8 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct nlattr *rt = attrs[XFRMA_TMPL];
 	struct xfrm_mark mark;
 
-	struct xfrm_user_acquire *ua = nlmsg_data(nlh);
-	struct xfrm_userpolicy_info_packed *upi = (void *)&ua->policy;
+	struct xfrm_user_acquire_packed *ua = nlmsg_data(nlh);
+	struct xfrm_user_acquire *_ua = nlmsg_data(nlh);
 	struct xfrm_state *x = xfrm_state_alloc(net);
 	int err = -ENOMEM;
 
@@ -2326,12 +2337,12 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
 
 	xfrm_mark_get(attrs, &mark);
 
-	err = verify_newpolicy_info(upi);
+	err = verify_newpolicy_info(&ua->policy);
 	if (err)
 		goto free_state;
 
 	/*   build an XP */
-	xp = xfrm_policy_construct(net, upi, attrs, &err);
+	xp = xfrm_policy_construct(net, &ua->policy, attrs, &err);
 	if (!xp)
 		goto free_state;
 
@@ -2348,9 +2359,15 @@ static int xfrm_add_acquire(struct sk_buff *skb, struct nlmsghdr *nlh,
 		x->props.mode = t->mode;
 		x->props.reqid = t->reqid;
 		x->props.family = ut->family;
-		t->aalgos = ua->aalgos;
-		t->ealgos = ua->ealgos;
-		t->calgos = ua->calgos;
+		if (in_compat_syscall()) {
+			t->aalgos = ua->aalgos;
+			t->ealgos = ua->ealgos;
+			t->calgos = ua->calgos;
+		} else {
+			t->aalgos = _ua->aalgos;
+			t->ealgos = _ua->ealgos;
+			t->calgos = _ua->calgos;
+		}
 		err = km_query(x, t, xp);
 
 	}
@@ -3017,25 +3034,32 @@ static int xfrm_send_state_notify(struct xfrm_state *x, const struct km_event *c
 
 }
 
-static inline unsigned int xfrm_acquire_msgsize(struct xfrm_state *x,
-						struct xfrm_policy *xp)
+static int build_acquire(struct sk_buff **skb, struct xfrm_state *x,
+			 struct xfrm_tmpl *xt, struct xfrm_policy *xp,
+			 bool compat)
 {
-	return NLMSG_ALIGN(sizeof(struct xfrm_user_acquire))
+	__u32 seq = xfrm_get_acqseq();
+	struct xfrm_user_acquire_packed *ua;
+	struct nlmsghdr *nlh;
+	unsigned int ua_size, ack_msgsize;
+	int err;
+
+	if (compat)
+		ua_size = NLMSG_ALIGN(sizeof(struct xfrm_user_acquire_packed));
+	else
+		ua_size = NLMSG_ALIGN(sizeof(struct xfrm_user_acquire));
+
+	ack_msgsize = ua_size
 	       + nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr)
 	       + nla_total_size(sizeof(struct xfrm_mark))
 	       + nla_total_size(xfrm_user_sec_ctx_size(x->security))
 	       + userpolicy_type_attrsize();
-}
 
-static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
-			 struct xfrm_tmpl *xt, struct xfrm_policy *xp)
-{
-	__u32 seq = xfrm_get_acqseq();
-	struct xfrm_user_acquire *ua;
-	struct nlmsghdr *nlh;
-	int err;
+	*skb = nlmsg_new(ack_msgsize, GFP_ATOMIC);
+	if (*skb == NULL)
+		return -ENOMEM;
 
-	nlh = nlmsg_put(skb, 0, 0, XFRM_MSG_ACQUIRE, sizeof(*ua), 0);
+	nlh = nlmsg_put(*skb, 0, 0, XFRM_MSG_ACQUIRE, ua_size, 0);
 	if (nlh == NULL)
 		return -EMSGSIZE;
 
@@ -3043,25 +3067,36 @@ static int build_acquire(struct sk_buff *skb, struct xfrm_state *x,
 	memcpy(&ua->id, &x->id, sizeof(ua->id));
 	memcpy(&ua->saddr, &x->props.saddr, sizeof(ua->saddr));
 	memcpy(&ua->sel, &x->sel, sizeof(ua->sel));
-	copy_to_user_policy(xp, &ua->policy, XFRM_POLICY_OUT);
-	ua->aalgos = xt->aalgos;
-	ua->ealgos = xt->ealgos;
-	ua->calgos = xt->calgos;
-	ua->seq = x->km.seq = seq;
 
-	err = copy_to_user_tmpl(xp, skb);
+	if (compat) {
+		copy_to_user_policy_compat(xp, &ua->policy, XFRM_POLICY_OUT);
+		ua->aalgos = xt->aalgos;
+		ua->ealgos = xt->ealgos;
+		ua->calgos = xt->calgos;
+		ua->seq = x->km.seq = seq;
+	} else {
+		struct xfrm_user_acquire *_ua = nlmsg_data(nlh);
+
+		copy_to_user_policy(xp, &_ua->policy, XFRM_POLICY_OUT);
+		_ua->aalgos = xt->aalgos;
+		_ua->ealgos = xt->ealgos;
+		_ua->calgos = xt->calgos;
+		_ua->seq = x->km.seq = seq;
+	}
+
+	err = copy_to_user_tmpl(xp, *skb);
 	if (!err)
-		err = copy_to_user_state_sec_ctx(x, skb);
+		err = copy_to_user_state_sec_ctx(x, *skb);
 	if (!err)
-		err = copy_to_user_policy_type(xp->type, skb);
+		err = copy_to_user_policy_type(xp->type, *skb);
 	if (!err)
-		err = xfrm_mark_put(skb, &xp->mark);
+		err = xfrm_mark_put(*skb, &xp->mark);
 	if (err) {
-		nlmsg_cancel(skb, nlh);
+		nlmsg_cancel(*skb, nlh);
 		return err;
 	}
 
-	nlmsg_end(skb, nlh);
+	nlmsg_end(*skb, nlh);
 	return 0;
 }
 
@@ -3072,14 +3107,20 @@ static int xfrm_send_acquire(struct xfrm_state *x, struct xfrm_tmpl *xt,
 	struct sk_buff *skb;
 	int err;
 
-	skb = nlmsg_new(xfrm_acquire_msgsize(x, xp), GFP_ATOMIC);
-	if (skb == NULL)
-		return -ENOMEM;
 
-	err = build_acquire(skb, x, xt, xp);
-	BUG_ON(err < 0);
+	err = build_acquire(&skb, x, xt, xp, false);
+	if (err)
+		return err;
+
+	err = xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_ACQUIRE);
+	if ((err && err != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+		return err;
+
+	err = build_acquire(&skb, x, xt, xp, true);
+	if (err)
+		return err;
 
-	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_ACQUIRE);
+	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_COMPAT_ACQUIRE);
 }
 
 /* User gives us xfrm_user_policy_info followed by an array of 0
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 14/18] xfrm: Add compat support for xfrm_user_polexpire messages
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (16 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Parse polexpire messages sent by userspace according to in_compat_syscall().
Applications that used the native bind() syscall are in XFRMNLGRP_EXPIRE, so
send xfrm_user_polexpire messages (with the 64-bit ABI) there. Compat
applications are added to the kernel-hidden XFRMNLGRP_COMPAT_EXPIRE group, so
send xfrm_user_polexpire_packed messages (with the 32-bit ABI) there.
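
To illustrate why the parser has to know the sender's ABI, here is a
minimal userspace sketch. It is not the kernel code: the demo_* structs
are hypothetical stand-ins (much smaller than the real xfrm ones), and
the 'compat' argument stands in for in_compat_syscall(). It only shows
that the offset of 'hard' moves when the embedded policy struct loses
its tail padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xfrm_userpolicy_info: 12 bytes of payload,
 * padded to 16 when the 64-bit member is 8-byte aligned (64-bit ABI). */
struct demo_policy {
	uint64_t	bytes __attribute__((aligned(8)));
	uint32_t	index;
};

/* Same fields without tail padding, as a 32-bit application lays it out. */
struct demo_policy_packed {
	uint64_t	bytes;
	uint32_t	index;
} __attribute__((packed));

struct demo_polexpire {
	struct demo_policy	pol;
	uint8_t			hard;
};

struct demo_polexpire_packed {
	struct demo_policy_packed	pol;
	uint8_t				hard;
	uint8_t				pad[3];
} __attribute__((packed));

/* Read 'hard' from a raw message payload using the layout of the sender. */
static uint8_t demo_read_hard(const void *payload, int compat)
{
	size_t off = compat ? offsetof(struct demo_polexpire_packed, hard)
			    : offsetof(struct demo_polexpire, hard);

	return ((const uint8_t *)payload)[off];
}
```

Reading a compat payload with the native layout would pick up a byte
four positions too far, which is why the handler picks the struct
before touching the field.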

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 89 +++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 61 insertions(+), 28 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 89f891a0a9a4..2fe6174b8a18 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -84,6 +84,12 @@ struct xfrm_user_acquire_packed {
 	__u32					seq;
 } __packed;
 
+struct xfrm_user_polexpire_packed {
+	struct xfrm_userpolicy_info_packed	pol;
+	__u8					hard;
+	__u8					__pad[3];
+} __packed;
+
 /* In-kernel, non-uapi compat groups.
  * As compat/native messages differ, send notifications according
  * to .bind() caller's ABI. There are *_COMPAT hidden from userspace
@@ -2225,7 +2231,15 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	int err = -ENOENT;
 	struct xfrm_mark m;
 	u32 mark = xfrm_mark_get(attrs, &m);
+	u8 hard;
 
+	if (in_compat_syscall()) {
+		struct xfrm_user_polexpire_packed *_up = nlmsg_data(nlh);
+
+		hard = _up->hard;
+	} else {
+		hard = up->hard;
+	}
 	err = copy_from_user_policy_type(&type, attrs);
 	if (err)
 		return err;
@@ -2263,11 +2277,11 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 		goto out;
 
 	err = 0;
-	if (up->hard) {
+	if (hard) {
 		xfrm_policy_delete(xp, p->dir);
 		xfrm_audit_policy_delete(xp, 1, true);
 	}
-	km_policy_expired(xp, p->dir, up->hard, nlh->nlmsg_pid);
+	km_policy_expired(xp, p->dir, hard, nlh->nlmsg_pid);
 
 out:
 	xfrm_pol_put(xp);
@@ -3192,43 +3206,59 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
 	return xp;
 }
 
-static inline unsigned int xfrm_polexpire_msgsize(struct xfrm_policy *xp)
+static int build_polexpire(struct sk_buff **skb, struct xfrm_policy *xp,
+			   int dir, const struct km_event *c, bool compat)
 {
-	return NLMSG_ALIGN(sizeof(struct xfrm_user_polexpire))
+	struct xfrm_user_polexpire_packed *_upe;
+	struct xfrm_user_polexpire *upe;
+	unsigned int upe_size, polexpire_msgsize;
+	int hard = c->data.hard;
+	struct nlmsghdr *nlh;
+	int err;
+
+	if (compat)
+		upe_size = NLMSG_ALIGN(sizeof(struct xfrm_user_polexpire_packed));
+	else
+		upe_size = NLMSG_ALIGN(sizeof(struct xfrm_user_polexpire));
+	polexpire_msgsize = upe_size
 	       + nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr)
 	       + nla_total_size(xfrm_user_sec_ctx_size(xp->security))
 	       + nla_total_size(sizeof(struct xfrm_mark))
 	       + userpolicy_type_attrsize();
-}
 
-static int build_polexpire(struct sk_buff *skb, struct xfrm_policy *xp,
-			   int dir, const struct km_event *c)
-{
-	struct xfrm_user_polexpire *upe;
-	int hard = c->data.hard;
-	struct nlmsghdr *nlh;
-	int err;
+	*skb = nlmsg_new(polexpire_msgsize, GFP_ATOMIC);
+	if (*skb == NULL)
+		return -ENOMEM;
 
-	nlh = nlmsg_put(skb, c->portid, 0, XFRM_MSG_POLEXPIRE, sizeof(*upe), 0);
+	nlh = nlmsg_put(*skb, c->portid, 0, XFRM_MSG_POLEXPIRE, upe_size, 0);
 	if (nlh == NULL)
 		return -EMSGSIZE;
 
+	_upe = nlmsg_data(nlh);
 	upe = nlmsg_data(nlh);
-	copy_to_user_policy(xp, &upe->pol, dir);
-	err = copy_to_user_tmpl(xp, skb);
+	if (compat)
+		copy_to_user_policy_compat(xp, &_upe->pol, dir);
+	else
+		copy_to_user_policy(xp, &upe->pol, dir);
+
+	err = copy_to_user_tmpl(xp, *skb);
 	if (!err)
-		err = copy_to_user_sec_ctx(xp, skb);
+		err = copy_to_user_sec_ctx(xp, *skb);
 	if (!err)
-		err = copy_to_user_policy_type(xp->type, skb);
+		err = copy_to_user_policy_type(xp->type, *skb);
 	if (!err)
-		err = xfrm_mark_put(skb, &xp->mark);
+		err = xfrm_mark_put(*skb, &xp->mark);
 	if (err) {
-		nlmsg_cancel(skb, nlh);
+		nlmsg_cancel(*skb, nlh);
 		return err;
 	}
-	upe->hard = !!hard;
 
-	nlmsg_end(skb, nlh);
+	if (compat)
+		_upe->hard = !!hard;
+	else
+		upe->hard = !!hard;
+
+	nlmsg_end(*skb, nlh);
 	return 0;
 }
 
@@ -3238,14 +3268,17 @@ static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct
 	struct sk_buff *skb;
 	int err;
 
-	skb = nlmsg_new(xfrm_polexpire_msgsize(xp), GFP_ATOMIC);
-	if (skb == NULL)
-		return -ENOMEM;
-
-	err = build_polexpire(skb, xp, dir, c);
-	BUG_ON(err < 0);
+	err = build_polexpire(&skb, xp, dir, c, false);
+	if (err)
+		return err;
+	err = xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
+	if ((err && err != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+		return err;
 
-	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_EXPIRE);
+	err = build_polexpire(&skb, xp, dir, c, true);
+	if (err)
+		return err;
+	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_COMPAT_EXPIRE);
 }
 
 static int __xfrm_notify_policy(struct xfrm_policy *xp, int dir,
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 15/18] xfrm: Check compat acquire listeners in xfrm_is_alive()
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (17 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

There are now two groups of listeners:
XFRMNLGRP_ACQUIRE - applications that use the native UABI for messages;
XFRMNLGRP_COMPAT_ACQUIRE - applications that use the compat UABI for messages.

Both groups should therefore be checked for listeners of acquire
notifications.
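
The intended check can be sketched in plain userspace C as follows.
This is an illustration only: the demo_* names and the bitmask of
subscribed groups are assumptions made for the example, and
'compat_enabled' stands in for IS_ENABLED(CONFIG_COMPAT).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical group numbers; the real ones live in the kernel headers. */
enum demo_group {
	DEMO_GRP_ACQUIRE	= 0,
	DEMO_GRP_COMPAT_ACQUIRE	= 1,
};

/* Stand-in for netlink_has_listeners(): test one bit of a group mask. */
static bool demo_has_listeners(uint32_t subscribed, enum demo_group grp)
{
	return (subscribed & (1u << grp)) != 0;
}

/* Alive if either the native or (when built in) the compat group listens. */
static bool demo_acquire_is_on(uint32_t subscribed, bool compat_enabled)
{
	if (demo_has_listeners(subscribed, DEMO_GRP_ACQUIRE))
		return true;
	return compat_enabled &&
	       demo_has_listeners(subscribed, DEMO_GRP_COMPAT_ACQUIRE);
}
```

The compat group is consulted only when the native group has no
listeners, so a native listener is never masked by an empty compat
group.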

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 include/net/xfrm.h   | 14 --------------
 net/xfrm/xfrm_user.c | 16 ++++++++++++++++
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 557122846e0e..c9b713017ae8 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1784,20 +1784,6 @@ static inline int xfrm_aevent_is_on(struct net *net)
 	rcu_read_unlock();
 	return ret;
 }
-
-static inline int xfrm_acquire_is_on(struct net *net)
-{
-	struct sock *nlsk;
-	int ret = 0;
-
-	rcu_read_lock();
-	nlsk = rcu_dereference(net->xfrm.nlsk);
-	if (nlsk)
-		ret = netlink_has_listeners(nlsk, XFRMNLGRP_ACQUIRE);
-	rcu_read_unlock();
-
-	return ret;
-}
 #endif
 
 static inline unsigned int aead_len(struct xfrm_algo_aead *alg)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2fe6174b8a18..752019963b1e 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3517,6 +3517,22 @@ static int xfrm_send_mapping(struct xfrm_state *x, xfrm_address_t *ipaddr,
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_MAPPING);
 }
 
+static inline int xfrm_acquire_is_on(struct net *net)
+{
+	struct sock *nlsk;
+	int ret = 0;
+
+	rcu_read_lock();
+	nlsk = rcu_dereference(net->xfrm.nlsk);
+	if (nlsk)
+		ret = netlink_has_listeners(nlsk, XFRMNLGRP_ACQUIRE);
+	if (nlsk && !ret && IS_ENABLED(CONFIG_COMPAT))
+		ret = netlink_has_listeners(nlsk, XFRMNLGRP_COMPAT_ACQUIRE);
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static bool xfrm_is_alive(const struct km_event *c)
 {
 	return (bool)xfrm_acquire_is_on(c->net);
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 16/18] xfrm: Notify compat listeners about policy flush
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (18 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Notify two groups of listeners:
XFRMNLGRP_POLICY - applications that use the native UABI for messages;
XFRMNLGRP_COMPAT_POLICY - applications that use the compat UABI for messages.
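
The "notify native first, then compat" control flow used here can be
sketched in userspace C as below. It is a sketch under assumptions: the
demo_* names are hypothetical, the callback stands in for building and
multicasting one message, and 'compat_enabled' stands in for
IS_ENABLED(CONFIG_COMPAT).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Sends one notification; 'compat' selects the group and message layout. */
typedef int (*demo_send_fn)(bool compat);

/* -ESRCH from the native send only means "no native listeners", so it
 * must not stop the compat send. Any other error is returned as is. */
static int demo_notify_both(demo_send_fn send, bool compat_enabled)
{
	int ret = send(false);

	if ((ret && ret != -ESRCH) || !compat_enabled)
		return ret;
	return send(true);
}

/* Two fake senders used to exercise the wrapper. */
static int demo_calls;

static int demo_send_no_native_listeners(bool compat)
{
	demo_calls++;
	return compat ? 0 : -ESRCH;
}

static int demo_send_nomem(bool compat)
{
	(void)compat;
	demo_calls++;
	return -ENOMEM;
}
```

Note that the value returned to the caller is that of the compat send
whenever it runs, so a successful native send followed by a compat
group without listeners still reports -ESRCH.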

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 752019963b1e..7bba0638c014 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -3368,7 +3368,8 @@ static int xfrm_notify_policy(struct xfrm_policy *xp, int dir,
 	return __xfrm_notify_policy(xp, dir, c, true);
 }
 
-static int xfrm_notify_policy_flush(const struct km_event *c)
+static int __xfrm_notify_policy_flush(const struct km_event *c,
+			unsigned int group)
 {
 	struct net *net = c->net;
 	struct nlmsghdr *nlh;
@@ -3389,13 +3390,22 @@ static int xfrm_notify_policy_flush(const struct km_event *c)
 
 	nlmsg_end(skb, nlh);
 
-	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_POLICY);
+	return xfrm_nlmsg_multicast(net, skb, 0, group);
 
 out_free_skb:
 	kfree_skb(skb);
 	return err;
 }
 
+static int xfrm_notify_policy_flush(const struct km_event *c)
+{
+	int ret = __xfrm_notify_policy_flush(c, XFRMNLGRP_POLICY);
+
+	if ((ret && ret != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+		return ret;
+	return __xfrm_notify_policy_flush(c, XFRMNLGRP_COMPAT_POLICY);
+}
+
 static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, const struct km_event *c)
 {
 
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 17/18] xfrm: Notify compat listeners about state flush
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (19 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev

Notify two groups of listeners:
XFRMNLGRP_SA - applications that use the native UABI for messages;
XFRMNLGRP_COMPAT_SA - applications that use the compat UABI for messages.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_user.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 7bba0638c014..7e3a132b76fb 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2879,7 +2879,7 @@ static int xfrm_aevent_state_notify(struct xfrm_state *x, const struct km_event
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_AEVENTS);
 }
 
-static int xfrm_notify_sa_flush(const struct km_event *c)
+static int __xfrm_notify_sa_flush(const struct km_event *c, unsigned int group)
 {
 	struct net *net = c->net;
 	struct xfrm_usersa_flush *p;
@@ -2902,7 +2902,16 @@ static int xfrm_notify_sa_flush(const struct km_event *c)
 
 	nlmsg_end(skb, nlh);
 
-	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_SA);
+	return xfrm_nlmsg_multicast(net, skb, 0, group);
+}
+
+static int xfrm_notify_sa_flush(const struct km_event *c)
+{
+	int ret = __xfrm_notify_sa_flush(c, XFRMNLGRP_SA);
+
+	if ((ret && ret != -ESRCH) || !IS_ENABLED(CONFIG_COMPAT))
+		return ret;
+	return __xfrm_notify_sa_flush(c, XFRMNLGRP_COMPAT_SA);
 }
 
 static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [PATCH 18/18] xfrm: Enable compat syscalls
  2018-07-26  2:31 ` Dmitry Safonov
                   ` (20 preceding siblings ...)
  (?)
@ 2018-07-26  2:31 ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-26  2:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dmitry Safonov, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev, Fan Du

Compat syscalls were disabled for xfrm by the following commits:
19d7df69fdb2 ("xfrm: Refuse to insert 32 bit userspace socket policies
on 64 bit systems") and 74005991b78a ("xfrm: Do not parse 32bits
compiled xfrm netlink msg on 64bits host").

Because some structures in the xfrm uapi header were mistakenly left
unpacked, they differ in size between 64-bit and 32-bit applications:

          32-bit UABI                 |             64-bit UABI
--------------------------------------|--------------------------------------
sizeof(xfrm_usersa_info)     = 220    |    sizeof(xfrm_usersa_info)     = 224
sizeof(xfrm_userpolicy_info) = 164    |    sizeof(xfrm_userpolicy_info) = 168
sizeof(xfrm_userspi_info)    = 228    |    sizeof(xfrm_userspi_info)    = 232
sizeof(xfrm_user_acquire)    = 276    |    sizeof(xfrm_user_acquire)    = 280
sizeof(xfrm_user_expire)     = 224    |    sizeof(xfrm_user_expire)     = 232
sizeof(xfrm_user_polexpire)  = 168    |    sizeof(xfrm_user_polexpire)  = 176

The previous patches added a compat layer to xfrm, so users of both ABIs
are now supported. A selftest that checks that ipsec tunnels work is
present in net/ipsec. It can easily be compiled as a compat application
and doesn't require any compat libraries.

Revert the mentioned commits and check the size of each received message
according to whether the syscall is native or compat.
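
As a rough illustration of the size check, the userspace sketch below
picks the minimum payload size per ABI and compares it with the message
length. The sizes are the ones from the table above; everything named
demo_* or DEMO_* is hypothetical, only three message types are
modelled, and the 16-byte netlink header length is written out as a
constant rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NLMSG_HDRLEN 16u	/* aligned size of struct nlmsghdr */

enum demo_msg {
	DEMO_MSG_NEWSA,
	DEMO_MSG_NEWPOLICY,
	DEMO_MSG_POLEXPIRE,
	DEMO_NR_MSGTYPES,
};

/* Minimum payload sizes, taken from the 64-bit column of the table. */
static const size_t demo_min_native[DEMO_NR_MSGTYPES] = {
	[DEMO_MSG_NEWSA]	= 224,
	[DEMO_MSG_NEWPOLICY]	= 168,
	[DEMO_MSG_POLEXPIRE]	= 176,
};

/* Minimum payload sizes, taken from the 32-bit column of the table. */
static const size_t demo_min_compat[DEMO_NR_MSGTYPES] = {
	[DEMO_MSG_NEWSA]	= 220,
	[DEMO_MSG_NEWPOLICY]	= 164,
	[DEMO_MSG_POLEXPIRE]	= 168,
};

/* 'compat' stands in for in_compat_syscall() on the receive path. */
static bool demo_len_ok(enum demo_msg type, size_t nlmsg_len, bool compat)
{
	size_t hdrlen = compat ? demo_min_compat[type] : demo_min_native[type];

	return nlmsg_len >= DEMO_NLMSG_HDRLEN + hdrlen;
}
```

A bare XFRM_MSG_NEWSA from a 32-bit sender (16 + 220 bytes) is four
bytes short of the native minimum, so it only passes when it is
checked against the compat table.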

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Fan Du <fan.du@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 net/xfrm/xfrm_state.c |  3 ---
 net/xfrm/xfrm_user.c  | 35 ++++++++++++++++++++++++++++++-----
 2 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 3f48a6925606..515a565bfc37 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2057,9 +2057,6 @@ int xfrm_user_policy(struct sock *sk, int optname, u8 __user *optval, int optlen
 	struct xfrm_mgr *km;
 	struct xfrm_policy *pol = NULL;
 
-	if (in_compat_syscall())
-		return -EOPNOTSUPP;
-
 	if (!optval && !optlen) {
 		xfrm_sk_policy_insert(sk, XFRM_POLICY_IN, NULL);
 		xfrm_sk_policy_insert(sk, XFRM_POLICY_OUT, NULL);
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 7e3a132b76fb..f6da6ea65d37 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2634,6 +2634,30 @@ static const int xfrm_msg_min[XFRM_NR_MSGTYPES] = {
 	[XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
 };
 
+static const int xfrm_msg_min_compat[XFRM_NR_MSGTYPES] = {
+	[XFRM_MSG_NEWSA       - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info_packed),
+	[XFRM_MSG_DELSA       - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_id),
+	[XFRM_MSG_GETSA       - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_id),
+	[XFRM_MSG_NEWPOLICY   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_info_packed),
+	[XFRM_MSG_DELPOLICY   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
+	[XFRM_MSG_GETPOLICY   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
+	[XFRM_MSG_ALLOCSPI    - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userspi_info_packed),
+	[XFRM_MSG_ACQUIRE     - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_acquire_packed),
+	[XFRM_MSG_EXPIRE      - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_expire_packed),
+	[XFRM_MSG_UPDPOLICY   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_info_packed),
+	[XFRM_MSG_UPDSA       - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_info_packed),
+	[XFRM_MSG_POLEXPIRE   - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_polexpire_packed),
+	[XFRM_MSG_FLUSHSA     - XFRM_MSG_BASE] = XMSGSIZE(xfrm_usersa_flush),
+	[XFRM_MSG_FLUSHPOLICY - XFRM_MSG_BASE] = 0,
+	[XFRM_MSG_NEWAE       - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
+	[XFRM_MSG_GETAE       - XFRM_MSG_BASE] = XMSGSIZE(xfrm_aevent_id),
+	[XFRM_MSG_REPORT      - XFRM_MSG_BASE] = XMSGSIZE(xfrm_user_report),
+	[XFRM_MSG_MIGRATE     - XFRM_MSG_BASE] = XMSGSIZE(xfrm_userpolicy_id),
+	[XFRM_MSG_GETSADINFO  - XFRM_MSG_BASE] = sizeof(u32),
+	[XFRM_MSG_NEWSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
+	[XFRM_MSG_GETSPDINFO  - XFRM_MSG_BASE] = sizeof(u32),
+};
+
 #undef XMSGSIZE
 
 static const struct nla_policy xfrma_policy[XFRMA_MAX+1] = {
@@ -2715,10 +2739,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *attrs[XFRMA_MAX+1];
 	const struct xfrm_link *link;
-	int type, err;
-
-	if (in_compat_syscall())
-		return -EOPNOTSUPP;
+	int type, err, hdrlen;
 
 	type = nlh->nlmsg_type;
 	if (type > XFRM_MSG_MAX)
@@ -2747,7 +2768,11 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
 		}
 	}
 
-	err = nlmsg_parse(nlh, xfrm_msg_min[type], attrs,
+	hdrlen = xfrm_msg_min[type];
+	if (in_compat_syscall())
+		hdrlen = xfrm_msg_min_compat[type];
+
+	err = nlmsg_parse(nlh, hdrlen, attrs,
 			  link->nla_max ? : XFRMA_MAX,
 			  link->nla_pol ? : xfrma_policy, extack);
 	if (err < 0)
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [PATCH 06/18] netlink: Do not subscribe to non-existent groups
  2018-07-26  2:31 ` [PATCH 06/18] netlink: Do not subscribe to non-existent groups Dmitry Safonov
@ 2018-07-26  4:22   ` David Miller
  2018-07-27 13:43     ` Dmitry Safonov
  0 siblings, 1 reply; 74+ messages in thread
From: David Miller @ 2018-07-26  4:22 UTC (permalink / raw)
  To: dima; +Cc: linux-kernel, herbert, steffen.klassert, 0x7f454c46, netdev

From: Dmitry Safonov <dima@arista.com>
Date: Thu, 26 Jul 2018 03:31:32 +0100

> Make ABI more strict about subscribing to group > ngroups.
> Code doesn't check for that and it looks bogus.
> (one can subscribe to non-existing group)
> Still, it's possible to bind() to all possible groups with (-1)
> 
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Dmitry Safonov <dima@arista.com>

This really has nothing to do with adding a compat layer for xfrm,
and is a bug fix that should be submitted separately in its own
right.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-26  2:31 ` Dmitry Safonov
                     ` (2 preceding siblings ...)
  (?)
@ 2018-07-26  8:49   ` Florian Westphal
  -1 siblings, 0 replies; 74+ messages in thread
From: Florian Westphal @ 2018-07-26  8:49 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, David S. Miller, Herbert Xu, Steffen Klassert,
	Dmitry Safonov, netdev, Andy Lutomirski, Ard Biesheuvel,
	H. Peter Anvin, Ingo Molnar, John Stultz, Kirill A. Shutemov,
	Oleg Nesterov, Stephen Boyd, Steven Rostedt, Thomas Gleixner,
	x86, linux-efi, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Shuah Khan, linux-kselftest, Eric Paris,
	Florian Westphal, Jozsef Kadlecsik, Pablo Neira Ayuso,
	Paul Moore, coreteam, linux-audit, netfilter-devel, Fan Du

Dmitry Safonov <dima@arista.com> wrote:
> So, here I add a compatible layer to xfrm.
> As xfrm uses netlink notifications, kernel should send them in ABI
> format that an application will parse. The proposed solution is
> to save the ABI of bind() syscall. The realization detail is
> to create kernel-hidden, non visible to userspace netlink groups
> for compat applications.

Why not use the existing netlink support?
Just add the 32-bit skb to skb64->frag_list and let
netlink work out whether a task needs the 64-bit or the 32-bit one.

It only needs this small fix to properly signal the end of a dump:
https://marc.info/?l=linux-netdev&m=126625240303351&w=2

I had started a second attempt to make xfrm compat work,
but it's still at an early stage.

One link that might still have some value:
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
(compat structure definitions with BUILD_BUG_ON checking)

My plan was to make xfrm compat work strictly as a shrinker (64->32)
and an expander (32->64), i.e. no/little changes to existing code, and
pass all "expanded" skbs through the existing xfrm rcv functions.

Example to illustrate idea:
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=c622f067849b02170127b69471cb3481e4bc9e49

... it's supposed to take a 64-bit skb and create a 32-bit one from it.
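
For readers who do not follow the links, the shrinker/expander idea can
be sketched on a single payload struct in userspace C. This is not the
code from the linked branch: the demo_* structs and functions are
hypothetical and far smaller than the real xfrm structures, and skb
handling is left out entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit layout: the 8-byte aligned member forces 7 bytes of tail padding. */
struct demo_expire64 {
	uint64_t	bytes __attribute__((aligned(8)));
	uint8_t		hard;
};

/* 32-bit layout: explicit 3-byte pad, no further tail padding. */
struct demo_expire32 {
	uint64_t	bytes;
	uint8_t		hard;
	uint8_t		pad[3];
} __attribute__((packed));

/* Shrinker (64->32): copy field by field into a zeroed compat struct. */
static void demo_shrink(struct demo_expire32 *dst,
			const struct demo_expire64 *src)
{
	memset(dst, 0, sizeof(*dst));
	dst->bytes = src->bytes;
	dst->hard = src->hard;
}

/* Expander (32->64): the reverse, so native code can parse the result. */
static void demo_expand(struct demo_expire64 *dst,
			const struct demo_expire32 *src)
{
	memset(dst, 0, sizeof(*dst));
	dst->bytes = src->bytes;
	dst->hard = src->hard;
}
```

With converters like these at the boundary, the existing handlers only
ever see the native layout, which is the "no/little changes to existing
code" property described above.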

Just for reference; I currently don't plan to work on this again.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [PATCH 00/18] xfrm: Add compat layer
@ 2018-07-26  8:49   ` Florian Westphal
  0 siblings, 0 replies; 74+ messages in thread
From: Florian Westphal @ 2018-07-26  8:49 UTC (permalink / raw)


Dmitry Safonov <dima@arista.com> wrote:
> So, here I add a compatible layer to xfrm.
> As xfrm uses netlink notifications, kernel should send them in ABI
> format that an application will parse. The proposed solution is
> to save the ABI of bind() syscall. The realization detail is
> to create kernel-hidden, non visible to userspace netlink groups
> for compat applications.

Why not use exisiting netlink support?
Just add the 32bit skb to skb64->frag_list and let
netlink find if tasks needs 64 or 32 one.

It only needs this small fix to properly signal the end of a dump:
https://marc.info/?l=linux-netdev&m=126625240303351&w=2

I had started a second attempt to make xfrm compat work,
but its still in early stage.

One link that might still have some value:
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
(compat structure definitions with BUILD_BUG_ON checking)

My plan was to make xfrm compat work strictly as shrinker (64->32)
and expander (32->64), i.e. no/little changes to exisiting code and
pass all "expanded" skbs through existing xfrm rcv functions.

Example to illustrate idea:
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=c622f067849b02170127b69471cb3481e4bc9e49

... its supposed to take 64bit skb and create a 32bit one from it.

Just for reference; I currently don't plan to work on this again.
--
To unsubscribe from this list: send the line "unsubscribe linux-kselftest" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-26  8:49   ` Florian Westphal
                       ` (2 preceding siblings ...)
  (?)
@ 2018-07-27  7:37     ` Steffen Klassert
  -1 siblings, 0 replies; 74+ messages in thread
From: Steffen Klassert @ 2018-07-27  7:37 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Dmitry Safonov, linux-kernel, David S. Miller, Herbert Xu,
	Dmitry Safonov, netdev, Andy Lutomirski, Ard Biesheuvel,
	H. Peter Anvin, Ingo Molnar, John Stultz, Kirill A. Shutemov,
	Oleg Nesterov, Stephen Boyd, Steven Rostedt, Thomas Gleixner,
	x86, linux-efi, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Shuah Khan, linux-kselftest, Eric Paris,
	Jozsef Kadlecsik, Pablo Neira Ayuso, Paul Moore, coreteam,
	linux-audit, netfilter-devel, Fan Du

On Thu, Jul 26, 2018 at 10:49:59AM +0200, Florian Westphal wrote:
> Dmitry Safonov <dima@arista.com> wrote:
> > So, here I add a compatible layer to xfrm.
> > As xfrm uses netlink notifications, kernel should send them in ABI
> > format that an application will parse. The proposed solution is
> > to save the ABI of bind() syscall. The realization detail is
> > to create kernel-hidden, non visible to userspace netlink groups
> > for compat applications.
> 
> Why not use exisiting netlink support?
> Just add the 32bit skb to skb64->frag_list and let
> netlink find if tasks needs 64 or 32 one.
> 
> It only needs this small fix to properly signal the end of a dump:
> https://marc.info/?l=linux-netdev&m=126625240303351&w=2
> 
> I had started a second attempt to make xfrm compat work,
> but its still in early stage.
> 
> One link that might still have some value:
> https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
> (compat structure definitions with BUILD_BUG_ON checking)
> 
> My plan was to make xfrm compat work strictly as shrinker (64->32)
> and expander (32->64), i.e. no/little changes to existing code and
> pass all "expanded" skbs through existing xfrm rcv functions.

I agree here with Florian. The code behind this ABI
is already complicated. Please stay away from generic
code as much as possible. Generic and compat code should
be clearly separated.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 06/18] netlink: Do not subscribe to non-existent groups
  2018-07-26  4:22   ` David Miller
@ 2018-07-27 13:43     ` Dmitry Safonov
  0 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-27 13:43 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, herbert, steffen.klassert, 0x7f454c46, netdev

On Wed, 2018-07-25 at 21:22 -0700, David Miller wrote:
> From: Dmitry Safonov <dima@arista.com>
> Date: Thu, 26 Jul 2018 03:31:32 +0100
> 
> > Make ABI more strict about subscribing to group > ngroups.
> > Code doesn't check for that and it looks bogus.
> > (one can subscribe to non-existing group)
> > Still, it's possible to bind() to all possible groups with (-1)
> > 
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Herbert Xu <herbert@gondor.apana.org.au>
> > Cc: Steffen Klassert <steffen.klassert@secunet.com>
> > Cc: netdev@vger.kernel.org
> > Signed-off-by: Dmitry Safonov <dima@arista.com>
> 
> This really has nothing to do with adding a compat layer for xfrm,
> and is a bug fix that should be submitted separately in it's own
> right.

Sure, will do.
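
[Illustration only; the patch body is not quoted in this mail.] A minimal
sketch of the kind of check the description suggests: bits for groups
above ngroups are dropped from the bind() mask, so a mask of (-1) still
subscribes to every existing group. The helper name demo_clamp_groups and
the 32-bit mask width are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keep only the bits for groups 1..ngroups in a
 * 32-bit bind() mask, where bit (g - 1) stands for group g. */
static uint32_t demo_clamp_groups(uint32_t groups, unsigned int ngroups)
{
	if (ngroups == 0)
		return 0;
	/* Shifting by 32 or more would be undefined, so only mask when
	 * fewer groups exist than the mask has bits. */
	if (ngroups < 32)
		groups &= (UINT32_C(1) << ngroups) - 1;
	return groups;
}
```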

-- 
Thanks,
             Dmitry

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-27  7:37     ` Steffen Klassert
                         ` (2 preceding siblings ...)
  (?)
@ 2018-07-27 14:02       ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-27 14:02 UTC (permalink / raw)
  To: Steffen Klassert, Florian Westphal
  Cc: linux-kernel, David S. Miller, Herbert Xu, Dmitry Safonov,
	netdev, Andy Lutomirski, Ard Biesheuvel, H. Peter Anvin,
	Ingo Molnar, John Stultz, Kirill A. Shutemov, Oleg Nesterov,
	Stephen Boyd, Steven Rostedt, Thomas Gleixner, x86, linux-efi,
	Andrew Morton, Greg Kroah-Hartman, Mauro Carvalho Chehab,
	Shuah Khan, linux-kselftest, Eric Paris, Jozsef Kadlecsik,
	Pablo Neira Ayuso, Paul Moore, coreteam, linux-audit,
	netfilter-devel, Fan Du

On Fri, 2018-07-27 at 09:37 +0200, Steffen Klassert wrote:
> On Thu, Jul 26, 2018 at 10:49:59AM +0200, Florian Westphal wrote:
> > Dmitry Safonov <dima@arista.com> wrote:
> > > So, here I add a compatible layer to xfrm.
> > > As xfrm uses netlink notifications, kernel should send them in ABI
> > > format that an application will parse. The proposed solution is
> > > to save the ABI of bind() syscall. The realization detail is
> > > to create kernel-hidden, non visible to userspace netlink groups
> > > for compat applications.
> > 
> > Why not use exisiting netlink support?
> > Just add the 32bit skb to skb64->frag_list and let
> > netlink find if tasks needs 64 or 32 one.
> > 
> > It only needs this small fix to properly signal the end of a dump:
> > https://marc.info/?l=linux-netdev&m=126625240303351&w=2
> > 
> > I had started a second attempt to make xfrm compat work,
> > but its still in early stage.
> > 
> > One link that might still have some value:
> > https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869
> > (compat structure definitions with BUILD_BUG_ON checking)
> > 
> > My plan was to make xfrm compat work strictly as shrinker (64->32)
> > and expander (32->64), i.e. no/little changes to existing code and
> > pass all "expanded" skbs through existing xfrm rcv functions.
> 
> I agree here with Florian. The code behind this ABI
> is already complicated. Please stay away from generic
> code as much as possible. Generic and compat code should
> be clearly separated.

Yeah, I tend to agree that it would be better to separate it.
But:
1. It will double-copy netlink messages, making it O(n) instead of
O(1), where n is the number of bind()s. Probably we don't care much.
2. The not-yet-finished patches at the link already add 500+ lines, as
much as my working patch set, so it'll probably add more code.

We probably care less about the amount of code added and the
additional copies than about separating the compat layer from the main
code. Will look into that.

-- 
Thanks,
             Dmitry

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-27 14:02       ` Dmitry Safonov
                           ` (2 preceding siblings ...)
  (?)
@ 2018-07-27 14:19         ` Florian Westphal
  -1 siblings, 0 replies; 74+ messages in thread
From: Florian Westphal @ 2018-07-27 14:19 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Steffen Klassert, Florian Westphal, linux-kernel,
	David S. Miller, Herbert Xu, Dmitry Safonov, netdev,
	Andy Lutomirski, Ard Biesheuvel, H. Peter Anvin, Ingo Molnar,
	John Stultz, Kirill A. Shutemov, Oleg Nesterov, Stephen Boyd,
	Steven Rostedt, Thomas Gleixner, x86, linux-efi, Andrew Morton,
	Greg Kroah-Hartman, Mauro Carvalho Chehab, Shuah Khan,
	linux-kselftest, Eric Paris, Jozsef Kadlecsik, Pablo Neira Ayuso,
	Paul Moore, coreteam, linux-audit, netfilter-devel, Fan Du

Dmitry Safonov <dima@arista.com> wrote:
> 1. It will double copy netlink messages, making it O(n) instead of
> O(1), where n - is number of bind()s.. Probably we don't care much.

About those bind() patches, I don't understand why they are needed.

Why can't you just add the compat skb to the native skb when doing
the multicast call?

skb_shinfo(skb)->frag_list = compat_skb;
xfrm_nlmsg_multicast(net, skb, 0, ...
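
[Illustration only.] A toy userspace model of the idea above: the native
message carries a pointer to its compat twin, and delivery picks one copy
per receiver. struct demo_msg and demo_pick() are hypothetical stand-ins;
in the kernel the roles would be played by the skb, its frag_list, and
the netlink receive path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an skb: the native (64-bit layout) message carries an
 * optional pointer to its compat (32-bit layout) twin, the way frag_list
 * would in the suggestion above. */
struct demo_msg {
	const void *data;
	size_t len;
	const struct demo_msg *compat; /* NULL if no compat copy is attached */
};

/* Pick the copy a given receiver should be handed. */
static const struct demo_msg *demo_pick(const struct demo_msg *native,
					bool receiver_is_compat)
{
	if (receiver_is_compat && native->compat)
		return native->compat;
	return native;
}
```

The sender builds both layouts once per notification, so no per-socket
bookkeeping at bind() time is needed in this model.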

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-27 14:19         ` Florian Westphal
                             ` (2 preceding siblings ...)
  (?)
@ 2018-07-27 14:51           ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-27 14:51 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Steffen Klassert, linux-kernel, David S. Miller, Herbert Xu,
	Dmitry Safonov, netdev, Andy Lutomirski, Ard Biesheuvel,
	H. Peter Anvin, Ingo Molnar, John Stultz, Kirill A. Shutemov,
	Oleg Nesterov, Stephen Boyd, Steven Rostedt, Thomas Gleixner,
	x86, linux-efi, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Shuah Khan, linux-kselftest, Eric Paris,
	Jozsef Kadlecsik, Pablo Neira Ayuso, Paul Moore, coreteam,
	linux-audit, netfilter-devel, Fan Du

On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
> Dmitry Safonov <dima@arista.com> wrote:
> > 1. It will double copy netlink messages, making it O(n) instead of
> > O(1), where n - is number of bind()s.. Probably we don't care much.
> 
> About those bind() patches, I don't understand why they are needed.
> 
> Why can't you just add the compat skb to the native skb when doing
> the multicast call?
> 
> skb_shinfo(skb)->frag_list = compat_skb;
> xfrm_nlmsg_multicast(net, skb, 0, ...

Oh yeah, sorry, I think I misread the patch - will try to add compat
skb in the multicast call.

-- 
Thanks,
             Dmitry

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-27 14:51           ` Dmitry Safonov
  (?)
  (?)
@ 2018-07-27 16:48             ` Nathan Harold
  -1 siblings, 0 replies; 74+ messages in thread
From: Nathan Harold @ 2018-07-27 16:48 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Florian Westphal, Steffen Klassert, linux-kernel,
	David S. Miller, Herbert Xu, Dmitry Safonov, netdev,
	Andy Lutomirski, Ard Biesheuvel, H. Peter Anvin, Ingo Molnar,
	John Stultz, Kirill A. Shutemov, Oleg Nesterov, Stephen Boyd,
	Steven Rostedt, Thomas Gleixner, x86, linux-efi, Andrew Morton,
	Greg Kroah-Hartman

We (Android) are very interested in removing the restriction for 32-bit
userspace processes accessing xfrm netlink on 64-bit kernels. IPsec support
is required to pass Android conformance tests, and any manufacturer wishing
to ship 32-bit userspace with a recent kernel needs out-of-tree changes
(removing the compat_task check) to do so.

That said, it’s not difficult to work around alignment issues directly in
userspace, so maybe we could just remove the check and make this the
caller's responsibility? Here’s an example of the workaround currently in
the Android tree:
https://android.googlesource.com/platform/system/netd/+/refs/heads/master/server/XfrmController.h#257

We could also employ a (relatively simple) solution such as the one above in
the uapi XFRM header itself, though it would require a caller to declare
the target kernel ABI at compile time. Maybe that’s not unthinkable for an
uncommon case?

-Nathan
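
To make the alignment issue concrete, here is a minimal sketch. It is not the Android workaround and uses no real xfrm structure: every `demo_` name is hypothetical, and it assumes the GCC/Clang `aligned` attribute, which can lower alignment when applied to a typedef. The 32-bit layout aligns a 64-bit member to 4 bytes, the 64-bit layout aligns it to 8 and therefore grows tail padding, and a 32-bit caller that knows at compile time it talks to a 64-bit kernel can spell that padding out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit integer with the 4-byte in-struct alignment used by i386. */
typedef uint64_t __attribute__((aligned(4))) demo_u64_a4;
/* 64-bit integer with the 8-byte alignment used by x86_64. */
typedef uint64_t __attribute__((aligned(8))) demo_u64_a8;

/* Hypothetical message as a 32-bit x86 compiler lays it out: 12 bytes. */
struct demo_msg32 {
	demo_u64_a4 bytes;
	uint32_t    flags;
};

/* The same fields as a 64-bit kernel lays them out: 16 bytes, because
 * the struct size is rounded up to a multiple of its 8-byte alignment. */
struct demo_msg64 {
	demo_u64_a8 bytes;
	uint32_t    flags;
};

/* What a 32-bit caller could declare when the target kernel ABI is
 * known at compile time: the tail padding is written out explicitly. */
struct demo_msg32_padded {
	demo_u64_a4 bytes;
	uint32_t    flags;
	uint32_t    pad; /* stands in for the 64-bit tail padding */
};

/* Build the padded form from the plain 32-bit form, zeroing the padding. */
void demo_pad_for_64bit_kernel(struct demo_msg32_padded *dst,
			       const struct demo_msg32 *src)
{
	assert(dst != NULL && src != NULL);
	memset(dst, 0, sizeof(*dst));
	dst->bytes = src->bytes;
	dst->flags = src->flags;
}
```

The cost of this approach is the one named above: the caller's choice of struct is fixed at build time, so the same binary cannot serve both a 32-bit and a 64-bit kernel.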

On Fri, Jul 27, 2018 at 7:51 AM, Dmitry Safonov <dima@arista.com> wrote:

> On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
> > Dmitry Safonov <dima@arista.com> wrote:
> > > 1. It will double copy netlink messages, making it O(n) instead of
> > > O(1), where n - is number of bind()s.. Probably we don't care much.
> >
> > About those bind() patches, I don't understand why they are needed.
> >
> > Why can't you just add the compat skb to the native skb when doing
> > the multicast call?
> >
> > skb_shinfo(skb)->frag_list = compat_skb;
> > xfrm_nlmsg_multicast(net, skb, 0, ...
>
> Oh yeah, sorry, I think I misread the patch - will try to add compat
> skb in the multicast call.
>
> --
> Thanks,
>              Dmitry
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-27 16:48             ` Nathan Harold
  (?)
  (?)
@ 2018-07-27 17:09               ` Andy Lutomirski
  -1 siblings, 0 replies; 74+ messages in thread
From: Andy Lutomirski @ 2018-07-27 17:09 UTC (permalink / raw)
  To: Nathan Harold
  Cc: Dmitry Safonov, Florian Westphal, Steffen Klassert, linux-kernel,
	David S. Miller, Herbert Xu, Dmitry Safonov, netdev,
	Andy Lutomirski, Ard Biesheuvel, H. Peter Anvin, Ingo Molnar,
	John Stultz, Kirill A. Shutemov, Oleg Nesterov, Stephen Boyd,
	Steven Rostedt, Thomas Gleixner, x86, linux-efi, Andrew Morton,
	G

> On Jul 27, 2018, at 9:48 AM, Nathan Harold <nharold@google.com> wrote:
> 
> We (Android) are very interested in removing the restriction for 32-bit userspace processes accessing xfrm netlink on 64-bit kernels. IPsec support is required to pass Android conformance tests, and any manufacturer wishing to ship 32-bit userspace with a recent kernel needs out-of-tree changes (removing the compat_task check) to do so.
> 
> That said, it’s not difficult to work around alignment issues directly in userspace, so maybe we could just remove the check and make this the caller's responsibility? Here’s an example of the workaround currently in the Android tree:
> https://android.googlesource.com/platform/system/netd/+/refs/heads/master/server/XfrmController.h#257
> 
> We could also employ a (relatively simple) solution such as the one above in the uapi XFRM header itself, though it would require a caller to declare the target kernel ABI at compile time. Maybe that’s not unthinkable for an uncommon case?
> 

Could there just be an XFRM2 that is entirely identical to XFRM for 64-bit userspace but makes the 32-bit structures match?  If there are a grand total of two or so userspace implementations, that should cover most use cases. L
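
As an illustration of what "makes the 32-bit structures match" could look like, here is a hedged sketch. `struct demo_msg_v2` and `demo_msg_v2_valid()` are hypothetical and are not part of any proposed or existing XFRM interface. It assumes the GCC/Clang `aligned` attribute and relies on two common uapi habits: forcing 8-byte alignment on 64-bit members and declaring all padding explicitly, so that 32-bit and 64-bit builds agree on size and offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout-stable message: same size and offsets on every ABI. */
struct demo_msg_v2 {
	uint64_t bytes __attribute__((aligned(8))); /* 8-byte aligned everywhere */
	uint32_t flags;
	uint32_t reserved;                          /* explicit padding, must be 0 */
};

/* Compile-time checks: the build fails if the layout ever drifts. */
_Static_assert(sizeof(struct demo_msg_v2) == 16,
	       "size must not depend on the ABI");
_Static_assert(offsetof(struct demo_msg_v2, flags) == 8,
	       "flags offset must be fixed");
_Static_assert(offsetof(struct demo_msg_v2, reserved) == 12,
	       "reserved offset must be fixed");

/* A receiver can reject messages whose padding is not zeroed, which
 * keeps the reserved field available for later use. */
int demo_msg_v2_valid(const struct demo_msg_v2 *msg)
{
	assert(msg != NULL);
	return msg->reserved == 0;
}
```

With a layout like this no translation layer is needed for new users, but existing 32-bit binaries built against the old structures would still not work unchanged.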

> -Nathan
> 
> 
>> On Fri, Jul 27, 2018 at 7:51 AM, Dmitry Safonov <dima@arista.com> wrote:
>> On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
>> > Dmitry Safonov <dima@arista.com> wrote:
>> > > 1. It will double copy netlink messages, making it O(n) instead of
>> > > O(1), where n - is number of bind()s.. Probably we don't care much.
>> > 
>> > About those bind() patches, I don't understand why they are needed.
>> > 
>> > Why can't you just add the compat skb to the native skb when doing
>> > the multicast call?
>> > 
>> > skb_shinfo(skb)->frag_list = compat_skb;
>> > xfrm_nlmsg_multicast(net, skb, 0, ...
>> 
>> Oh yeah, sorry, I think I misread the patch - will try to add compat
>> skb in the multicast call.
>> 
>> -- 
>> Thanks,
>>              Dmitry
> 

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-27 16:48             ` Nathan Harold
                                 ` (2 preceding siblings ...)
  (?)
@ 2018-07-28 16:26               ` Dmitry Safonov
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-28 16:26 UTC (permalink / raw)
  To: Nathan Harold
  Cc: Florian Westphal, Steffen Klassert, linux-kernel,
	David S. Miller, Herbert Xu, Dmitry Safonov, netdev,
	Andy Lutomirski, Ard Biesheuvel, H. Peter Anvin, Ingo Molnar,
	John Stultz, Kirill A. Shutemov, Oleg Nesterov, Stephen Boyd,
	Steven Rostedt, Thomas Gleixner, x86, linux-efi, Andrew Morton,
	Greg Kroah-Hartman, Mauro Carvalho Chehab, Shuah Khan,
	linux-kselftest, Eric Paris, Jozsef Kadlecsik, Pablo Neira Ayuso,
	Paul Moore, coreteam, linux-audit, netfilter-devel, Fan Du

On Fri, 2018-07-27 at 09:48 -0700, Nathan Harold wrote:
> We (Android) are very interested in removing the restriction for 32-
> bit userspace processes accessing xfrm netlink on 64-bit kernels.
> IPsec support is required to pass Android conformance tests, and any
> manufacturer wishing to ship 32-bit userspace with a recent kernel
> needs out-of-tree changes (removing the compat_task check) to do so.

Glad to hear it - that justifies my attempts even more :)

> That said, it’s not difficult to work around alignment issues
> directly in userspace, so maybe we could just remove the check and
> make this the caller's responsibility? Here’s an example of the
> workaround currently in the Android tree:
> https://android.googlesource.com/platform/system/netd/+/refs/heads/master/server/XfrmController.h#257

We have much the same workarounds in our userspace.
But I don't think reverting the check makes much sense: it would set the
broken compat ABI in stone.
If you're fine with disgraceful hacks and just want to get rid of an
additional out-of-tree patch, you can make 64-bit syscalls from a 32-bit
task (hint: there are examples in the x86 selftests).


> We could also employ a (relatively simple) solution such as the one
> above in the uapi XFRM header itself, though it would require a
> caller to declare the target kernel ABI at compile time. Maybe that’s
> not unthinkable for an uncommon case?

Well, I think I'll rework my patch set according to the review comments
and separate out the compat xfrm layer. I already have a selftest that
checks that 32/64-bit xfrm works, so the most time-consuming part is done.
So, if you can wait a week or two, you may help me make the case for
getting those patches accepted upstream.

> On Fri, Jul 27, 2018 at 7:51 AM, Dmitry Safonov <dima@arista.com>
> wrote:
> > On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
> > > Dmitry Safonov <dima@arista.com> wrote:
> > > > 1. It will double copy netlink messages, making it O(n) instead of
> > > > O(1), where n - is number of bind()s.. Probably we don't care much.
> > > 
> > > About those bind() patches, I don't understand why they are needed.
> > > 
> > > Why can't you just add the compat skb to the native skb when doing
> > > the multicast call?
> > > 
> > > skb_shinfo(skb)->frag_list = compat_skb;
> > > xfrm_nlmsg_multicast(net, skb, 0, ...
> > 
> > Oh yeah, sorry, I think I misread the patch - will try to add compat
> > skb in the multicast call.
> > 

-- 
Thanks,
             Dmitry

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [PATCH 00/18] xfrm: Add compat layer
@ 2018-07-28 16:26               ` Dmitry Safonov
  0 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-28 16:26 UTC (permalink / raw)
  To: Nathan Harold
  Cc: Florian Westphal, Steffen Klassert, linux-kernel,
	David S. Miller, Herbert Xu, Dmitry Safonov, netdev,
	Andy Lutomirski, Ard Biesheuvel, H. Peter Anvin, Ingo Molnar,
	John Stultz, Kirill A. Shutemov, Oleg Nesterov, Stephen Boyd,
	Steven Rostedt, Thomas Gleixner, x86, linux-efi, Andrew Morton,
	Greg Kroah-Hartman

On Fri, 2018-07-27 at 09:48 -0700, Nathan Harold wrote:
> We (Android) are very interested in removing the restriction for 32-
> bit userspace processes accessing xfrm netlink on 64-bit kernels.
> IPsec support is required to pass Android conformance tests, and any
> manufacturer wishing to ship 32-bit userspace with a recent kernel
> needs out-of-tree changes (removing the compat_task check) to do so.

Glad to hear - that justify my attempts more :)

> That said, it’s not difficult to work around alignment issues
> directly in userspace, so maybe we could just remove the check and
> make this the caller's responsibility? Here’s an example of the
> workaround currently in the Android tree:
> https://android.googlesource.com/platform/system/netd/+/refs/heads/ma
> ster/server/XfrmController.h#257

We've kinda same workarounds in our userspace..
But I don't think reverting the check makes much sense - it'll make
broken compat ABI in stone.
If you're fine with disgraceful hacks and just want to get rid of
additional non-mainstream patch - you can make 64-bit syscalls from 32-
bit task (hint: examples in x86 selftests).


> We could also employ a (relatively simple) solution such as the one
> above in the uapi XFRM header itself, though it would require a
> caller to declare the target kernel ABI at compile time. Maybe that’s
> not unthinkable for an uncommon case?

Well, I think, I'll rework my patches set according to critics and
separate compat xfrm layer. I've already a selftest to check that 32/64
bit xfrm works - so the most time-taking part is done.
So, if you'll wait a week or two - you may help me to justify acception
of mainstreaming those patches.

> On Fri, Jul 27, 2018 at 7:51 AM, Dmitry Safonov <dima@arista.com> wrote:
> > On Fri, 2018-07-27 at 16:19 +0200, Florian Westphal wrote:
> > > Dmitry Safonov <dima@arista.com> wrote:
> > > > 1. It will double copy netlink messages, making it O(n) instead of
> > > > O(1), where n - is number of bind()s.. Probably we don't care much.
> > >
> > > About those bind() patches, I don't understand why they are needed.
> > >
> > > Why can't you just add the compat skb to the native skb when doing
> > > the multicast call?
> > >
> > > skb_shinfo(skb)->frag_list = compat_skb;
> > > xfrm_nlmsg_multicast(net, skb, 0, ...
> >
> > Oh yeah, sorry, I think I misread the patch - will try to add compat
> > skb in the multicast call.
> >

-- 
Thanks,
             Dmitry


* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-28 16:26               ` Dmitry Safonov
  (?)
@ 2018-07-28 21:18                 ` davem
  -1 siblings, 0 replies; 74+ messages in thread
From: David Miller @ 2018-07-28 21:18 UTC (permalink / raw)
  To: dima
  Cc: nharold, fw, steffen.klassert, linux-kernel, herbert, 0x7f454c46,
	netdev, luto, ard.biesheuvel, hpa, mingo, john.stultz,
	kirill.shutemov, oleg, sboyd, rostedt, tglx, x86, linux-efi,
	akpm, gregkh, mchehab+samsung, shuah, linux-kselftest, eparis,
	kadlec, pablo, paul, coreteam, linux-audit, netfilter-devel,
	fan.du

From: Dmitry Safonov <dima@arista.com>
Date: Sat, 28 Jul 2018 17:26:55 +0100

> Well, I think, I'll rework my patches set according to critics and
> separate compat xfrm layer. I've already a selftest to check that 32/64
> bit xfrm works - so the most time-taking part is done.

The way you've done the compat structures using __packed is only going
to work on x86, just FYI.

The "32-bit alignment for 64-bit objects" thing x86 has is very much
not universal amongst ABIs having 32-bit and 64-bit variants.
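
A minimal userspace sketch of the point above, assuming GCC or Clang. The
struct names are hypothetical and are not real xfrm structures. A packed
struct always puts the 64-bit field at offset 4, which matches the i386
layout (64-bit members aligned to 4 bytes) but not a 32-bit ABI that
aligns 64-bit members to 8 bytes, such as the ARM EABI:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical message layout, not taken from the xfrm uapi headers. */
struct demo_native {
	uint32_t id;
	uint64_t bytes;	/* placed at the target ABI's natural alignment */
};

/* Same fields, __packed style: no padding and alignment 1 on any target. */
struct demo_packed {
	uint32_t id;
	uint64_t bytes;
} __attribute__((packed));

/* Print where `bytes` lands and how large each variant is on this target. */
static void demo_print_layout(void)
{
	printf("native: offset=%zu size=%zu\n",
	       offsetof(struct demo_native, bytes),
	       sizeof(struct demo_native));
	printf("packed: offset=%zu size=%zu\n",
	       offsetof(struct demo_packed, bytes),
	       sizeof(struct demo_packed));
}
```

The packed variant is offset 4 / size 12 everywhere, so it only describes
what a 32-bit task sends when that task's ABI also uses 4-byte alignment
for 64-bit members.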


* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-28 21:18                 ` davem
  (?)
@ 2018-07-30 17:39                   ` dima
  -1 siblings, 0 replies; 74+ messages in thread
From: Dmitry Safonov @ 2018-07-30 17:39 UTC (permalink / raw)
  To: David Miller
  Cc: nharold, fw, steffen.klassert, linux-kernel, herbert, 0x7f454c46,
	netdev, luto, ard.biesheuvel, hpa, mingo, john.stultz,
	kirill.shutemov, oleg, sboyd, rostedt, tglx, x86, linux-efi,
	akpm, gregkh, mchehab+samsung, shuah, linux-kselftest, eparis,
	kadlec, pablo, paul, coreteam, linux-audit, netfilter-devel,
	fan.du

On Sat, 2018-07-28 at 14:18 -0700, David Miller wrote:
> From: Dmitry Safonov <dima@arista.com>
> Date: Sat, 28 Jul 2018 17:26:55 +0100
> 
> > Well, I think, I'll rework my patches set according to critics and
> > separate compat xfrm layer. I've already a selftest to check that 32/64
> > bit xfrm works - so the most time-taking part is done.
> 
> The way you've done the compat structures using __packed is only going
> to work on x86, just FYI.

Thanks for pointing that out, so I'll probably cover it under something
like HAS_COMPAT_XFRM (if there isn't a better idea).

> The "32-bit alignment for 64-bit objects" thing x86 has is very much
> not universal amongst ABIs having 32-bit and 64-bit variants.

-- 
Thanks,
             Dmitry


* Re: [PATCH 00/18] xfrm: Add compat layer
  2018-07-30 17:39                   ` dima
  (?)
@ 2018-07-30 19:43                     ` fw
  -1 siblings, 0 replies; 74+ messages in thread
From: Florian Westphal @ 2018-07-30 19:43 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: David Miller, nharold, fw, steffen.klassert, linux-kernel,
	herbert, 0x7f454c46, netdev, luto, ard.biesheuvel, hpa, mingo,
	john.stultz, kirill.shutemov, oleg, sboyd, rostedt, tglx, x86,
	linux-efi, akpm, gregkh, mchehab+samsung, shuah, linux-kselftest,
	eparis, kadlec, pablo, paul, coreteam, linux-audit,
	netfilter-devel, fan.du

Dmitry Safonov <dima@arista.com> wrote:
> On Sat, 2018-07-28 at 14:18 -0700, David Miller wrote:
> > From: Dmitry Safonov <dima@arista.com>
> > Date: Sat, 28 Jul 2018 17:26:55 +0100
> > 
> > > Well, I think, I'll rework my patches set according to critics and
> > > separate compat xfrm layer. I've already a selftest to check that 32/64
> > > bit xfrm works - so the most time-taking part is done.
> > 
> > The way you've done the compat structures using __packed is only going
> > to work on x86, just FYI.
> 
> Thanks for pointing, so I'll probably cover it under something like
> HAS_COMPAT_XFRM.
> (if there isn't any better idea).

You can do that; I suspect you can use
CONFIG_COMPAT_FOR_U64_ALIGNMENT
since, AFAICR, the only reason for the compat problem is the different
alignment requirements of 64-bit integer types in the structs, not e.g.
"long" size differences.

Instead of __packed, you can use the "compat" data types, e.g.
compat_u64 instead of u64:

struct compat_xfrm_lifetime_cur {
	compat_u64 bytes, packets, add_time, use_time;
}; /* same size on i386, but only 4-byte alignment required even on x86_64 */
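
The effect can be sketched in userspace, assuming GCC or Clang.
demo_compat_u64 below is a stand-in that is assumed to mirror how
compat_u64 is defined on x86 (a u64 typedef with aligned(4)); the demo_*
structs are hypothetical and only illustrate how the reduced alignment
propagates to an enclosing message:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's compat_u64 on x86 (assumed definition). */
typedef uint64_t __attribute__((aligned(4))) demo_compat_u64;

struct demo_lifetime_cur {
	uint64_t bytes, packets, add_time, use_time;
};

struct demo_compat_lifetime_cur {
	demo_compat_u64 bytes, packets, add_time, use_time;
};

/* Hypothetical enclosing messages: a 32-bit field, then the counters. */
struct demo_msg {
	uint32_t flags;
	struct demo_lifetime_cur cur;
};

struct demo_compat_msg {
	uint32_t flags;
	struct demo_compat_lifetime_cur cur;
};

/* The counters keep their size; only the alignment requirement drops. */
_Static_assert(sizeof(struct demo_compat_lifetime_cur) == 32,
	       "size is unchanged");
_Static_assert(_Alignof(struct demo_compat_lifetime_cur) == 4,
	       "alignment is 4 bytes");
/* So the enclosing compat message has no hole after `flags`. */
_Static_assert(offsetof(struct demo_compat_msg, cur) == 4,
	       "no padding before the counters");
_Static_assert(sizeof(struct demo_compat_msg) == 36,
	       "i386-style total size");
```

Unlike __packed, this keeps every other member at its normal alignment and
only relaxes the 64-bit fields, which is the one difference being
discussed here.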

You might be able to reuse
https://git.breakpoint.cc/cgit/fw/net-next.git/commit/?h=xfrm_config_compat_07&id=f64430e6d9e297f3990f485a4832e273751b9869

in your patch set.

I can try to submit the first few patches (which are not related to
compat, they just add const qualifiers) for inclusion later this week.


Thread overview: 74+ messages
2018-07-26  2:31 [PATCH 00/18] xfrm: Add compat layer Dmitry Safonov
2018-07-26  2:31 ` Dmitry Safonov
2018-07-26  2:31 ` dima
2018-07-26  2:31 ` Dmitry Safonov
2018-07-26  2:31 ` Dmitry Safonov
2018-07-26  2:31 ` [PATCH 01/18] x86/compat: Adjust in_compat_syscall() to generic code under !COMPAT Dmitry Safonov
2018-07-26  2:31 ` [PATCH 02/18] compat: Cleanup in_compat_syscall() callers Dmitry Safonov
2018-07-26  2:31 ` [PATCH 03/18] selftest/net/xfrm: Add test for ipsec tunnel Dmitry Safonov
2018-07-26  2:31   ` Dmitry Safonov
2018-07-26  2:31   ` dima
2018-07-26  2:31 ` [PATCH 04/18] net/xfrm: Add _packed types for compat users Dmitry Safonov
2018-07-26  2:31 ` [PATCH 05/18] net/xfrm: Parse userspi_info{,_packed} depending on syscall Dmitry Safonov
2018-07-26  2:31 ` [PATCH 06/18] netlink: Do not subscribe to non-existent groups Dmitry Safonov
2018-07-26  4:22   ` David Miller
2018-07-27 13:43     ` Dmitry Safonov
2018-07-26  2:31 ` [PATCH 07/18] netlink: Pass groups pointer to .bind() Dmitry Safonov
2018-07-26  2:31 ` [PATCH 08/18] xfrm: Add in-kernel groups for compat notifications Dmitry Safonov
2018-07-26  2:31 ` [PATCH 09/18] xfrm: Dump usersa_info in compat/native formats Dmitry Safonov
2018-07-26  2:31 ` [PATCH 10/18] xfrm: Send state notifications in compat format too Dmitry Safonov
2018-07-26  2:31 ` [PATCH 11/18] xfrm: Add compat support for xfrm_user_expire messages Dmitry Safonov
2018-07-26  2:31 ` [PATCH 12/18] xfrm: Add compat support for xfrm_userpolicy_info messages Dmitry Safonov
2018-07-26  2:31 ` [PATCH 13/18] xfrm: Add compat support for xfrm_user_acquire messages Dmitry Safonov
2018-07-26  2:31 ` [PATCH 14/18] xfrm: Add compat support for xfrm_user_polexpire messages Dmitry Safonov
2018-07-26  2:31 ` [PATCH 15/18] xfrm: Check compat acquire listeners in xfrm_is_alive() Dmitry Safonov
2018-07-26  2:31 ` [PATCH 16/18] xfrm: Notify compat listeners about policy flush Dmitry Safonov
2018-07-26  2:31 ` [PATCH 17/18] xfrm: Notify compat listeners about state flush Dmitry Safonov
2018-07-26  2:31 ` [PATCH 18/18] xfrm: Enable compat syscalls Dmitry Safonov
2018-07-26  8:49 ` [PATCH 00/18] xfrm: Add compat layer Florian Westphal
2018-07-26  8:49   ` Florian Westphal
2018-07-26  8:49   ` fw
2018-07-26  8:49   ` Florian Westphal
2018-07-26  8:49   ` Florian Westphal
2018-07-27  7:37   ` Steffen Klassert
2018-07-27  7:37     ` Steffen Klassert
2018-07-27  7:37     ` steffen.klassert
2018-07-27  7:37     ` Steffen Klassert
2018-07-27  7:37     ` Steffen Klassert
2018-07-27 14:02     ` Dmitry Safonov
2018-07-27 14:02       ` Dmitry Safonov
2018-07-27 14:02       ` dima
2018-07-27 14:02       ` Dmitry Safonov
2018-07-27 14:02       ` Dmitry Safonov
2018-07-27 14:19       ` Florian Westphal
2018-07-27 14:19         ` Florian Westphal
2018-07-27 14:19         ` fw
2018-07-27 14:19         ` Florian Westphal
2018-07-27 14:19         ` Florian Westphal
2018-07-27 14:51         ` Dmitry Safonov
2018-07-27 14:51           ` Dmitry Safonov
2018-07-27 14:51           ` dima
2018-07-27 14:51           ` Dmitry Safonov
2018-07-27 14:51           ` Dmitry Safonov
2018-07-27 16:48           ` Nathan Harold
2018-07-27 16:48             ` Nathan Harold
2018-07-27 16:48             ` nharold
2018-07-27 16:48             ` Nathan Harold
2018-07-27 17:09             ` Andy Lutomirski
2018-07-27 17:09               ` Andy Lutomirski
2018-07-27 17:09               ` luto
2018-07-27 17:09               ` Andy Lutomirski
2018-07-28 16:26             ` Dmitry Safonov
2018-07-28 16:26               ` Dmitry Safonov
2018-07-28 16:26               ` dima
2018-07-28 16:26               ` Dmitry Safonov
2018-07-28 16:26               ` Dmitry Safonov
2018-07-28 21:18               ` David Miller
2018-07-28 21:18                 ` David Miller
2018-07-28 21:18                 ` davem
2018-07-30 17:39                 ` Dmitry Safonov
2018-07-30 17:39                   ` Dmitry Safonov
2018-07-30 17:39                   ` dima
2018-07-30 19:43                   ` Florian Westphal
2018-07-30 19:43                     ` Florian Westphal
2018-07-30 19:43                     ` fw
