* [RFC][PATCH 0/3] TCP connection repair (v2)
@ 2012-03-06  9:54 Pavel Emelyanov
  2012-03-06  9:55 ` [PATCH 1/3] tcp: Move code around Pavel Emelyanov
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Pavel Emelyanov @ 2012-03-06  9:54 UTC (permalink / raw)
  To: Linux Netdev List, David Miller, Tejun Heo, Eric Dumazet

Hi!

Attempt #2 with transparent TCP connection hijacking.


The idea, briefly, is to introduce a "repair" mode for a TCP socket. In this mode
any API call on the socket (bind/connect/sendmsg/etc.) does not result in packets
being sent over the network, but instead modifies the socket locally in the expected way.

I.e., connect() in repair mode assigns the peer's credentials to the sock and
just moves it into the connected state without issuing SYNs or anything else. The
bind() call on a socket under repair forcibly binds it to the desired IP and
port, ignoring any (potential) local conflicts (just as if everybody else had
SO_REUSEADDR set). sendmsg() just queues data for transmission, etc.
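
A minimal userspace sketch of the flow described above. It assumes the option
name TCP_REPAIR (value 19) from the interface as it was later merged into
mainline Linux; this posting does not name the option, so treat the constant as
an assumption. The helpers repair_mode_set() and repair_restore() are
hypothetical names used only for illustration.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: value from mainline <linux/tcp.h>; not named in this posting. */
#ifndef TCP_REPAIR
#define TCP_REPAIR 19
#endif

/* Switch repair mode on (1) or off (0). Returns 0 or a negative errno. */
static int repair_mode_set(int fd, int on)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on)) < 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: rebuild an established connection locally.
 * While repair mode is on, bind() and connect() only update socket
 * state; no SYN goes out on the wire.
 * Returns a connected fd, or a negative errno on failure.
 */
static int repair_restore(const struct sockaddr_in *local,
			  const struct sockaddr_in *peer)
{
	int fd, rc;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;

	rc = repair_mode_set(fd, 1);	/* fails with EPERM if unprivileged */
	if (rc == 0 &&
	    bind(fd, (const struct sockaddr *)local, sizeof(*local)) < 0)
		rc = -errno;
	if (rc == 0 &&
	    connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0)
		rc = -errno;
	if (rc == 0)
		rc = repair_mode_set(fd, 0);	/* back to normal operation */

	if (rc < 0) {
		close(fd);
		return rc;
	}
	return fd;
}
```

A restorer would call repair_restore() with the addresses saved from the
original connection; an unprivileged caller is expected to get a negative
errno back from the first setsockopt().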

I think it makes sense to expose this ability as a non-obscure API,
since connection migration can be used not only by the checkpoint/restore project,
but also by various load balancing solutions. E.g., a server can accept a
connection, read the app-level header out of the stream, make the balancing
decision based on _it_ (rather than just the TCP and/or IP header) and then pass
the existing connection to another host.


Changes since v1:

* Addressed (I hope) David's concern about TCP sequences self-consistency.

The repair mode is turned on only for "static" TCP states, i.e. TCP_CLOSE and
TCP_ESTABLISHED with the socket being locked. Only two sequences can be changed
manually -- the write_seq and the copied_seq -- and only when the socket is in
TCP_CLOSE state. The rest is maintained fully by the kernel code according to
the protocol rules in connect/sendmsg/etc. calls.

* Slight API change.

Instead of two separate sockoptions for the send and receive queue sequences, I
introduce one option that selects which queue is under repair right now and
another option for setting the sequence (as described, this works only in the
TCP_CLOSE state) of the queue under repair. Yes, there are still two options for
this, but this approach helps with socket queue repair (see below).

* Added support for queues repair.

According to the overall idea of the "repair" mode, the recv/send syscalls
are used for this, and all they do is peek/poke data from/to the queues.
The queue-under-repair set by the described option makes it possible to read
from the send queue and write to the receive one. The caller is obliged to use
the MSG_PEEK flag for recv in repair mode.
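
The two options and the queue peek/poke described in the list above can be
sketched from userspace roughly as follows. The option names TCP_REPAIR_QUEUE
(20) and TCP_QUEUE_SEQ (21) and the queue ids 1 (receive) and 2 (send) are
assumptions taken from the interface as it was later merged into mainline
Linux; this posting does not spell them out. All helper names are hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumptions: values from mainline <linux/tcp.h>. */
#ifndef TCP_REPAIR_QUEUE
#define TCP_REPAIR_QUEUE 20
#endif
#ifndef TCP_QUEUE_SEQ
#define TCP_QUEUE_SEQ 21
#endif

/* Local names mirroring the mainline queue ids (assumed values). */
enum { REPAIR_Q_RECV = 1, REPAIR_Q_SEND = 2 };

/* Choose which queue the following calls operate on. */
static int repair_select_queue(int fd, int queue)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE,
		       &queue, sizeof(queue)) < 0)
		return -errno;
	return 0;
}

/* Set the sequence of one queue; per the text, only valid in TCP_CLOSE. */
static int repair_set_seq(int fd, int queue, uint32_t seq)
{
	int rc = repair_select_queue(fd, queue);

	if (rc < 0)
		return rc;
	if (setsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, &seq, sizeof(seq)) < 0)
		return -errno;
	return 0;
}

/* Copy queued bytes out of the chosen queue; MSG_PEEK is mandatory. */
static ssize_t repair_peek_queue(int fd, int queue, void *buf, size_t len)
{
	int rc = repair_select_queue(fd, queue);
	ssize_t n;

	if (rc < 0)
		return rc;
	n = recv(fd, buf, len, MSG_PEEK | MSG_DONTWAIT);
	return n < 0 ? -errno : n;
}

/* Put bytes back into the chosen queue; nothing is transmitted. */
static ssize_t repair_fill_queue(int fd, int queue,
				 const void *buf, size_t len)
{
	int rc = repair_select_queue(fd, queue);
	ssize_t n;

	if (rc < 0)
		return rc;
	n = send(fd, buf, len, 0);
	return n < 0 ? -errno : n;
}
```

On the dump side one would peek both queues of an established socket in
repair mode; on the restore side one would set both sequences while the new
socket is still closed, connect() it, and then refill the queues.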


Thanks,
Pavel

* [PATCH net-next 0/3] TCP connection repair (v3)
@ 2012-03-28 15:36 Pavel Emelyanov
  2012-03-28 15:37 ` [PATCH 2/3] tcp: Initial repair mode Pavel Emelyanov
  0 siblings, 1 reply; 12+ messages in thread
From: Pavel Emelyanov @ 2012-03-28 15:36 UTC (permalink / raw)
  To: Linux Netdev List, David Miller

Hi!

Attempt #3 with transparent TCP connection hijacking
(previous one is here http://lists.openwall.net/netdev/2012/03/06/65).


Changes since v2:

* The CAP_NET_ADMIN is required to turn repair on, not CAP_SYS_ADMIN

* Changed the read queue seq sockoption to work on rcv_nxt rather than
  copied_seq, to address the issue with the syn flag in the fake header
  (see below).

* Resolved issues with syn and fin flags in fake headers.

  Fin can and should be dropped. The repair mode is currently allowed
  only for closed and established sockets and thus we cannot meet an 
  skb with this flag in the original socket (queuing fin to receive
  queue switches the established state to the close-wait one).

  Syn can also be dropped. This flag in the recv queue's skb means the
  respective skb's seq is off-by-one relative to the actual amount of 
  data on it. Thus, removing the flag from fake skb and fixing the seq 
  respectively solves the issue.

  However, in order to do so it's not enough to know only the copied_seq and
  the recv queue length (rcv_nxt should be copied_seq plus data length
  plus "syn-is-there"). Thus, the rcv queue seq get/set sockoption is
  changed to work on rcv_nxt itself. IOW I emulate the situation
  where a packet with data and syn is split into two, a packet with
  syn and a packet with data, and the former one is already "eaten".
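
The sequence arithmetic in the last item can be written out as a small sketch.
The function names are hypothetical, and the second helper is only one reading
of "fixing the seq respectively": once the syn is treated as already consumed,
the fake data skb is taken to start data_len bytes before rcv_nxt.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * rcv_nxt = copied_seq + queued data length + 1 if a syn is queued.
 * TCP sequence numbers wrap modulo 2^32, which uint32_t arithmetic
 * gives for free.
 */
static uint32_t rcv_nxt_from_queue(uint32_t copied_seq, uint32_t data_len,
				   bool syn_queued)
{
	return copied_seq + data_len + (syn_queued ? 1u : 0u);
}

/*
 * Start sequence of a fake data-only skb (syn flag stripped) that ends
 * exactly at rcv_nxt. Assumption: this is how the seq gets "fixed".
 */
static uint32_t fake_skb_seq(uint32_t rcv_nxt, uint32_t data_len)
{
	return rcv_nxt - data_len;
}
```

With copied_seq = 1000, 100 queued bytes and a syn in the queue, rcv_nxt comes
out as 1101 and the fake skb starts at 1001, i.e. one past the sequence number
the syn occupied.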


Thanks,
Pavel


Thread overview: 12+ messages
2012-03-06  9:54 [RFC][PATCH 0/3] TCP connection repair (v2) Pavel Emelyanov
2012-03-06  9:55 ` [PATCH 1/3] tcp: Move code around Pavel Emelyanov
2012-03-06  9:55 ` [PATCH 2/3] tcp: Initial repair mode Pavel Emelyanov
2012-03-06 13:11   ` Glauber Costa
2012-03-06 20:16     ` David Miller
2012-03-06  9:55 ` [PATCH 3/3] tcp: Repair socket queues Pavel Emelyanov
2012-03-06 21:14 ` [RFC][PATCH 0/3] TCP connection repair (v2) David Miller
2012-03-28 15:36 [PATCH net-next 0/3] TCP connection repair (v3) Pavel Emelyanov
2012-03-28 15:37 ` [PATCH 2/3] tcp: Initial repair mode Pavel Emelyanov
2012-03-28 17:20   ` Glauber Costa
2012-03-29  9:52     ` Pavel Emelyanov
2012-03-28 20:39   ` Ben Hutchings
2012-03-29  9:53     ` Pavel Emelyanov
