linux-kernel.vger.kernel.org archive mirror
From: Sridhar Samudrala <sri@us.ibm.com>
To: Stephen Hemminger <shemminger@osdl.org>
Cc: "David S. Miller" <davem@davemloft.net>,
	mpm@selenic.com, ak@suse.de, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
Subject: Re: [RFC][PATCH 0/3] TCP/IP Critical socket communication mechanism
Date: Fri, 16 Dec 2005 10:38:19 -0800
Message-ID: <1134758299.10691.28.camel@w-sridhar2.beaverton.ibm.com>
In-Reply-To: <20051216094810.70082caa@dxpl.pdx.osdl.net>

On Fri, 2005-12-16 at 09:48 -0800, Stephen Hemminger wrote:
> On Thu, 15 Dec 2005 18:09:22 -0800
> Sridhar Samudrala <sri@us.ibm.com> wrote:
> 
> > On Thu, 2005-12-15 at 00:21 -0800, David S. Miller wrote:
> > > From: Sridhar Samudrala <sri@us.ibm.com>
> > > Date: Wed, 14 Dec 2005 23:37:37 -0800 (PST)
> > > 
> > > > Instead, you seem to be suggesting in_emergency to be set dynamically
> > > > when we are about to run out of ATOMIC memory. Is this right?
> > > 
> > > Not when we run out, but rather when we reach some low water mark, the
> > > "critical sockets" would still use GFP_ATOMIC memory but only
> > > "critical sockets" would be allowed to do so.
> > > 
> > > But even this has faults, consider the IPSEC scenario I mentioned, and
> > > this applies to any kind of encapsulation actually, even simple
> > > tunneling examples can be concocted which make the "critical socket"
> > > idea fail.
> > > 
> > > The knee jerk reaction is "mark IPSEC's sockets critical, and mark the
> > > tunneling allocations critical, and... and..."  well you have
> > > GFP_ATOMIC then my friend.
> > 
> > I would like to mention another reason why we need to have a new 
> > GFP_CRITICAL flag for an allocation request. When we are in emergency,
> > even the GFP_KERNEL allocations for a critical socket should not 
> > sleep. This is because the swap device may have failed and we would
> > like to communicate this event to a management server over the 
> > critical socket so that it can initiate the failover.
> > 
> > We are not trying to solve swapping over network problem. It is much
> > simpler. The critical sockets are to be used only to send/receive
> > a few critical messages reliably during a short period of emergency.
> > 
> 
> If it is only one place, why not pre-allocate one "I'm sick now"
> skb and hold onto it. Any bigger solution seems to snowball into
> a huge mess.

But the problem is that even sending/receiving a single packet can cause
multiple dynamic allocations in the networking path, all the way from
the sockets layer->transport->IP->driver.
To successfully send a packet, we may have to do ARP, send ACKs, and
create cached routes, etc. So my patch tries to identify the allocations
that are needed to successfully send/receive packets over a pre-established
socket and adds a new flag, GFP_CRITICAL, to those calls.
This doesn't make any difference when we are not in emergency. But when
we go into emergency, the VM will try to satisfy these allocations from a
critical pool if the normal path leads to failure.
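
The fallback described above can be illustrated with a rough userspace
sketch. This is not the patch itself: SKETCH_CRITICAL, sketch_alloc(),
sketch_free(), the pool sizes, and the normal_path_fails test hook are
all hypothetical names invented for illustration, and malloc() merely
stands in for the normal allocation path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flag, modelled on the GFP_CRITICAL idea in the mail. */
#define SKETCH_CRITICAL 0x1u

/* Assumed pool geometry; the mail does not specify any sizes. */
#define POOL_SLOTS 4
#define SLOT_SIZE  2048

bool in_emergency;       /* would be set when the management app says so */
bool normal_path_fails;  /* test hook: simulate the normal path failing */

static unsigned char pool[POOL_SLOTS][SLOT_SIZE];
static bool pool_used[POOL_SLOTS];

/* Stand-in for the ordinary allocation path. */
static void *normal_alloc(size_t size)
{
    return normal_path_fails ? NULL : malloc(size);
}

/* Hand out one free slot of the reserved pool, if the request fits. */
static void *pool_alloc(size_t size)
{
    if (size > SLOT_SIZE)
        return NULL;
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = true;
            return pool[i];
        }
    }
    return NULL;
}

/*
 * Try the normal path first. Only when it fails, the system is in
 * emergency, and the caller marked the request critical do we dip
 * into the reserved pool.
 */
void *sketch_alloc(size_t size, unsigned flags)
{
    void *p = normal_alloc(size);

    if (p)
        return p;
    if (in_emergency && (flags & SKETCH_CRITICAL))
        return pool_alloc(size);
    return NULL;
}

/* Return memory to whichever path it came from. */
void sketch_free(void *p)
{
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (p == (void *)pool[i]) {
            assert(pool_used[i]);
            pool_used[i] = false;
            return;
        }
    }
    free(p);
}
```

Outside of emergency the flag has no effect in this sketch, which matches
the "doesn't make any difference" behaviour described above.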

We go into emergency when some management app detects that a swap device
is about to fail (we are not yet in OOM, but will enter OOM soon). In order
to avoid entering OOM, we need to send a message over a critical socket to
a remote server that can initiate failover and switch to a different swap
device. The switchover will happen within 2 minutes after it is initiated.
In a cluster environment, the remote server also sends a message to other
nodes which are also running the management app so that they also enter
emergency. Once we successfully switch to a different swap device, the remote
server sends a message to all the nodes and they come out of emergency.
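
The enter/exit sequence above can be modelled as a tiny state sketch.
Everything here is an assumption made for illustration: struct cluster,
report_swap_failing(), failover_complete(), and MAX_NODES are invented
names, and the message exchange over critical sockets is collapsed into
direct function calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8  /* assumed upper bound, not from the mail */

/* The remote server's view of the cluster in this sketch. */
struct cluster {
    bool in_emergency[MAX_NODES];
    size_t count;            /* number of nodes running the management app */
    bool failover_started;   /* switch to another swap device in progress */
};

/*
 * A node's management app saw its swap device about to fail. The
 * reporting node enters emergency, the server starts the failover,
 * and every other node is told to enter emergency as well.
 */
void report_swap_failing(struct cluster *c, size_t reporter)
{
    assert(c->count <= MAX_NODES);
    assert(reporter < c->count);

    c->in_emergency[reporter] = true;
    c->failover_started = true;
    for (size_t i = 0; i < c->count; i++)
        c->in_emergency[i] = true;
}

/* The switch succeeded: the server tells all nodes to leave emergency. */
void failover_complete(struct cluster *c)
{
    c->failover_started = false;
    for (size_t i = 0; i < c->count; i++)
        c->in_emergency[i] = false;
}

/* Count the nodes currently in emergency. */
size_t nodes_in_emergency(const struct cluster *c)
{
    size_t n = 0;

    for (size_t i = 0; i < c->count; i++)
        if (c->in_emergency[i])
            n++;
    return n;
}
```

In the real scheme each of these transitions depends on a message getting
through while memory is scarce, which is the delivery that the critical
sockets are meant to protect.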

During the period of emergency, all other communications can block. But
guaranteeing the successful delivery of the critical messages will help
in making sure that we do not enter an OOM situation.

Thanks
Sridhar


