From: Oren Laadan
Subject: Re: [RFC v14-rc2][PATCH 5/7] Infrastructure for work postponed to the end of checkpoint/restart
Date: Tue, 31 Mar 2009 12:00:10 -0400
Message-ID: <49D23E0A.2030308@cs.columbia.edu>
In-Reply-To: <1238512639.8286.658.camel@nimitz>
To: Dave Hansen
List-Id: containers.vger.kernel.org

Dave Hansen wrote:
> On Tue, 2009-03-31 at 01:32 -0400, Oren Laadan wrote:
>> Add an interface to postpone an action until the end of the entire
>> checkpoint or restart operation. This is useful when during the
>> scan of tasks an operation cannot be performed in place, to avoid
>> the need for a second scan.
>
> Why aren't we using the existing kernel workqueue mechanism?

Because we need to defer the work until the end of the operation:
not earlier, because we defer it for a reason; and not later, because
we would then block waiting for it.

The kernel workqueue schedules the work for "some time later". In
particular, it may run too early. Although unlikely, it can also run
arbitrarily later, so finishing and cleaning up a checkpoint or a
restart would have to block on it.

Also, the kernel workqueue cannot make any assumptions about the task
context in which the work is performed. Restart often relies on
running in the context of a specific restarting task.
Example: this patch assumes a single (common) ipc namespace, but that
is easy to change. To support more than one, we will need to perform
the deferred ipc action in the context of the process that owns that
ipc_ns. (This means that this mechanism will evolve to be per-task.)

If we were to use the kernel workqueue, we would probably need to
create a queue per c/r operation to allow an efficient flush; recall
that each workqueue comes with its own thread(s). In general, that
mechanism is too heavy.

What we need is a simple way for the c/r operation as a whole, and
later for a task in particular, to defer some action until later _in
the restart_ process (not to an arbitrary time).

I should have named it cr_deferwork, and written this ^^^ in the patch.

Oren.
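For illustration, here is a minimal user-space sketch of the kind of facility described in this reply: a per-operation queue of deferred actions that is drained synchronously, by the task driving the checkpoint/restart, once the task scan is finished. This is not the code from the patch. The names (cr_deferqueue_init, cr_deferqueue_add, cr_deferqueue_run, demo_increment) are hypothetical, chosen after the cr_deferwork name suggested above, and a plain singly linked list with malloc() stands in for the kernel's list_head and kmalloc().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical deferred action: returns 0 on success, nonzero on error. */
typedef int (*cr_deferwork_fn)(void *data);

struct cr_deferwork {
    cr_deferwork_fn fn;
    void *data;
    struct cr_deferwork *next;
};

/* Hypothetical queue: one instance per checkpoint/restart operation. */
struct cr_deferqueue {
    struct cr_deferwork *head;
    struct cr_deferwork **tail;   /* points at the last 'next' link (FIFO append) */
};

static void cr_deferqueue_init(struct cr_deferqueue *q)
{
    q->head = NULL;
    q->tail = &q->head;
}

/* Called during the task scan when an action cannot be performed in place. */
static int cr_deferqueue_add(struct cr_deferqueue *q, cr_deferwork_fn fn, void *data)
{
    struct cr_deferwork *w;

    assert(q && fn);
    w = malloc(sizeof(*w));
    if (!w)
        return -1;
    w->fn = fn;
    w->data = data;
    w->next = NULL;
    *q->tail = w;        /* append, so actions run in the order queued */
    q->tail = &w->next;
    return 0;
}

/*
 * Called by the task driving the operation, at the end of the operation.
 * Every action runs right here, in the caller's context: no helper thread
 * and no separate flush step. Work queued by a callback is also run before
 * returning. Returns the first error seen; all entries are still run and freed.
 */
static int cr_deferqueue_run(struct cr_deferqueue *q)
{
    int ret = 0;

    while (q->head) {
        struct cr_deferwork *w = q->head;

        cr_deferqueue_init(q);   /* detach the current batch */
        while (w) {
            struct cr_deferwork *next = w->next;
            int err = w->fn(w->data);

            if (err && !ret)
                ret = err;
            free(w);
            w = next;
        }
    }
    return ret;
}

/* Example deferred action (illustrative only): counts its invocations. */
static int demo_increment(void *data)
{
    ++*(int *)data;
    return 0;
}
```

The point of the sketch is the timing and context: nothing runs when an action is queued, and everything runs exactly when cr_deferqueue_run() is called, which the operation would do once, at its end. Whether to stop at the first error or to run all entries (as here) is a design choice. A per-task variant, as anticipated above for multiple ipc namespaces, could embed one such queue in each restarting task.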