From mboxrd@z Thu Jan  1 00:00:00 1970
From: Keir Fraser <keir@xensource.com>
Subject: Re: [PATCH] Require that xenstored writes to a domain
	complete in a single chunk
Date: Mon, 26 Feb 2007 17:20:34 +0000
Message-ID: <C208C762.A2B6%keir@xensource.com>
References: <871wkcyjig.fsf@apfelstrudel.hh.sledj.net>
Mime-Version: 1.0
Content-Type: text/plain;
	charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <871wkcyjig.fsf@apfelstrudel.hh.sledj.net>
List-Unsubscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: David Edmondson <dme@sun.com>, xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 26/2/07 16:24, "David Edmondson" <dme@sun.com> wrote:

> If xenstored is part-way through writing a header+payload into the
> buffer shared with a guest domain when the guest domain decides to
> suspend, the buffer is corrupted, as xenstored doesn't know that it
> has a partial write to complete when the domain revives.  The domain
> is expecting proper completion of the partial header+payload and is
> disappointed.
> 
> The attached patch modifies xenstored such that it checks for
> sufficient space for header+payload before making any changes to the
> shared buffer.
> 
> It is against 3.0.4-1, but the code in unstable looks the same.

This seems dubious. Nothing stops a payload from being bigger than the ring
(which is only 1kB), and such a message could never be written in one chunk.

The right fix would be in the guest, which should already be stopping any
transactions or commands across save/restore. Does this problem occur when
xenstored sends an asynchronous watch-fired message? Probably the
packet-reading thread should be interrupted and put to sleep before
suspending.

For older guest compatibility perhaps we can take a variant of your patch
that only waits for enough space if the entire message fits in the ring in
one go. This would be 'best-effort' at compatibility while not precluding
use of larger messages in general.
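
Roughly, the check I have in mind would look something like the sketch
below. This is not taken from the xenstored source: RING_SIZE,
ring_free_bytes() and may_start_write() are names made up for illustration,
and free-running 32-bit producer/consumer indices are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed: the 1kB ring size mentioned above. */
#define RING_SIZE 1024u

/* Free bytes in the ring, assuming free-running indices that wrap at 2^32. */
static uint32_t ring_free_bytes(uint32_t cons, uint32_t prod)
{
    uint32_t used = prod - cons;   /* unsigned wraparound is intended */

    assert(used <= RING_SIZE);     /* otherwise the indices are corrupt */
    return RING_SIZE - used;
}

/*
 * Hypothetical policy check: may the writer start copying header+payload?
 *  - Message fits in the ring: wait until all of it can go in one go.
 *  - Message is larger than the ring: it can never be written in one
 *    chunk, so fall back to chunked writes whenever any space is free.
 */
static bool may_start_write(uint32_t cons, uint32_t prod,
                            size_t hdr_len, size_t payload_len)
{
    size_t total = hdr_len + payload_len;
    uint32_t space = ring_free_bytes(cons, prod);

    if (total > RING_SIZE)
        return space > 0;

    return space >= total;
}
```

The writer would call may_start_write() before touching the shared buffer
and go back to waiting if it returns false. Only the oversized case can
still leave a partial message in the ring across a suspend.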

 -- Keir