All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Xen Devel <xen-devel@lists.xen.org>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Jan Beulich <JBeulich@suse.com>,
	Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Subject: Re: Fatal crash on xen4.2 HVM + qemu-xen dm + NFS
Date: Wed, 16 Jan 2013 16:27:08 +0000	[thread overview]
Message-ID: <alpine.DEB.2.02.1301161620390.4978@kaball.uk.xensource.com> (raw)
In-Reply-To: <F1DF150F7B2469CFD587A2A4@Ximines.local>

On Wed, 16 Jan 2013, Alex Bligh wrote:
> Stefano,
> 
> --On 16 January 2013 14:34:34 +0000 Stefano Stabellini 
> <stefano.stabellini@eu.citrix.com> wrote:
> 
> > It seems that the grant mapping is already gone by the time
> > tcp_retransmit is called.
> > That might happen because QEMU already completed the read/write
> > operation and called xc_gnttab_munmap, that causes the grant_table and
> > the m2p_override to remove the p2m and m2p mappings of the foreign
> > pages.
> 
> What I want to know is why QEMU is completing the read/write operation
> before the write (as it surely must be a write) has completed in any
> case. This /seems/ to happen only if a backing file is being used
> but I'm not sure if that's just triggering the retransmits due to
> (e.g.) a slow filer.
> 
> If QEMU is completing writes before they've actually been done, haven't
> we got a wider set of problems to worry about?

Reading the thread you linked in a previous email, it seems that
it can actually happen that a userspace application is told that
the write is completed before all the outstanding network requests are
dealt with.


> Could the problem be "cache=writeback" on the QEMU command
> line (evident from a 'ps'). If caching is writeback perhaps QEMU
> needs to copy the data. Is there some setting to turn this off in
> xl for test purposes?

The command line cache options are ignored by xen_disk, so, assuming
that the guest is using the PV disk interface, that can't be the issue.


> > Isn't there a way to prevent tcp_retransmit from running when the
> > request is already completed? Or stop it if you find out that the pages
> > are already gone?
> 
> But what would you do? If you don't run the tcp_retransmit the write
> would be lost (to say nothing of the NFS connection to the server).

Well, that is not true: if the write was really lost, the kernel wouldn't
have completed the AIO write and notified QEMU.


> > You could try persistent grants, that wouldn't solve the bug but they
> > should be able to "hide" it pretty well. Not ideal, I know.
> > The QEMU side commit is 9e496d7458bb01b717afe22db10a724db57d53fd.
> > Konrad issued a pull request recently with the corresponding Linux
> > blkfront changes:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git
> > stable/for-jens-3.8
> 
> That's presumably the fir 8 commits at:
> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=shortlog;h=refs/heads/stable/for-jens-3.8
> 
> So I'd need a new dom0 kernel and to backport the QEMU patch.

Yep.

  parent reply	other threads:[~2013-01-16 16:27 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-14 14:54 Fatal crash on xen4.2 HVM + qemu-xen dm + NFS Alex Bligh
2012-12-17 10:10 ` Jan Beulich
2012-12-17 17:09   ` Alex Bligh
2013-01-16 10:56   ` Alex Bligh
2013-01-16 14:34     ` Stefano Stabellini
2013-01-16 15:06       ` Alex Bligh
2013-01-16 16:00         ` Alex Bligh
2013-01-16 16:27         ` Stefano Stabellini [this message]
2013-01-16 17:13           ` Alex Bligh
2013-01-16 17:33             ` Stefano Stabellini
2013-01-16 17:39               ` Stefano Stabellini
2013-01-16 18:14                 ` Alex Bligh
2013-01-16 18:49                   ` Stefano Stabellini
2013-01-16 19:00                     ` Stefano Stabellini
2013-01-17  7:58                       ` Alex Bligh
2013-01-16 18:12               ` Alex Bligh
2013-01-21 15:15               ` Alex Bligh
2013-01-21 15:23                 ` Ian Campbell
2013-01-21 15:35                   ` Alex Bligh
2013-01-21 15:50                     ` Ian Campbell
2013-01-21 16:33                       ` Alex Bligh
2013-01-21 16:51                         ` Ian Campbell
2013-01-21 17:06                           ` Alex Bligh
2013-01-21 17:29                             ` Ian Campbell
2013-01-21 17:31                           ` Alex Bligh
2013-01-21 17:32                             ` Ian Campbell
2013-01-21 18:14                               ` Alex Bligh
2013-01-22 10:05                                 ` Ian Campbell
2013-01-22 13:02                                   ` Alex Bligh
2013-01-22 13:13                                     ` Ian Campbell
2013-01-21 20:37                           ` Alex Bligh
2013-01-22 10:07                             ` Ian Campbell
2013-01-22 13:01                               ` Alex Bligh
2013-01-22 13:14                                 ` Ian Campbell
2013-01-22 13:18                                   ` Alex Bligh
2013-01-22 10:13                             ` Ian Campbell
2013-01-22 12:59                               ` Alex Bligh
2013-01-22 15:46                                 ` Stefano Stabellini
2013-01-22 15:42                             ` Stefano Stabellini
2013-01-22 16:09                               ` Stefano Stabellini
2013-01-22 20:31                                 ` Alex Bligh
2013-01-23 11:52                                   ` Stefano Stabellini
2013-01-23 15:19                                     ` Alex Bligh
2013-01-23 16:29                                       ` Stefano Stabellini
2013-01-25 11:28                                         ` Alex Bligh
2013-02-05 15:40                                           ` Alex Bligh
2013-02-22 17:28                                             ` Alex Bligh
2013-02-22 17:41                                               ` Konrad Rzeszutek Wilk
2013-02-22 18:00                                                 ` Stefano Stabellini
2013-02-22 19:53                                                 ` Alex Bligh
2013-03-06 11:50                                                   ` Alex Bligh
2013-03-07  1:01                                                     ` Konrad Rzeszutek Wilk
2013-03-07  4:15                                                       ` Stefano Stabellini
2013-03-07 10:47                                                         ` [PATCH] QEMU(upstream): Disable xen's use of O_DIRECT by default as it results in crashes Alex Bligh
2013-03-08  3:18                                                           ` Stefano Stabellini
2013-03-08  9:25                                                             ` [PATCHv2] " Alex Bligh
2013-03-08  9:26                                                             ` [PATCH] " Alex Bligh
2013-03-08 10:17                                                             ` George Dunlap
2013-03-08 10:27                                                               ` Alex Bligh
2013-03-08 10:35                                                                 ` George Dunlap
2013-03-08 10:50                                                                   ` Alex Bligh
2013-03-08 11:18                                                                     ` George Dunlap
2013-03-08 11:40                                                                       ` [PATCHv3] " Alex Bligh
2013-03-08 12:54                                                                         ` George Dunlap
2013-03-11 14:02                                                                           ` Alex Bligh
2013-03-11 14:42                                                                             ` George Dunlap
2013-03-11 17:48                                                                               ` Konrad Rzeszutek Wilk
2013-03-11 17:55                                                                                 ` Ian Jackson
2013-03-14 17:06                                                                                   ` Alex Bligh
2013-03-14 18:26                                                                                     ` Ian Jackson
2013-03-12 12:08                                                                               ` Ian Campbell
2013-03-14 18:37                                                                         ` Stefano Stabellini
2013-03-14 19:30                                                                           ` Ian Jackson
2013-03-14 19:56                                                                             ` Alex Bligh
2013-03-15  9:28                                                                             ` Ian Campbell
2013-03-15 10:43                                                                               ` Stefano Stabellini
2013-03-15 11:21                                                                                 ` Ian Jackson
2013-03-15 11:28                                                                                   ` Stefano Stabellini
2013-03-15 11:37                                                                                     ` Ian Jackson
2013-03-15 11:43                                                                                       ` Stefano Stabellini
2013-03-15 12:43                                                                                         ` Alex Bligh
2013-03-15 12:50                                                                                           ` Ian Campbell
2013-03-15 18:31                                                                                         ` Ian Jackson
2013-03-18 10:29                                                                                         ` Alex Bligh
2013-03-18 11:47                                                                                           ` Stefano Stabellini
2013-03-18 12:21                                                                                             ` Alex Bligh
2013-03-08 11:41                                                                       ` [PATCH] " Alex Bligh
2013-03-08 10:28                                                               ` George Dunlap
2013-03-08 10:45                                                                 ` Alex Bligh
2013-03-07 10:51                                                         ` Fatal crash on xen4.2 HVM + qemu-xen dm + NFS Alex Bligh
2013-03-07  8:16                                                       ` Alex Bligh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.02.1301161620390.4978@kaball.uk.xensource.com \
    --to=stefano.stabellini@eu.citrix.com \
    --cc=Ian.Campbell@citrix.com \
    --cc=JBeulich@suse.com \
    --cc=alex@alex.org.uk \
    --cc=konrad.wilk@oracle.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.