* xl hangs instead of failing more graciously when the fs is read-only
@ 2014-06-12  7:47 Sander Eikelenboom
  2014-06-12  7:59 ` Ian Campbell
  0 siblings, 1 reply; 4+ messages in thread
From: Sander Eikelenboom @ 2014-06-12  7:47 UTC
  To: Ian Campbell; +Cc: xen-devel

Hi Ian,

At the moment I’m having the privilege of thrashing my box and root-fs 
frequently while testing kernels. This causes the root-fs to be mounted 
read-only. But init continues to do its job anyway, so we get to xendomains,
which in turn uses 'xl'. But 'xl' needs a writable FS and hangs when it isn't.
Couldn't and shouldn't this fail more graciously?

--
Sander
 
[  374.387283] INFO: task xl:9233 blocked for more than 120 seconds.
[  374.401747]       Not tainted 3.15.0-20140611a-netnext+ #1
[  374.416089] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  374.428468] xl              D ffff880033a80000     0  9233   9231 0x00000000
[  374.439931]  ffff88002a617d78 0000000000000216 ffff880033a80000 0000000000014500
[  374.451387]  ffff88002a617fd8 0000000000014500 ffffffff822184e0 ffff880033a80000
[  374.462742]  000000101c8101e6 ffffffff8313deb0 ffff880033a80000 ffff880033a80850
[  374.474082] Call Trace:
[  374.485344]  [<ffffffff81116426>] ? __lock_acquire+0x516/0x2210
[  374.496464]  [<ffffffff8111448a>] ? mark_held_locks+0x6a/0x90
[  374.507358]  [<ffffffff81ba9239>] schedule+0x29/0x70
[  374.518145]  [<ffffffff81ba964e>] schedule_preempt_disabled+0xe/0x10
[  374.528861]  [<ffffffff81bac86a>] mutex_lock_nested+0x17a/0x560
[  374.539422]  [<ffffffff81590267>] ? xenbus_dev_request_and_reply+0x37/0xc0
[  374.550001]  [<ffffffff81590267>] xenbus_dev_request_and_reply+0x37/0xc0
[  374.560543]  [<ffffffff811be2f3>] ? might_fault+0x43/0xa0
[  374.570927]  [<ffffffff815921e8>] xenbus_file_write+0x2c8/0x560
[  374.581130]  [<ffffffff8111856c>] ? lock_release+0x13c/0x2a0
[  374.591086]  [<ffffffff811f71e2>] vfs_write+0xc2/0x1e0
[  374.600943]  [<ffffffff811f76f2>] SyS_write+0x52/0xc0
[  374.610673]  [<ffffffff81baf939>] system_call_fastpath+0x16/0x1b
[  374.620293] 3 locks held by xl/9233:
[  374.629816]  #0:  (sb_writers#10){.+.+..}, at: [<ffffffff811f72e3>] vfs_write+0x1c3/0x1e0
[  374.639444]  #1:  (&u->msgbuffer_mutex){+.+...}, at: [<ffffffff81591f6a>] xenbus_file_write+0x4a/0x560
[  374.649015]  #2:  (&xs_state.request_mutex){+.+...}, at: [<ffffffff81590267>] xenbus_dev_request_and_reply+0x37/0xc0
[  494.597619] INFO: task xl:9233 blocked for more than 120 seconds.
[  494.609487]       Not tainted 3.15.0-20140611a-netnext+ #1
[  494.621051] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  494.630382] xl              D ffff880033a80000     0  9233   9231 0x00000000
[  494.639750]  ffff88002a617d78 0000000000000216 ffff880033a80000 0000000000014500
[  494.649113]  ffff88002a617fd8 0000000000014500 ffffffff822184e0 ffff880033a80000
[  494.658231]  000000101c8101e6 ffffffff8313deb0 ffff880033a80000 ffff880033a80850
[  494.667179] Call Trace:
[  494.675972]  [<ffffffff81116426>] ? __lock_acquire+0x516/0x2210
[  494.684793]  [<ffffffff8111448a>] ? mark_held_locks+0x6a/0x90
[  494.693479]  [<ffffffff81ba9239>] schedule+0x29/0x70
[  494.702161]  [<ffffffff81ba964e>] schedule_preempt_disabled+0xe/0x10
[  494.710782]  [<ffffffff81bac86a>] mutex_lock_nested+0x17a/0x560
[  494.719166]  [<ffffffff81590267>] ? xenbus_dev_request_and_reply+0x37/0xc0
[  494.727404]  [<ffffffff81590267>] xenbus_dev_request_and_reply+0x37/0xc0
[  494.735528]  [<ffffffff811be2f3>] ? might_fault+0x43/0xa0
[  494.743583]  [<ffffffff815921e8>] xenbus_file_write+0x2c8/0x560
[  494.751505]  [<ffffffff8111856c>] ? lock_release+0x13c/0x2a0
[  494.759406]  [<ffffffff811f71e2>] vfs_write+0xc2/0x1e0
[  494.767192]  [<ffffffff811f76f2>] SyS_write+0x52/0xc0
[  494.774714]  [<ffffffff81baf939>] system_call_fastpath+0x16/0x1b
[  494.782034] 3 locks held by xl/9233:
[  494.789223]  #0:  (sb_writers#10){.+.+..}, at: [<ffffffff811f72e3>] vfs_write+0x1c3/0x1e0
[  494.796547]  #1:  (&u->msgbuffer_mutex){+.+...}, at: [<ffffffff81591f6a>] xenbus_file_write+0x4a/0x560
[  494.803955]  #2:  (&xs_state.request_mutex){+.+...}, at: [<ffffffff81590267>] xenbus_dev_request_and_reply+0x37/0xc0


etc. etc.

  



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: xl hangs instead of failing more graciously when the fs is read-only
  2014-06-12  7:47 xl hangs instead of failing more graciously when the fs is read-only Sander Eikelenboom
@ 2014-06-12  7:59 ` Ian Campbell
  2014-06-12  8:08   ` Sander Eikelenboom
  0 siblings, 1 reply; 4+ messages in thread
From: Ian Campbell @ 2014-06-12  7:59 UTC
  To: Sander Eikelenboom; +Cc: xen-devel

On Thu, 2014-06-12 at 09:47 +0200, Sander Eikelenboom wrote:
> Hi Ian,
> 
> At the moment I’m having the privilege of thrashing my box and root-fs 
> frequently while testing kernels. This causes the root-fs to be mounted 
> read-only. But init continues to do its job anyway, so we get to xendomains,
> which in turn uses 'xl'. But 'xl' needs a writable FS and hangs when it isn't.
> Couldn't and shouldn't this fail more graciously?

Your logs seem to be showing reads/writes to a xenbus device, which has
nothing to do with the writability of your rootfs afaik.

I'm pretty sure xl will correctly error out if it fails to write to a
file on a readonly fs, or at least I see no evidence here that it is
not.

Maybe the issue is that xenstored is blocked by the rootfs ro-ness
(or not even started due to it) and so attempts to communicate with it
fail?

I think the right answer is to always have a writeable rootfs rather
than glossing over whatever error led to this situation.

But I suppose as a fallback xendomains could try and probe for a usable
xenstored before proceeding, similar to how xencommons does (ideally
with a shared scriptlet somewhere).
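
For illustration, the "probe before proceeding" idea might look roughly like the following. This is a minimal Python sketch, not the actual xencommons logic: the helper name `probe`, the timeout and polling values, and the example command are all assumptions. The helper polls instead of waiting on the child, because a process stuck in uninterruptible sleep (state D, as in the trace above) may not die when killed, and waiting for it would hang the caller too.

```python
import subprocess
import time


def probe(cmd, timeout=5.0, interval=0.1):
    """Return True if `cmd` exits with status 0 within `timeout` seconds.

    Hypothetical helper for illustration; not part of any Xen tool.
    """
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # Command missing or not executable: treat as "not usable".
        return False

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        rc = proc.poll()  # non-blocking check of the exit status
        if rc is not None:
            return rc == 0
        time.sleep(interval)

    # Timed out. Ask the kernel to kill the child, but do NOT wait for it:
    # a task in uninterruptible sleep may never exit, and a blocking wait
    # here would reproduce the very hang we are trying to avoid.
    proc.kill()
    return False
```

An initscript-style caller could then skip starting domains when something like `probe(["xenstore-read", "-s", "/"])` returns False. That particular command line is an assumption about what a suitable check would be; any cheap xenstore query that fails quickly when xenstored is unavailable would serve.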

Ian.




* Re: xl hangs instead of failing more graciously when the fs is read-only
  2014-06-12  7:59 ` Ian Campbell
@ 2014-06-12  8:08   ` Sander Eikelenboom
  2014-06-12  8:53     ` Ian Campbell
  0 siblings, 1 reply; 4+ messages in thread
From: Sander Eikelenboom @ 2014-06-12  8:08 UTC
  To: Ian Campbell; +Cc: xen-devel


Thursday, June 12, 2014, 9:59:37 AM, you wrote:

> On Thu, 2014-06-12 at 09:47 +0200, Sander Eikelenboom wrote:
>> Hi Ian,
>> 
>> At the moment I’m having the privilege of thrashing my box and root-fs 
>> frequently while testing kernels. This causes the root-fs to be mounted 
>> read-only. But init continues to do its job anyway, so we get to xendomains,
>> which in turn uses 'xl'. But 'xl' needs a writable FS and hangs when it isn't.
>> Couldn't and shouldn't this fail more graciously?

> Your logs seem to be showing reads/writes to a xenbus device, which has
> nothing to do with the writability of your rootfs afaik.

> I'm pretty sure xl will correctly error out if it fails to write to a
> file on a readonly fs, or at least I see no evidence here that it is
> not.

[  734.993832]  [<ffffffff811f71e2>] vfs_write+0xc2/0x1e0
[  735.000239]  [<ffffffff811f76f2>] SyS_write+0x52/0xc0
[  735.019374]  #0:  (sb_writers#10){.+.+..}, at: [<ffffffff811f72e3>] vfs_write+0x1c3/0x1e0

These make me think there was also some direct fs writing done, but I'm not
good enough at interpreting stack traces to tell which part is actually blocking.

> Maybe the issue is that xenstored is blocked by the rootfs ro-ness
> (or not even started due to it) and so attempts to communicate with it
> fail?

> I think the right answer is to always have a writeable rootfs rather
> than glossing over whatever error led to this situation.

I'm aiming for that :-) .. I'm not expecting it to function, but there is a
subtle difference between hanging and failing.

> But I suppose as a fallback xendomains could try and probe for a usable
> xenstored before proceeding, similar to how xencommons does (ideally
> with a shared scriptlet somewhere).

Is the right fix really in the places that use xl, rather than in xl itself,
letting it fail when it can't use xenstored?

> Ian.






* Re: xl hangs instead of failing more graciously when the fs is read-only
  2014-06-12  8:08   ` Sander Eikelenboom
@ 2014-06-12  8:53     ` Ian Campbell
  0 siblings, 0 replies; 4+ messages in thread
From: Ian Campbell @ 2014-06-12  8:53 UTC
  To: Sander Eikelenboom; +Cc: xen-devel

On Thu, 2014-06-12 at 10:08 +0200, Sander Eikelenboom wrote:
> Thursday, June 12, 2014, 9:59:37 AM, you wrote:
> 
> > On Thu, 2014-06-12 at 09:47 +0200, Sander Eikelenboom wrote:
> >> Hi Ian,
> >> 
> >> At the moment I’m having the privilege of thrashing my box and root-fs 
> >> frequently while testing kernels. This causes the root-fs to be mounted 
> >> read-only. But init continues to do its job anyway, so we get to xendomains,
> >> which in turn uses 'xl'. But 'xl' needs a writable FS and hangs when it isn't.
> >> Couldn't and shouldn't this fail more graciously?
> 
> > Your logs seem to be showing reads/writes to a xenbus device, which has
> > nothing to do with the writability of your rootfs afaik.
> 
> > I'm pretty sure xl will correctly error out if it fails to write to a
> > file on a readonly fs, or at least I see no evidence here that it is
> > not.
> 
> [  734.993832]  [<ffffffff811f71e2>] vfs_write+0xc2/0x1e0
> [  735.000239]  [<ffffffff811f76f2>] SyS_write+0x52/0xc0
> [  735.019374]  #0:  (sb_writers#10){.+.+..}, at: [<ffffffff811f72e3>] vfs_write+0x1c3/0x1e0
> 
> These make me think there was also some direct fs writing done, but I'm not
> good enough at interpreting stack traces to tell which part is actually blocking.

They are writes, but you can't tell that they are to the rootfs from
this. The rest of the stack trace indicated that they were likely to the
xenbus special device file.
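
To make that reading of the trace concrete: in a kernel call trace, entries prefixed with "?" are addresses found on the stack that the unwinder could not confirm as part of the call chain, so the blocking path is best read from the unmarked entries only. The sketch below filters them out. The function name `reliable_frames` is hypothetical, and the sample lines are copied from the log at the start of this thread.

```python
import re

# Matches "[  374.507358]  [<ffffffff81ba9239>] schedule+0x29/0x70",
# capturing the optional "?" marker and the symbol name.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x"
)

TRACE = """\
[  374.485344]  [<ffffffff81116426>] ? __lock_acquire+0x516/0x2210
[  374.496464]  [<ffffffff8111448a>] ? mark_held_locks+0x6a/0x90
[  374.507358]  [<ffffffff81ba9239>] schedule+0x29/0x70
[  374.518145]  [<ffffffff81ba964e>] schedule_preempt_disabled+0xe/0x10
[  374.528861]  [<ffffffff81bac86a>] mutex_lock_nested+0x17a/0x560
[  374.539422]  [<ffffffff81590267>] ? xenbus_dev_request_and_reply+0x37/0xc0
[  374.550001]  [<ffffffff81590267>] xenbus_dev_request_and_reply+0x37/0xc0
[  374.560543]  [<ffffffff811be2f3>] ? might_fault+0x43/0xa0
[  374.570927]  [<ffffffff815921e8>] xenbus_file_write+0x2c8/0x560
[  374.581130]  [<ffffffff8111856c>] ? lock_release+0x13c/0x2a0
[  374.591086]  [<ffffffff811f71e2>] vfs_write+0xc2/0x1e0
[  374.600943]  [<ffffffff811f76f2>] SyS_write+0x52/0xc0
[  374.610673]  [<ffffffff81baf939>] system_call_fastpath+0x16/0x1b
"""


def reliable_frames(trace):
    """Return symbol names of frames not marked '?', innermost first."""
    frames = []
    for line in trace.splitlines():
        m = FRAME.match(line.strip())
        if m and m.group(1) is None:  # group(1) is the "? " marker
            frames.append(m.group(2))
    return frames


print(" <- ".join(reliable_frames(TRACE)))
# prints: schedule <- schedule_preempt_disabled <- mutex_lock_nested <-
#         xenbus_dev_request_and_reply <- xenbus_file_write <- vfs_write <-
#         SyS_write <- system_call_fastpath   (all on one line)
```

Read innermost first, the task is sleeping in `mutex_lock_nested`, called from `xenbus_dev_request_and_reply`, which was reached through `xenbus_file_write`. So the `vfs_write` here is a write to the xenbus device, and the mutex being waited on is taken in the xenbus code rather than in a filesystem.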

> > Maybe the issue is that xenstored is blocked by the rootfs ro-ness
> > (or not even started due to it) and so attempts to communicate with it
> > fail?
> 
> > I think the right answer is to always have a writeable rootfs rather
> > than glossing over whatever error led to this situation.
> 
> I'm aiming for that :-) .. I'm not expecting it to function, but there is a
> subtle difference between hanging and failing.
> 
> > But I suppose as a fallback xendomains could try and probe for a usable
> > xenstored before proceeding, similar to how xencommons does (ideally
> > with a shared scriptlet somewhere).
> 
> Is the right fix really in the places that use xl, rather than in xl itself,
> letting it fail when it can't use xenstored?

Fixing it in xl seems to me like it would be very tricky, because xl
uses the kernel interface and not the unix domain socket to access
xenstored (to cope with stub xenstored configurations). You are welcome
to try, of course, but fixing it in the initscript seems likely to be much
simpler to me.

Ian.

