All of lore.kernel.org
 help / color / mirror / Atom feed
* Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
@ 2015-03-19  1:19 Marek Marczykowski-Górecki
  2015-03-19 10:38 ` Ian Campbell
  2015-03-19 11:10 ` David Vrabel
  0 siblings, 2 replies; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-03-19  1:19 UTC (permalink / raw)
  To: xen-devel; +Cc: Boris Ostrovsky, David Vrabel


[-- Attachment #1.1: Type: text/plain, Size: 2358 bytes --]

Hi,

I've hit some deadlock in kernel xenstore client exposed via
/proc/xen/xenbus. Steps to reproduce are simple:
int main() {
	struct xs_handle *xs;
	xs = xs_open(0);
	xs_watch(xs, "domid", "token");
	xs_read(xs, 0, "name", NULL);
	return 0;
}

xs_watch internally creates new thread, which uses read to wait for the
watch. And in the same time, the program tries to read some value,
but actually it hangs at sending the command (before even sending a path to be
read). Strace gives this (simplified for readability):
[pid  2494] write(3, "\4\0\0\0\0\0\0\0\0\0\0\0\f\0\0\0", 160 = 16
[pid  2494] write(3, "domid\0", 6)      = 6
[pid  2494] write(3, "token\0", 6)      = 6
[pid  2495] read(3,  <unfinished ...>
[pid  2494] futex(0x71c0d4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid  2495] <... read resumed>
"\17\0\0\0\377\377\377\377\220~\255\27\f\0\0\0", 16) = 16
[pid  2495] read(3, "domid\0token\0", 12) = 12
[pid  2495] read(3, "\4\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0", 16) = 16
[pid  2495] read(3, "OK\0", 3)          = 3
[pid  2495] futex(0x71c0d4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x71c0d0,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid  2494] <... futex resumed> )       = 0
[pid  2495] <... futex resumed> )       = 1
[pid  2494] futex(0x71c0a8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid  2495] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  2494] <... futex resumed> )       = -1 EAGAIN (Resource
temporarily unavailable)
[pid  2495] <... futex resumed> )       = 0
[pid  2494] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  2495] read(3,  <unfinished ...>
[pid  2494] <... futex resumed> )       = 0
[pid  2494] rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER,
0x7fc78c1488f0}, NULL, 8) = 0
[pid  2494] rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER,
0x7fc78c1488f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc78c1488f0}, 8) = 0
[pid  2494] write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0", 16

And thats all - 2494 is waiting on write, 2495 is waiting on read.

On 3.12.x it is working. On 3.17.0 and 3.18.7 it is broken. I haven't
checked versions in the middle.

Any ideas?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-19  1:19 Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier) Marek Marczykowski-Górecki
@ 2015-03-19 10:38 ` Ian Campbell
  2015-03-19 12:10   ` Iurii Konovalenko
  2015-03-19 11:10 ` David Vrabel
  1 sibling, 1 reply; 14+ messages in thread
From: Ian Campbell @ 2015-03-19 10:38 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Iurii Konovalenko, Embedded-pv-devel, Boris Ostrovsky,
	David Vrabel, xen-devel

On Thu, 2015-03-19 at 02:19 +0100, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> I've hit some deadlock in kernel xenstore client exposed via
> /proc/xen/xenbus.

Sounds similar to what Iurii also reported last night in "Userspace PV
backend hangs".

Iurri's case was all 3.14 kernels, which is in your range too.

>  Steps to reproduce are simple:
> int main() {
> 	struct xs_handle *xs;
> 	xs = xs_open(0);
> 	xs_watch(xs, "domid", "token");
> 	xs_read(xs, 0, "name", NULL);
> 	return 0;
> }
> 
> xs_watch internally creates new thread, which uses read to wait for the
> watch. And in the same time, the program tries to read some value,
> but actually it hangs at sending the command (before even sending a path to be
> read). Strace gives this (simplified for readability):
> [pid  2494] write(3, "\4\0\0\0\0\0\0\0\0\0\0\0\f\0\0\0", 160 = 16
> [pid  2494] write(3, "domid\0", 6)      = 6
> [pid  2494] write(3, "token\0", 6)      = 6
> [pid  2495] read(3,  <unfinished ...>
> [pid  2494] futex(0x71c0d4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> [pid  2495] <... read resumed>
> "\17\0\0\0\377\377\377\377\220~\255\27\f\0\0\0", 16) = 16
> [pid  2495] read(3, "domid\0token\0", 12) = 12
> [pid  2495] read(3, "\4\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0", 16) = 16
> [pid  2495] read(3, "OK\0", 3)          = 3
> [pid  2495] futex(0x71c0d4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x71c0d0,
> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
> [pid  2494] <... futex resumed> )       = 0
> [pid  2495] <... futex resumed> )       = 1
> [pid  2494] futex(0x71c0a8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid  2495] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid  2494] <... futex resumed> )       = -1 EAGAIN (Resource
> temporarily unavailable)
> [pid  2495] <... futex resumed> )       = 0
> [pid  2494] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid  2495] read(3,  <unfinished ...>
> [pid  2494] <... futex resumed> )       = 0
> [pid  2494] rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER,
> 0x7fc78c1488f0}, NULL, 8) = 0
> [pid  2494] rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER,
> 0x7fc78c1488f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc78c1488f0}, 8) = 0
> [pid  2494] write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0", 16
> 
> And thats all - 2494 is waiting on write, 2495 is waiting on read.
> 
> On 3.12.x it is working. On 3.17.0 and 3.18.7 it is broken. I haven't
> checked versions in the middle.
> 
> Any ideas?
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-19  1:19 Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier) Marek Marczykowski-Górecki
  2015-03-19 10:38 ` Ian Campbell
@ 2015-03-19 11:10 ` David Vrabel
  1 sibling, 0 replies; 14+ messages in thread
From: David Vrabel @ 2015-03-19 11:10 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, xen-devel; +Cc: Boris Ostrovsky, David Vrabel

On 19/03/15 01:19, Marek Marczykowski-Górecki wrote:
> Hi,
> 
> I've hit some deadlock in kernel xenstore client exposed via
> /proc/xen/xenbus. Steps to reproduce are simple:
> int main() {
> 	struct xs_handle *xs;
> 	xs = xs_open(0);
> 	xs_watch(xs, "domid", "token");
> 	xs_read(xs, 0, "name", NULL);
> 	return 0;
> }

This test program works for me on 4.0-rc4 in a dom0 and domU.  I also
don't see any xenbus changes in the kernel that could cause a regression
like this.

Perhaps you not running a vanilla kernel?

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-19 10:38 ` Ian Campbell
@ 2015-03-19 12:10   ` Iurii Konovalenko
  2015-03-19 12:32     ` [Embedded-pv-devel] " Vitaly Chernooky
  2015-03-19 13:00     ` David Vrabel
  0 siblings, 2 replies; 14+ messages in thread
From: Iurii Konovalenko @ 2015-03-19 12:10 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Embedded-pv-devel, Boris Ostrovsky, David Vrabel,
	Marek Marczykowski-Górecki, xen-devel

[-- Attachment #1: Type: text/plain, Size: 3911 bytes --]

Hi, guys!

When I read, that I am not alone and that issue depends on kernel
version, I decided to continue investigation.
And I found why our threads locks on read/write operations.
On Linux kernel 3.14+ syscalls of file read and write changed a bit:
fdget() function was replaced by fdget_pos() - it is fdget() function
plus additional position mutex lock for files with FMODE_ATOMIC_POS
(files for inodes with S_IFREG flag set - regular nodes). As I thought
our xen files are not regular and nonseekable, I hoped this flag is
not set. But it is set. It is because our file system is created by
function simple_fill_super(), and inside it this flag is hardly set:
inode->i_mode = S_IFREG | files->mode;
So, as a fast hack I made a patch: just made copy of this function for
xen, which does not set this flag. It works for me. Could you please
check if it works for you.
Best regards.

Iurii Konovalenko | Senior Software Engineer
GlobalLogic
P +3.8044.492.9695 M +38.099.932.2909
S yufuntik
www.globallogic.com
http://www.globallogic.com/email_disclaimer.txt


On Thu, Mar 19, 2015 at 12:38 PM, Ian Campbell <ian.campbell@citrix.com> wrote:
> On Thu, 2015-03-19 at 02:19 +0100, Marek Marczykowski-Górecki wrote:
>> Hi,
>>
>> I've hit some deadlock in kernel xenstore client exposed via
>> /proc/xen/xenbus.
>
> Sounds similar to what Iurii also reported last night in "Userspace PV
> backend hangs".
>
> Iurri's case was all 3.14 kernels, which is in your range too.
>
>>  Steps to reproduce are simple:
>> int main() {
>>       struct xs_handle *xs;
>>       xs = xs_open(0);
>>       xs_watch(xs, "domid", "token");
>>       xs_read(xs, 0, "name", NULL);
>>       return 0;
>> }
>>
>> xs_watch internally creates new thread, which uses read to wait for the
>> watch. And in the same time, the program tries to read some value,
>> but actually it hangs at sending the command (before even sending a path to be
>> read). Strace gives this (simplified for readability):
>> [pid  2494] write(3, "\4\0\0\0\0\0\0\0\0\0\0\0\f\0\0\0", 160 = 16
>> [pid  2494] write(3, "domid\0", 6)      = 6
>> [pid  2494] write(3, "token\0", 6)      = 6
>> [pid  2495] read(3,  <unfinished ...>
>> [pid  2494] futex(0x71c0d4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
>> [pid  2495] <... read resumed>
>> "\17\0\0\0\377\377\377\377\220~\255\27\f\0\0\0", 16) = 16
>> [pid  2495] read(3, "domid\0token\0", 12) = 12
>> [pid  2495] read(3, "\4\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0", 16) = 16
>> [pid  2495] read(3, "OK\0", 3)          = 3
>> [pid  2495] futex(0x71c0d4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x71c0d0,
>> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
>> [pid  2494] <... futex resumed> )       = 0
>> [pid  2495] <... futex resumed> )       = 1
>> [pid  2494] futex(0x71c0a8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
>> [pid  2495] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> [pid  2494] <... futex resumed> )       = -1 EAGAIN (Resource
>> temporarily unavailable)
>> [pid  2495] <... futex resumed> )       = 0
>> [pid  2494] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
>> [pid  2495] read(3,  <unfinished ...>
>> [pid  2494] <... futex resumed> )       = 0
>> [pid  2494] rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER,
>> 0x7fc78c1488f0}, NULL, 8) = 0
>> [pid  2494] rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER,
>> 0x7fc78c1488f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc78c1488f0}, 8) = 0
>> [pid  2494] write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0", 16
>>
>> And thats all - 2494 is waiting on write, 2495 is waiting on read.
>>
>> On 3.12.x it is working. On 3.17.0 and 3.18.7 it is broken. I haven't
>> checked versions in the middle.
>>
>> Any ideas?
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
>
>

[-- Attachment #2: 0001-arm-xen-hack-to-make-xen-files-unregular.patch --]
[-- Type: text/x-patch, Size: 2954 bytes --]

From ea5642c0ea537ef3c9da946d4f26a8f597b68622 Mon Sep 17 00:00:00 2001
From: Iurii Konovalenko <iurii.konovalenko@globallogic.com>
Date: Thu, 19 Mar 2015 13:55:44 +0200
Subject: [PATCH] arm: xen: hack to make xen files unregular

Change-Id: I0bc32867ca12dad78aa5f532a9c606fab9a3d1db
Signed-off-by: Iurii Konovalenko <iurii.konovalenko@globallogic.com>
---
 drivers/xen/xenfs/super.c | 71 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 06092e0..1f7e74c 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -14,6 +14,7 @@
 #include <linux/module.h>
 #include <linux/fs.h>
 #include <linux/magic.h>
+#include <linux/pagemap.h>
 
 #include <xen/xen.h>
 
@@ -42,6 +43,74 @@ static const struct file_operations capabilities_file_ops = {
 	.llseek = default_llseek,
 };
 
+static const struct super_operations xen_simple_super_operations = {
+	.statfs		= simple_statfs,
+};
+
+static int xen_simple_fill_super(struct super_block *s, unsigned long magic,
+	      struct tree_descr *files)
+{
+struct inode *inode;
+struct dentry *root;
+struct dentry *dentry;
+int i;
+
+s->s_blocksize = PAGE_CACHE_SIZE;
+s->s_blocksize_bits = PAGE_CACHE_SHIFT;
+s->s_magic = magic;
+s->s_op = &xen_simple_super_operations;
+s->s_time_gran = 1;
+
+inode = new_inode(s);
+if (!inode)
+	return -ENOMEM;
+/*
+* because the root inode is 1, the files array must not contain an
+* entry at index 1
+*/
+inode->i_ino = 1;
+inode->i_mode = S_IFDIR | 0755;
+inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+inode->i_op = &simple_dir_inode_operations;
+inode->i_fop = &simple_dir_operations;
+set_nlink(inode, 2);
+root = d_make_root(inode);
+if (!root)
+	return -ENOMEM;
+for (i = 0; !files->name || files->name[0]; i++, files++) {
+	if (!files->name)
+		continue;
+
+	/* warn if it tries to conflict with the root inode */
+	if (unlikely(i == 1))
+		printk(KERN_WARNING "%s: %s passed in a files array"
+			"with an index of 1!\n", __func__,
+			s->s_type->name);
+
+	dentry = d_alloc_name(root, files->name);
+	if (!dentry)
+		goto out;
+	inode = new_inode(s);
+	if (!inode) {
+		dput(dentry);
+		goto out;
+	}
+	inode->i_mode = files->mode;
+	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_fop = files->ops;
+	inode->i_ino = i;
+	d_add(dentry, inode);
+}
+s->s_root = root;
+return 0;
+out:
+d_genocide(root);
+shrink_dcache_parent(root);
+dput(root);
+return -ENOMEM;
+}
+
+
 static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	static struct tree_descr xenfs_files[] = {
@@ -60,7 +129,7 @@ static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
 		{""},
 	};
 
-	return simple_fill_super(sb, XENFS_SUPER_MAGIC,
+	return xen_simple_fill_super(sb, XENFS_SUPER_MAGIC,
 			xen_initial_domain() ? xenfs_init_files : xenfs_files);
 }
 
-- 
1.9.1


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Embedded-pv-devel] Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-19 12:10   ` Iurii Konovalenko
@ 2015-03-19 12:32     ` Vitaly Chernooky
  2015-03-19 13:00     ` David Vrabel
  1 sibling, 0 replies; 14+ messages in thread
From: Vitaly Chernooky @ 2015-03-19 12:32 UTC (permalink / raw)
  To: Iurii Konovalenko
  Cc: Ian Campbell, Marek Marczykowski-Górecki, xen-devel,
	Embedded-pv-devel, David Vrabel, Boris Ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 4748 bytes --]

Hi All!

This issue is discussed in details here: https://lkml.org/lkml/2014/2/17/324

With best regards,

On Thu, Mar 19, 2015 at 2:10 PM, Iurii Konovalenko <
iurii.konovalenko@globallogic.com> wrote:

> Hi, guys!
>
> When I read, that I am not alone and that issue depends on kernel
> version, I decided to continue investigation.
> And I found why our threads locks on read/write operations.
> On Linux kernel 3.14+ syscalls of file read and write changed a bit:
> fdget() function was replaced by fdget_pos() - it is fdget() function
> plus additional position mutex lock for files with FMODE_ATOMIC_POS
> (files for inodes with S_IFREG flag set - regular nodes). As I thought
> our xen files are not regular and nonseekable, I hoped this flag is
> not set. But it is set. It is because our file system is created by
> function simple_fill_super(), and inside it this flag is hardly set:
> inode->i_mode = S_IFREG | files->mode;
> So, as a fast hack I made a patch: just made copy of this function for
> xen, which does not set this flag. It works for me. Could you please
> check if it works for you.
> Best regards.
>
> Iurii Konovalenko | Senior Software Engineer
> GlobalLogic
> P +3.8044.492.9695 M +38.099.932.2909
> S yufuntik
> www.globallogic.com
> http://www.globallogic.com/email_disclaimer.txt
>
>
> On Thu, Mar 19, 2015 at 12:38 PM, Ian Campbell <ian.campbell@citrix.com>
> wrote:
> > On Thu, 2015-03-19 at 02:19 +0100, Marek Marczykowski-Górecki wrote:
> >> Hi,
> >>
> >> I've hit some deadlock in kernel xenstore client exposed via
> >> /proc/xen/xenbus.
> >
> > Sounds similar to what Iurii also reported last night in "Userspace PV
> > backend hangs".
> >
> > Iurri's case was all 3.14 kernels, which is in your range too.
> >
> >>  Steps to reproduce are simple:
> >> int main() {
> >>       struct xs_handle *xs;
> >>       xs = xs_open(0);
> >>       xs_watch(xs, "domid", "token");
> >>       xs_read(xs, 0, "name", NULL);
> >>       return 0;
> >> }
> >>
> >> xs_watch internally creates new thread, which uses read to wait for the
> >> watch. And in the same time, the program tries to read some value,
> >> but actually it hangs at sending the command (before even sending a
> path to be
> >> read). Strace gives this (simplified for readability):
> >> [pid  2494] write(3, "\4\0\0\0\0\0\0\0\0\0\0\0\f\0\0\0", 160 = 16
> >> [pid  2494] write(3, "domid\0", 6)      = 6
> >> [pid  2494] write(3, "token\0", 6)      = 6
> >> [pid  2495] read(3,  <unfinished ...>
> >> [pid  2494] futex(0x71c0d4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
> >> [pid  2495] <... read resumed>
> >> "\17\0\0\0\377\377\377\377\220~\255\27\f\0\0\0", 16) = 16
> >> [pid  2495] read(3, "domid\0token\0", 12) = 12
> >> [pid  2495] read(3, "\4\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0", 16) = 16
> >> [pid  2495] read(3, "OK\0", 3)          = 3
> >> [pid  2495] futex(0x71c0d4, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x71c0d0,
> >> {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
> >> [pid  2494] <... futex resumed> )       = 0
> >> [pid  2495] <... futex resumed> )       = 1
> >> [pid  2494] futex(0x71c0a8, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> >> [pid  2495] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> >> [pid  2494] <... futex resumed> )       = -1 EAGAIN (Resource
> >> temporarily unavailable)
> >> [pid  2495] <... futex resumed> )       = 0
> >> [pid  2494] futex(0x71c0a8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> >> [pid  2495] read(3,  <unfinished ...>
> >> [pid  2494] <... futex resumed> )       = 0
> >> [pid  2494] rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER,
> >> 0x7fc78c1488f0}, NULL, 8) = 0
> >> [pid  2494] rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER,
> >> 0x7fc78c1488f0}, {SIG_DFL, [], SA_RESTORER, 0x7fc78c1488f0}, 8) = 0
> >> [pid  2494] write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0", 16
> >>
> >> And thats all - 2494 is waiting on write, 2495 is waiting on read.
> >>
> >> On 3.12.x it is working. On 3.17.0 and 3.18.7 it is broken. I haven't
> >> checked versions in the middle.
> >>
> >> Any ideas?
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@lists.xen.org
> >> http://lists.xen.org/xen-devel
> >
> >
>
> _______________________________________________
> Embedded-pv-devel mailing list
> Embedded-pv-devel@lists.xenproject.org
> http://lists.xenproject.org/cgi-bin/mailman/listinfo/embedded-pv-devel
>



-- 
*Vitaly Chernooky | Senior Developer - Product Engineering and Development*
GlobalLogic
P *+380.44.4929695 ext.1136* M *+380.63.6011802* S cvv_2k
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt

[-- Attachment #1.2: Type: text/html, Size: 7126 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-19 12:10   ` Iurii Konovalenko
  2015-03-19 12:32     ` [Embedded-pv-devel] " Vitaly Chernooky
@ 2015-03-19 13:00     ` David Vrabel
  2015-03-19 13:10       ` Vitaly Chernooky
  1 sibling, 1 reply; 14+ messages in thread
From: David Vrabel @ 2015-03-19 13:00 UTC (permalink / raw)
  To: Iurii Konovalenko, Ian Campbell
  Cc: Embedded-pv-devel, Boris Ostrovsky, xen-devel, David Vrabel,
	Marek Marczykowski-Górecki

On 19/03/15 12:10, Iurii Konovalenko wrote:
> Hi, guys!
> 
> When I read, that I am not alone and that issue depends on kernel
> version, I decided to continue investigation.
> And I found why our threads locks on read/write operations.
> On Linux kernel 3.14+ syscalls of file read and write changed a bit:
> fdget() function was replaced by fdget_pos() - it is fdget() function
> plus additional position mutex lock for files with FMODE_ATOMIC_POS
> (files for inodes with S_IFREG flag set - regular nodes). As I thought
> our xen files are not regular and nonseekable, I hoped this flag is
> not set. But it is set. It is because our file system is created by
> function simple_fill_super(), and inside it this flag is hardly set:
> inode->i_mode = S_IFREG | files->mode;
> So, as a fast hack I made a patch: just made copy of this function for
> xen, which does not set this flag. It works for me. Could you please
> check if it works for you.

I still can't get this to deadlock, but why not clear FMODE_ATOMIC_POS
in xenbus_file_open() ?

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-19 13:00     ` David Vrabel
@ 2015-03-19 13:10       ` Vitaly Chernooky
  2015-03-20  4:04         ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 14+ messages in thread
From: Vitaly Chernooky @ 2015-03-19 13:10 UTC (permalink / raw)
  To: David Vrabel
  Cc: Iurii Konovalenko, Ian Campbell, Marek Marczykowski-Górecki,
	xen-devel, Embedded-pv-devel, Boris Ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 1750 bytes --]

David,

On Thu, Mar 19, 2015 at 3:00 PM, David Vrabel <david.vrabel@citrix.com>
wrote:

> On 19/03/15 12:10, Iurii Konovalenko wrote:
> > Hi, guys!
> >
> > When I read, that I am not alone and that issue depends on kernel
> > version, I decided to continue investigation.
> > And I found why our threads locks on read/write operations.
> > On Linux kernel 3.14+ syscalls of file read and write changed a bit:
> > fdget() function was replaced by fdget_pos() - it is fdget() function
> > plus additional position mutex lock for files with FMODE_ATOMIC_POS
> > (files for inodes with S_IFREG flag set - regular nodes). As I thought
> > our xen files are not regular and nonseekable, I hoped this flag is
> > not set. But it is set. It is because our file system is created by
> > function simple_fill_super(), and inside it this flag is hardly set:
> > inode->i_mode = S_IFREG | files->mode;
> > So, as a fast hack I made a patch: just made copy of this function for
> > xen, which does not set this flag. It works for me. Could you please
> > check if it works for you.
>
> I still can't get this to deadlock, but why not clear FMODE_ATOMIC_POS
> in xenbus_file_open() ?
>

Because it is not the root of issue. FMODE_ATOMIC_POS is just one of
results of bug. Iurii has fixed the root of issue but in suboptimal way. So
we just need to have found optimal way.

With best regards,


> David
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
>



-- 
*Vitaly Chernooky | Senior Developer - Product Engineering and Development*
GlobalLogic
P *+380.44.4929695 ext.1136* M *+380.63.6011802* S cvv_2k
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt

[-- Attachment #1.2: Type: text/html, Size: 3180 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-19 13:10       ` Vitaly Chernooky
@ 2015-03-20  4:04         ` Marek Marczykowski-Górecki
  2015-03-20  9:58           ` Vitaly Chernooky
  2015-03-20 10:08           ` Vitaly Chernooky
  0 siblings, 2 replies; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-03-20  4:04 UTC (permalink / raw)
  To: Vitaly Chernooky
  Cc: Iurii Konovalenko, Ian Campbell, xen-devel, Embedded-pv-devel,
	David Vrabel, Boris Ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 2149 bytes --]

On Thu, Mar 19, 2015 at 03:10:49PM +0200, Vitaly Chernooky wrote:
> David,
> 
> On Thu, Mar 19, 2015 at 3:00 PM, David Vrabel <david.vrabel@citrix.com>
> wrote:
> 
> > On 19/03/15 12:10, Iurii Konovalenko wrote:
> > > Hi, guys!
> > >
> > > When I read, that I am not alone and that issue depends on kernel
> > > version, I decided to continue investigation.
> > > And I found why our threads locks on read/write operations.
> > > On Linux kernel 3.14+ syscalls of file read and write changed a bit:
> > > fdget() function was replaced by fdget_pos() - it is fdget() function
> > > plus additional position mutex lock for files with FMODE_ATOMIC_POS
> > > (files for inodes with S_IFREG flag set - regular nodes). As I thought
> > > our xen files are not regular and nonseekable, I hoped this flag is
> > > not set. But it is set. It is because our file system is created by
> > > function simple_fill_super(), and inside it this flag is hardly set:
> > > inode->i_mode = S_IFREG | files->mode;
> > > So, as a fast hack I made a patch: just made copy of this function for
> > > xen, which does not set this flag. It works for me. Could you please
> > > check if it works for you.
> >
> > I still can't get this to deadlock, but why not clear FMODE_ATOMIC_POS
> > in xenbus_file_open() ?
> >
> 
> Because it is not the root of issue. FMODE_ATOMIC_POS is just one of
> results of bug. Iurii has fixed the root of issue but in suboptimal way. So
> we just need to have found optimal way.

I can just confirm that:
1. (unsurprisingly) the bug is still present in 4.0-rc4
2. both proposed fixes are effective

I'm not sure if removing S_IFREG completely is a good idea, I guess
there will be much more side effects...
What about another idea: xenbus_file_open uses nonseekable_open - this
looks like a good place to clear FMODE_ATOMIC_POS if present? It
doesn't make sense to get a lock for position on nonseekable file,
right?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-20  4:04         ` Marek Marczykowski-Górecki
@ 2015-03-20  9:58           ` Vitaly Chernooky
  2015-03-20 10:08           ` Vitaly Chernooky
  1 sibling, 0 replies; 14+ messages in thread
From: Vitaly Chernooky @ 2015-03-20  9:58 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Iurii Konovalenko, Ian Campbell, xen-devel, Embedded-pv-devel,
	David Vrabel, Boris Ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 2844 bytes --]

On Fri, Mar 20, 2015 at 6:04 AM, Marek Marczykowski-Górecki <
marmarek@invisiblethingslab.com> wrote:

> On Thu, Mar 19, 2015 at 03:10:49PM +0200, Vitaly Chernooky wrote:
> > David,
> >
> > On Thu, Mar 19, 2015 at 3:00 PM, David Vrabel <david.vrabel@citrix.com>
> > wrote:
> >
> > > On 19/03/15 12:10, Iurii Konovalenko wrote:
> > > > Hi, guys!
> > > >
> > > > When I read, that I am not alone and that issue depends on kernel
> > > > version, I decided to continue investigation.
> > > > And I found why our threads locks on read/write operations.
> > > > On Linux kernel 3.14+ syscalls of file read and write changed a bit:
> > > > fdget() function was replaced by fdget_pos() - it is fdget() function
> > > > plus additional position mutex lock for files with FMODE_ATOMIC_POS
> > > > (files for inodes with S_IFREG flag set - regular nodes). As I
> thought
> > > > our xen files are not regular and nonseekable, I hoped this flag is
> > > > not set. But it is set. It is because our file system is created by
> > > > function simple_fill_super(), and inside it this flag is hardly set:
> > > > inode->i_mode = S_IFREG | files->mode;
> > > > So, as a fast hack I made a patch: just made copy of this function
> for
> > > > xen, which does not set this flag. It works for me. Could you please
> > > > check if it works for you.
> > >
> > > I still can't get this to deadlock, but why not clear FMODE_ATOMIC_POS
> > > in xenbus_file_open() ?
> > >
> >
> > Because it is not the root of issue. FMODE_ATOMIC_POS is just one of
> > results of bug. Iurii has fixed the root of issue but in suboptimal way.
> So
> > we just need to have found optimal way.
>
> I can just confirm that:
> 1. (unsurprisingly) the bug is still present in 4.0-rc4
> 2. both proposed fixes are effective
>
> I'm not sure if removing S_IFREG completely is a good idea, I guess
> there will be much more side effects...
>

Hm... Usually nobody expect regular files to be nonseekable, but ...

Yes, I am going to recheck it twice in the nearest future.


> What about another idea: xenbus_file_open uses nonseekable_open - this
> looks like a good place to clear FMODE_ATOMIC_POS if present?


It is wrong place to clear  FMODE_ATOMIC_POS but something in your idea is
right.

It
> doesn't make sense to get a lock for position on nonseekable file,
> right?
>

Usually :)


>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
>



-- 
*Vitaly Chernooky | Senior Developer - Product Engineering and Development*
GlobalLogic
P *+380.44.4929695 ext.1136* M *+380.63.6011802* S cvv_2k
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt

[-- Attachment #1.2: Type: text/html, Size: 4751 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-20  4:04         ` Marek Marczykowski-Górecki
  2015-03-20  9:58           ` Vitaly Chernooky
@ 2015-03-20 10:08           ` Vitaly Chernooky
  2015-03-20 10:38             ` Vitaly Chernooky
  2015-03-20 10:41             ` Marek Marczykowski-Górecki
  1 sibling, 2 replies; 14+ messages in thread
From: Vitaly Chernooky @ 2015-03-20 10:08 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Iurii Konovalenko, Ian Campbell, xen-devel, Embedded-pv-devel,
	David Vrabel, Boris Ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 2800 bytes --]

On Fri, Mar 20, 2015 at 6:04 AM, Marek Marczykowski-Górecki <
marmarek@invisiblethingslab.com> wrote:

> On Thu, Mar 19, 2015 at 03:10:49PM +0200, Vitaly Chernooky wrote:
> > David,
> >
> > On Thu, Mar 19, 2015 at 3:00 PM, David Vrabel <david.vrabel@citrix.com>
> > wrote:
> >
> > > On 19/03/15 12:10, Iurii Konovalenko wrote:
> > > > Hi, guys!
> > > >
> > > > When I read, that I am not alone and that issue depends on kernel
> > > > version, I decided to continue investigation.
> > > > And I found why our threads locks on read/write operations.
> > > > On Linux kernel 3.14+ syscalls of file read and write changed a bit:
> > > > fdget() function was replaced by fdget_pos() - it is fdget() function
> > > > plus additional position mutex lock for files with FMODE_ATOMIC_POS
> > > > (files for inodes with S_IFREG flag set - regular nodes). As I
> thought
> > > > our xen files are not regular and nonseekable, I hoped this flag is
> > > > not set. But it is set. It is because our file system is created by
> > > > function simple_fill_super(), and inside it this flag is hardly set:
> > > > inode->i_mode = S_IFREG | files->mode;
> > > > So, as a fast hack I made a patch: just made copy of this function
> for
> > > > xen, which does not set this flag. It works for me. Could you please
> > > > check if it works for you.
> > >
> > > I still can't get this to deadlock, but why not clear FMODE_ATOMIC_POS
> > > in xenbus_file_open() ?
> > >
> >
> > Because it is not the root of issue. FMODE_ATOMIC_POS is just one of
> > results of bug. Iurii has fixed the root of issue but in suboptimal way.
> So
> > we just need to have found optimal way.
>
> I can just confirm that:
> 1. (unsurprisingly) the bug is still present in 4.0-rc4
> 2. both proposed fixes are effective
>
> I'm not sure if removing S_IFREG completely is a good idea, I guess
> there will be much more side effects...
> What about another idea: xenbus_file_open uses nonseekable_open - this
> looks like a good place to clear FMODE_ATOMIC_POS if present? It
> doesn't make sense to get a lock for position on nonseekable file,
> right?
>

The Open Group Base Specifications Issue 7 IEEE Std 1003.1, 2013 Edition
requires from regular files to be seekable. But Linux kernel looks like
Linus has own opinion on it :(((

With best regards,


> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
>



-- 
*Vitaly Chernooky | Senior Developer - Product Engineering and Development*
GlobalLogic
P *+380.44.4929695 ext.1136* M *+380.63.6011802* S cvv_2k
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt

[-- Attachment #1.2: Type: text/html, Size: 4341 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-20 10:08           ` Vitaly Chernooky
@ 2015-03-20 10:38             ` Vitaly Chernooky
  2015-03-20 10:46               ` David Vrabel
  2015-03-20 10:41             ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 14+ messages in thread
From: Vitaly Chernooky @ 2015-03-20 10:38 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: Iurii Konovalenko, Ian Campbell, xen-devel, Embedded-pv-devel,
	David Vrabel, Boris Ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 3396 bytes --]

So I have finished my investigation and suggest to discuss the simple
attaches patch.

With best regards,

On Fri, Mar 20, 2015 at 12:08 PM, Vitaly Chernooky <
vitalii.chernookyi@globallogic.com> wrote:

>
>
> On Fri, Mar 20, 2015 at 6:04 AM, Marek Marczykowski-Górecki <
> marmarek@invisiblethingslab.com> wrote:
>
>> On Thu, Mar 19, 2015 at 03:10:49PM +0200, Vitaly Chernooky wrote:
>> > David,
>> >
>> > On Thu, Mar 19, 2015 at 3:00 PM, David Vrabel <david.vrabel@citrix.com>
>> > wrote:
>> >
>> > > On 19/03/15 12:10, Iurii Konovalenko wrote:
>> > > > Hi, guys!
>> > > >
>> > > > When I read, that I am not alone and that issue depends on kernel
>> > > > version, I decided to continue investigation.
>> > > > And I found why our threads locks on read/write operations.
>> > > > On Linux kernel 3.14+ syscalls of file read and write changed a bit:
>> > > > fdget() function was replaced by fdget_pos() - it is fdget()
>> function
>> > > > plus additional position mutex lock for files with FMODE_ATOMIC_POS
>> > > > (files for inodes with S_IFREG flag set - regular nodes). As I
>> thought
>> > > > our xen files are not regular and nonseekable, I hoped this flag is
>> > > > not set. But it is set. It is because our file system is created by
>> > > > function simple_fill_super(), and inside it this flag is hardly set:
>> > > > inode->i_mode = S_IFREG | files->mode;
>> > > > So, as a fast hack I made a patch: just made copy of this function
>> for
>> > > > xen, which does not set this flag. It works for me. Could you please
>> > > > check if it works for you.
>> > >
>> > > I still can't get this to deadlock, but why not clear FMODE_ATOMIC_POS
>> > > in xenbus_file_open() ?
>> > >
>> >
>> > Because it is not the root of issue. FMODE_ATOMIC_POS is just one of
>> > results of bug. Iurii has fixed the root of issue but in suboptimal
>> way. So
>> > we just need to have found optimal way.
>>
>> I can just confirm that:
>> 1. (unsurprisingly) the bug is still present in 4.0-rc4
>> 2. both proposed fixes are effective
>>
>> I'm not sure if removing S_IFREG completely is a good idea, I guess
>> there will be much more side effects...
>> What about another idea: xenbus_file_open uses nonseekable_open - this
>> looks like a good place to clear FMODE_ATOMIC_POS if present? It
>> doesn't make sense to get a lock for position on nonseekable file,
>> right?
>>
>
> The Open Group Base Specifications Issue 7 IEEE Std 1003.1, 2013 Edition
> requires from regular files to be seekable. But Linux kernel looks like
> Linus has own opinion on it :(((
>
> With best regards,
>
>
>> --
>> Best Regards,
>> Marek Marczykowski-Górecki
>> Invisible Things Lab
>> A: Because it messes up the order in which people normally read text.
>> Q: Why is top-posting such a bad thing?
>>
>
>
>
> --
> *Vitaly Chernooky | Senior Developer - Product Engineering and Development*
> GlobalLogic
> P *+380.44.4929695 ext.1136 <%2B380.44.4929695%20ext.1136>* M *+380.63.6011802
> <%2B380.63.6011802>* S cvv_2k
> www.globallogic.com
>
> http://www.globallogic.com/email_disclaimer.txt
>



-- 
*Vitaly Chernooky | Senior Developer - Product Engineering and Development*
GlobalLogic
P *+380.44.4929695 ext.1136* M *+380.63.6011802* S cvv_2k
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt

[-- Attachment #1.2: Type: text/html, Size: 5783 bytes --]

[-- Attachment #2: 0001-Fix-deadlock-on-regular-nonseekable-files.patch --]
[-- Type: text/x-patch, Size: 916 bytes --]

From f5f5922f389c0b6a2f0f912adf0c091ee97e0076 Mon Sep 17 00:00:00 2001
From: Vitaly Chernooky <vitaly.chernooky@globallogic.com>
Date: Fri, 20 Mar 2015 12:26:37 +0200
Subject: [PATCH] Fix deadlock on regular nonseekable files

It is actual for proc-like pseudo filesystems which mark their files as
regular but nonseekable.

Change-Id: I92f4da22a5835cc6b6396988c36b5906561ac741
Signed-off-by: Vitaly Chernooky <vitaly.chernooky@globallogic.com>
---
 fs/open.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/open.c b/fs/open.c
index 2ed7325..b792784 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1100,7 +1100,7 @@ EXPORT_SYMBOL(generic_file_open);
  */
 int nonseekable_open(struct inode *inode, struct file *filp)
 {
-	filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
+	filp->f_mode &= ~(FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE | FMODE_ATOMIC_POS);
 	return 0;
 }
 
-- 
1.7.9.5


[-- Attachment #3: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-20 10:08           ` Vitaly Chernooky
  2015-03-20 10:38             ` Vitaly Chernooky
@ 2015-03-20 10:41             ` Marek Marczykowski-Górecki
  1 sibling, 0 replies; 14+ messages in thread
From: Marek Marczykowski-Górecki @ 2015-03-20 10:41 UTC (permalink / raw)
  To: Vitaly Chernooky
  Cc: Iurii Konovalenko, Ian Campbell, xen-devel, Embedded-pv-devel,
	David Vrabel, Boris Ostrovsky


[-- Attachment #1.1: Type: text/plain, Size: 3031 bytes --]

On Fri, Mar 20, 2015 at 12:08:48PM +0200, Vitaly Chernooky wrote:
> On Fri, Mar 20, 2015 at 6:04 AM, Marek Marczykowski-Górecki <
> marmarek@invisiblethingslab.com> wrote:
> 
> > On Thu, Mar 19, 2015 at 03:10:49PM +0200, Vitaly Chernooky wrote:
> > > David,
> > >
> > > On Thu, Mar 19, 2015 at 3:00 PM, David Vrabel <david.vrabel@citrix.com>
> > > wrote:
> > >
> > > > On 19/03/15 12:10, Iurii Konovalenko wrote:
> > > > > Hi, guys!
> > > > >
> > > > > When I read, that I am not alone and that issue depends on kernel
> > > > > version, I decided to continue investigation.
> > > > > And I found why our threads locks on read/write operations.
> > > > > On Linux kernel 3.14+ syscalls of file read and write changed a bit:
> > > > > fdget() function was replaced by fdget_pos() - it is fdget() function
> > > > > plus additional position mutex lock for files with FMODE_ATOMIC_POS
> > > > > (files for inodes with S_IFREG flag set - regular nodes). As I
> > thought
> > > > > our xen files are not regular and nonseekable, I hoped this flag is
> > > > > not set. But it is set. It is because our file system is created by
> > > > > function simple_fill_super(), and inside it this flag is hardly set:
> > > > > inode->i_mode = S_IFREG | files->mode;
> > > > > So, as a fast hack I made a patch: just made copy of this function
> > for
> > > > > xen, which does not set this flag. It works for me. Could you please
> > > > > check if it works for you.
> > > >
> > > > I still can't get this to deadlock, but why not clear FMODE_ATOMIC_POS
> > > > in xenbus_file_open() ?
> > > >
> > >
> > > Because it is not the root of issue. FMODE_ATOMIC_POS is just one of
> > > results of bug. Iurii has fixed the root of issue but in suboptimal way.
> > So
> > > we just need to have found optimal way.
> >
> > I can just confirm that:
> > 1. (unsurprisingly) the bug is still present in 4.0-rc4
> > 2. both proposed fixes are effective
> >
> > I'm not sure if removing S_IFREG completely is a good idea, I guess
> > there will be much more side effects...
> > What about another idea: xenbus_file_open uses nonseekable_open - this
> > looks like a good place to clear FMODE_ATOMIC_POS if present? It
> > doesn't make sense to get a lock for position on nonseekable file,
> > right?
> >
> 
> The Open Group Base Specifications Issue 7 IEEE Std 1003.1, 2013 Edition
> requires from regular files to be seekable. But Linux kernel looks like
> Linus has own opinion on it :(((

Maybe the better idea would be to change filetype of xenbus (and others)
to S_IFIFO or something like this (but keep the file type present,
instead of removing it completely).

Regarding the implementation, maybe simple_fill_super can be modified to
not add S_IFREG if other file type is already present in files->mode?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: Type: application/pgp-signature, Size: 473 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-20 10:38             ` Vitaly Chernooky
@ 2015-03-20 10:46               ` David Vrabel
  2015-03-20 11:39                 ` Vitaly Chernooky
  0 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2015-03-20 10:46 UTC (permalink / raw)
  To: Vitaly Chernooky, Marek Marczykowski-Górecki
  Cc: Boris Ostrovsky, Embedded-pv-devel, Iurii Konovalenko,
	Ian Campbell, xen-devel

On 20/03/15 10:38, Vitaly Chernooky wrote:
> So I have finished my investigation and suggest to discuss the simple
> attaches patch.

Looks ok to me.  But this needs to go via the filesystem maintainers,
Cc'ing Linus on it.  You should explain the deadlock in more detail in
the commit message and mention that it fixes a regression and which
commit introduced the regression.

Another solution would be to replace the file with a symlink to
/dev/xen/xenbus.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier)
  2015-03-20 10:46               ` David Vrabel
@ 2015-03-20 11:39                 ` Vitaly Chernooky
  0 siblings, 0 replies; 14+ messages in thread
From: Vitaly Chernooky @ 2015-03-20 11:39 UTC (permalink / raw)
  To: David Vrabel
  Cc: Iurii Konovalenko, Ian Campbell, Marek Marczykowski-Górecki,
	xen-devel, Embedded-pv-devel, Boris Ostrovsky

David,

On Fri, Mar 20, 2015 at 12:46 PM, David Vrabel <david.vrabel@citrix.com> wrote:
> On 20/03/15 10:38, Vitaly Chernooky wrote:
>> So I have finished my investigation and suggest to discuss the simple
>> attaches patch.
>
> Looks ok to me.  But this needs to go via the filesystem maintainers,
> Cc'ing Linus on it.

Just add Linus to cc? May be to continue mailing thread mentioned
above will be better?

WIth best regards,

>  You should explain the deadlock in more detail in
> the commit message and mention that it fixes a regression and which
> commit introduced the regression.

Ok,

>
> Another solution would be to replace the file with a symlink to
> /dev/xen/xenbus.
>
> David



-- 
Vitaly Chernooky | Senior Developer - Product Engineering and Development
GlobalLogic
P +380.44.4929695 ext.1136 M +380.63.6011802 S cvv_2k
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-03-20 11:39 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-19  1:19 Deadlock in /proc/xen/xenbus watch+read on 3.17+ (maybe earlier) Marek Marczykowski-Górecki
2015-03-19 10:38 ` Ian Campbell
2015-03-19 12:10   ` Iurii Konovalenko
2015-03-19 12:32     ` [Embedded-pv-devel] " Vitaly Chernooky
2015-03-19 13:00     ` David Vrabel
2015-03-19 13:10       ` Vitaly Chernooky
2015-03-20  4:04         ` Marek Marczykowski-Górecki
2015-03-20  9:58           ` Vitaly Chernooky
2015-03-20 10:08           ` Vitaly Chernooky
2015-03-20 10:38             ` Vitaly Chernooky
2015-03-20 10:46               ` David Vrabel
2015-03-20 11:39                 ` Vitaly Chernooky
2015-03-20 10:41             ` Marek Marczykowski-Górecki
2015-03-19 11:10 ` David Vrabel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.